1. Ren P, Bai T, Sun F. Bio-inspired two-stage network for efficient RGB-D salient object detection. Neural Netw 2025;185:107244. [PMID: 39933318] [DOI: 10.1016/j.neunet.2025.107244]
Abstract
Recently, with the development of Convolutional Neural Networks and Vision Transformers, the detection accuracy of RGB-D salient object detection (SOD) models has greatly improved. However, most existing methods cannot balance computational efficiency and performance well. In this paper, inspired by the P and M visual pathways in the primate biological visual system, we propose a Bio-inspired Two-stage Network for Efficient RGB-D SOD, named BTNet, which simulates the visual information processing of both pathways. Specifically, BTNet contains two stages: region locking and object refinement. The region-locking stage simulates the processing of the M visual pathway to obtain a coarse-grained visual representation, while the object-refinement stage simulates the processing of the P visual pathway to obtain a fine-grained visual representation. Experimental results show that BTNet outperforms other state-of-the-art methods on six mainstream benchmark datasets, achieving a significant parameter reduction and processing 384 × 384 images at 175.4 frames per second (FPS). Compared with the cutting-edge method CPNet, BTNet reduces parameters by 93.6% and is nearly 7.2 times faster. The source code is available at https://github.com/ROC-Star/BTNet.
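
To make the coarse-to-fine idea concrete, the following is a minimal, hypothetical sketch of a generic two-stage RGB-D pipeline (a cheap stage locks a coarse region, a second stage refines it at full resolution). The module sizes and names are illustrative only and do not reproduce the BTNet architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStageSOD(nn.Module):
    """Generic coarse-to-fine RGB-D pipeline: a lightweight stage locks the salient
    region at low resolution, a second stage refines it at full resolution.
    (Illustrative sketch only; not the BTNet architecture.)"""
    def __init__(self, ch=32):
        super().__init__()
        # Stage 1 ("region locking"): cheap encoder on a downsampled RGB-D input.
        self.coarse = nn.Sequential(
            nn.Conv2d(4, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 1),
        )
        # Stage 2 ("object refinement"): refines features gated by the coarse map.
        self.refine = nn.Sequential(
            nn.Conv2d(4, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 1),
        )

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)                    # B x 4 x H x W
        coarse = self.coarse(F.avg_pool2d(x, 4))              # low-resolution coarse map
        coarse_up = F.interpolate(torch.sigmoid(coarse), size=rgb.shape[-2:],
                                  mode="bilinear", align_corners=False)
        fine = self.refine(x * coarse_up)                     # attend to the locked region
        return torch.sigmoid(fine), coarse_up

model = TwoStageSOD()
pred, coarse = model(torch.rand(1, 3, 384, 384), torch.rand(1, 1, 384, 384))
print(pred.shape, coarse.shape)   # torch.Size([1, 1, 384, 384]) twice
```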

Affiliation(s)
- Peng Ren
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Tian Bai
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Fuming Sun
- School of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China

2. Duan S, Yang X, Wang N, Gao X. Lightweight RGB-D Salient Object Detection From a Speed-Accuracy Tradeoff Perspective. IEEE Trans Image Process 2025;34:2529-2543. [PMID: 40249695] [DOI: 10.1109/tip.2025.3560488]
Abstract
Current RGB-D methods usually leverage large-scale backbones to improve accuracy but sacrifice efficiency, while several existing lightweight methods struggle to achieve high-precision performance. To balance efficiency and performance, we propose a Speed-Accuracy Tradeoff Network (SATNet) for lightweight RGB-D SOD built from three fundamental perspectives: depth quality, modality fusion, and feature representation. Concerning depth quality, we introduce the Depth Anything Model to generate high-quality depth maps, which effectively alleviates the multi-modal gaps in current datasets. For modality fusion, we propose a Decoupled Attention Module (DAM) to explore the consistency within and between modalities; the multi-modal features are decoupled into dual-view feature vectors to project the discriminable information of the feature maps. For feature representation, we develop a Dual Information Representation Module (DIRM) with a bi-directional inverted framework to enlarge the limited feature space generated by lightweight backbones. DIRM models texture features and saliency features to enrich the feature space, and employs two-way prediction heads to optimize its parameters through bi-directional backpropagation. Finally, we design a Dual Feature Aggregation Module (DFAM) in the decoder to aggregate the texture and saliency features. Extensive experiments on five public RGB-D SOD datasets indicate that the proposed SATNet surpasses state-of-the-art (SOTA) CNN-based heavyweight models while remaining a lightweight framework with 5.2 M parameters running at 415 FPS. The code is available at https://github.com/duan-song/SATNet.
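
Parameter and speed figures such as those quoted above (5.2 M parameters, 415 FPS) are typically measured along the lines sketched below. This is a generic PyTorch measurement routine, not code from SATNet, and the paper's exact benchmarking protocol (hardware, resolution, batching) may differ.

```python
import time
import torch

def count_parameters(model):
    """Total number of trainable parameters (usually reported in millions)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def measure_fps(model, input_shape=(1, 3, 384, 384), runs=100):
    """Rough FPS estimate: warm up, then average the forward-pass time."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.rand(*input_shape, device=device)
    for _ in range(10):                      # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return runs / (time.perf_counter() - start)

net = torch.nn.Conv2d(3, 1, 3, padding=1)    # stand-in for any SOD model
print(count_parameters(net) / 1e6, "M parameters,", round(measure_fps(net)), "FPS")
```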

3. Guan H, Lin J, Lau RWH. A Contrastive-Learning Framework for Unsupervised Salient Object Detection. IEEE Trans Image Process 2025;34:2487-2498. [PMID: 40227896] [DOI: 10.1109/tip.2025.3558674]
Abstract
Existing unsupervised salient object detection (USOD) methods usually rely on low-level saliency priors, such as center and background priors, to detect salient objects, resulting in insufficient high-level semantic understanding. These low-level priors can be fragile and lead to failure when the natural images do not satisfy the prior assumptions; e.g., these methods may fail to detect off-center salient objects, producing fragmented objects in the segmentation. To address these problems, we propose to eliminate the dependency on flimsy low-level priors, and extract high-level saliency from natural images through a contrastive learning framework. To this end, we propose a Contrastive Saliency Network (CSNet), which is a prior-free and label-free saliency detector, with two novel modules: 1) a Contrastive Saliency Extraction (CSE) module to extract high-level saliency cues, by mimicking the human attention mechanism within an instance discriminative task through a contrastive learning framework, and 2) a Feature Re-Coordinate (FRC) module to recover spatial details, by calibrating high-level features with low-level features in an unsupervised fashion. In addition, we introduce a novel local appearance triplet (LAT) loss to assist the training process by encouraging similar saliency scores for regions with homogeneous appearances. Extensive experiments show that our approach is effective and outperforms state-of-the-art methods on popular SOD benchmarks.
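
As background for the contrastive-learning framework described above, a minimal InfoNCE-style instance-discrimination loss is sketched below. It illustrates only the generic building block behind such frameworks; it is not the paper's CSE module or LAT loss.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss for instance discrimination: two augmented views of the same
    image form a positive pair, all other images in the batch act as negatives.
    z1, z2: B x D embeddings from the two views."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature              # B x B similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```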

4. Song X, Tan Y, Li X, Hei X. GDVIFNet: A generated depth and visible image fusion network with edge feature guidance for salient object detection. Neural Netw 2025;188:107445. [PMID: 40209304] [DOI: 10.1016/j.neunet.2025.107445]
Abstract
In recent years, despite significant advancements in salient object detection (SOD), performance in complex interference environments remains suboptimal. To address these challenges, additional modalities like depth (SOD-D) or thermal imaging (SOD-T) are often introduced. However, existing methods typically rely on specialized depth or thermal devices to capture these modalities, which can be costly and inconvenient. To address this limitation using only a single RGB image, we propose GDVIFNet, a novel approach that leverages Depth Anything to generate depth images. Since these generated depth images may contain noise and artifacts, we incorporate self-supervised techniques to generate edge feature information. During the process of generating image edge features, the noise and artifacts present in the generated depth images can be effectively removed. Our method employs a dual-branch architecture, combining CNN and Transformer-based branches for feature extraction. We designed the step trimodal interaction unit (STIU) to fuse the RGB features with the depth features from the CNN branch and the self-cross attention fusion (SCF) to integrate RGB features with depth features from the Transformer branch. Finally, guided by edge features from our self-supervised edge guidance module (SEGM), we employ the CNN-Edge-Transformer step fusion (CETSF) to fuse features from both branches. Experimental results demonstrate that our method achieves state-of-the-art performance across multiple datasets. Code can be found at https://github.com/typist2001/GDVIFNet.
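
A minimal sketch of extracting an edge map from a (possibly noisy) generated depth map with Sobel filters is shown below. The paper's self-supervised edge guidance module (SEGM) is considerably more elaborate, so treat this purely as an illustration of the idea of deriving edge cues from generated depth.

```python
import torch
import torch.nn.functional as F

def depth_edges(depth, eps=1e-6):
    """Sobel gradient magnitude of a depth map, normalized to [0, 1].
    depth: B x 1 x H x W tensor (e.g., produced by a monocular depth estimator)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(depth, kx.to(depth), padding=1)
    gy = F.conv2d(depth, ky.to(depth), padding=1)
    mag = torch.sqrt(gx ** 2 + gy ** 2)
    return mag / (mag.amax(dim=(2, 3), keepdim=True) + eps)

edges = depth_edges(torch.rand(1, 1, 256, 256))
print(edges.shape, edges.max().item())
```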

Affiliation(s)
- Xiaogang Song
- Xi'an University of Technology, School of Computer Science and Engineering, Xi'an, 710048, China; Engineering Research Center of Human-machine integration intelligent robot, Universities of Shaanxi Province, Xi'an, 710048, China
- Yuping Tan
- Xi'an University of Technology, School of Computer Science and Engineering, Xi'an, 710048, China
- Xiaochang Li
- College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Xinhong Hei
- Xi'an University of Technology, School of Computer Science and Engineering, Xi'an, 710048, China; Engineering Research Center of Human-machine integration intelligent robot, Universities of Shaanxi Province, Xi'an, 710048, China

5. Hong L, Wang X, Zhang G, Zhao M. USOD10K: A New Benchmark Dataset for Underwater Salient Object Detection. IEEE Trans Image Process 2025;34:1602-1615. [PMID: 37058379] [DOI: 10.1109/tip.2023.3266163]
Abstract
Underwater salient object detection (USOD) is an emerging research area that has great potential for various underwater visual tasks. However, USOD research is still in its early stage due to the lack of large-scale datasets within which salient objects are well-defined and pixel-wise annotated. To address this issue, this paper first introduces a new dataset named USOD10K. It contains 10,255 underwater images, covering 70 categories of salient objects in 12 different underwater scenes. Moreover, USOD10K provides salient object boundaries and depth maps for all images. USOD10K is the first large-scale dataset in the USOD community, making a significant leap in diversity, complexity, and scalability. Second, a simple but strong baseline termed TC-USOD is proposed for USOD10K. TC-USOD adopts a hybrid encoder-decoder architecture that leverages transformer and convolution as the basic computational building blocks of the encoder and decoder, respectively. Third, we comprehensively summarize 35 state-of-the-art SOD/USOD methods and benchmark them on the existing USOD dataset and on USOD10K. The results show that our TC-USOD achieves superior performance on all datasets tested. Finally, several other use cases of USOD10K are discussed, and future directions of USOD research are pointed out. This work will promote the development of USOD research and facilitate further research on underwater visual tasks and visually guided underwater robots. To pave the road for the USOD research field, the dataset, code, and benchmark results are publicly available: https://github.com/Underwater-Robotic-Lab/USOD10K.

6. Zhou W, Guo Q, Lei J, Yu L, Hwang JN. IRFR-Net: Interactive Recursive Feature-Reshaping Network for Detecting Salient Objects in RGB-D Images. IEEE Trans Neural Netw Learn Syst 2025;36:4132-4144. [PMID: 34415839] [DOI: 10.1109/tnnls.2021.3105484]
Abstract
Using attention mechanisms in saliency detection networks enables effective feature extraction, and using linear methods can promote proper feature fusion, as verified in numerous existing models. Current networks usually combine depth maps with red-green-blue (RGB) images for salient object detection (SOD). However, fully leveraging depth information complementary to RGB information by accurately highlighting salient objects deserves further study. We combine a gated attention mechanism and a linear fusion method to construct a dual-stream interactive recursive feature-reshaping network (IRFR-Net). The streams for RGB and depth data communicate through a backbone encoder to thoroughly extract complementary information. First, we design a context extraction module (CEM) to obtain low-level depth foreground information. Subsequently, the gated attention fusion module (GAFM) is applied to the RGB-D information to obtain advantageous structural and spatial fusion features. Then, adjacent depth information is globally integrated to obtain complementary context features. We also introduce a weighted atrous spatial pyramid pooling (WASPP) module to extract the multiscale local information of depth features. Finally, global and local features are fused in a bottom-up scheme to effectively highlight salient objects. Comprehensive experiments on eight representative datasets demonstrate that the proposed IRFR-Net outperforms 11 state-of-the-art (SOTA) RGB-D approaches on various evaluation metrics.
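
A minimal sketch of gated fusion between RGB and depth feature maps is given below; it illustrates the general gating idea only and is not the paper's GAFM.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse RGB and depth features with a learned per-pixel gate:
    out = g * f_rgb + (1 - g) * f_depth. (Illustrative, not the paper's GAFM.)"""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, f_rgb, f_depth):
        g = self.gate(torch.cat([f_rgb, f_depth], dim=1))
        return g * f_rgb + (1.0 - g) * f_depth

fuse = GatedFusion(64)
out = fuse(torch.rand(1, 64, 48, 48), torch.rand(1, 64, 48, 48))
print(out.shape)   # torch.Size([1, 64, 48, 48])
```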

7. Chen G, Wang Q, Dong B, Ma R, Liu N, Fu H, Xia Y. EM-Trans: Edge-Aware Multimodal Transformer for RGB-D Salient Object Detection. IEEE Trans Neural Netw Learn Syst 2025;36:3175-3188. [PMID: 38356213] [DOI: 10.1109/tnnls.2024.3358858]
Abstract
RGB-D salient object detection (SOD) has gained tremendous attention in recent years. In particular, transformers have been employed and shown great potential. However, existing transformer models usually overlook the vital edge information, which is a major issue restricting the further improvement of SOD accuracy. To this end, we propose a novel edge-aware RGB-D SOD transformer, called EM-Trans, which explicitly models the edge information in a dual-band decomposition framework. Specifically, we employ two parallel decoder networks to learn the high-frequency edge and low-frequency body features from the low- and high-level features extracted from a two-stream multimodal backbone network, respectively. Next, we propose a cross-attention complementarity exploration module to enrich the edge/body features by exploiting the multimodal complementarity information. The refined features are then fed into our proposed color-hint guided fusion module for enhancing the depth feature and fusing the multimodal features. Finally, the resulting features are fused using our deeply supervised progressive fusion module, which progressively integrates edge and body features for predicting saliency maps. Our model explicitly considers the edge information for accurate RGB-D SOD, overcoming the limitations of existing methods and effectively improving the performance. Extensive experiments on benchmark datasets demonstrate that EM-Trans is an effective RGB-D SOD framework that outperforms the current state-of-the-art models, both quantitatively and qualitatively. A further extension to RGB-T SOD demonstrates the promising potential of our model in various kinds of multimodal SOD tasks.

8. Wei W, Wei P, Liao Z, Qin J, Cheng X, Liu M, Zheng N. Semantic Consistency Reasoning for 3-D Object Detection in Point Clouds. IEEE Trans Neural Netw Learn Syst 2025;36:3356-3369. [PMID: 38113156] [DOI: 10.1109/tnnls.2023.3341097]
Abstract
Point cloud-based 3-D object detection is a significant and critical issue in numerous applications. While most existing methods attempt to capitalize on the geometric characteristics of point clouds, they neglect the internal semantic properties of points and the consistency between semantic and geometric cues. We introduce a semantic consistency (SC) mechanism for 3-D object detection in this article, by reasoning about the semantic relations between 3-D object boxes and their internal points. This mechanism is based on a natural principle: the semantic category of a 3-D bounding box should be consistent with the categories of all points within the box. Driven by the SC mechanism, we propose a novel SC network (SCNet) to detect 3-D objects from point clouds. Specifically, the SCNet is composed of a feature extraction module, a detection decision module, and a semantic segmentation module. In inference, the feature extraction and detection decision modules are used to detect 3-D objects. In training, the semantic segmentation module is jointly trained with the other two modules to produce more robust and applicable model parameters. The performance is greatly boosted by reasoning about the relations between the output 3-D object boxes and segmented points. The proposed SC mechanism is model-agnostic and can be integrated into other base 3-D object detection models. We test the proposed model on three challenging indoor and outdoor benchmark datasets: ScanNetV2, SUN RGB-D, and KITTI. Furthermore, to validate the universality of the SC mechanism, we implement it in three different 3-D object detectors. The experiments show that the performance is impressively improved, and the extensive ablation studies also demonstrate the effectiveness of the proposed model.
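
The consistency principle can be illustrated with a small, hypothetical loss that compares each box's predicted class distribution with the average class distribution of the points assigned to it. The assignment of points to boxes is assumed to be given, and this is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def semantic_consistency_loss(box_logits, point_logits, point_to_box):
    """Encourage each box's class distribution to agree with the average class
    distribution of the points assigned to it (KL divergence per box).
    box_logits:   B x C class logits for B boxes
    point_logits: N x C class logits for N points
    point_to_box: N     index of the box each point falls inside"""
    num_boxes, num_classes = box_logits.shape
    device = box_logits.device
    point_probs = F.softmax(point_logits, dim=1)
    # Average the point class distributions per box.
    sums = torch.zeros(num_boxes, num_classes, device=device)
    sums.index_add_(0, point_to_box, point_probs)
    counts = torch.zeros(num_boxes, device=device)
    counts.index_add_(0, point_to_box, torch.ones(point_logits.size(0), device=device))
    mean_point_probs = sums / counts.clamp(min=1).unsqueeze(1)
    box_log_probs = F.log_softmax(box_logits, dim=1)
    return F.kl_div(box_log_probs, mean_point_probs, reduction="batchmean")

loss = semantic_consistency_loss(torch.randn(4, 10), torch.randn(100, 10),
                                 torch.randint(0, 4, (100,)))
print(loss.item())
```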

9. Tang H, Li Z, Zhang D, He S, Tang J. Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection. IEEE Trans Pattern Anal Mach Intell 2024;PP:1958-1974. [PMID: 40030445] [DOI: 10.1109/tpami.2024.3511621]
Abstract
RGB-Thermal Salient Object Detection (RGB-T SOD) aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images. A key challenge lies in bridging the inherent disparities between RGB and Thermal modalities for effective saliency map prediction. Traditional encoder-decoder architectures, while designed for cross-modality feature interactions, may not have adequately considered the robustness against noise originating from defective modalities, thereby leading to suboptimal performance in complex scenarios. Inspired by hierarchical human visual systems, we propose the ConTriNet, a robust Confluent Triple-Flow Network employing a "Divide-and-Conquer" strategy. This framework utilizes a unified encoder with specialized decoders, each addressing different subtasks of exploring modality-specific and modality-complementary information for RGB-T SOD, thereby enhancing the final saliency map prediction. Specifically, ConTriNet comprises three flows: two modality-specific flows explore cues from RGB and Thermal modalities, and a third modality-complementary flow integrates cues from both modalities. ConTriNet presents several notable advantages. It incorporates a Modality-induced Feature Modulator (MFM) in the modality-shared union encoder to minimize inter-modality discrepancies and mitigate the impact of defective samples. Additionally, a foundational Residual Atrous Spatial Pyramid Module (RASPM) in the separated flows enlarges the receptive field, allowing for the capture of multi-scale contextual information. Furthermore, a Modality-aware Dynamic Aggregation Module (MDAM) in the modality-complementary flow dynamically aggregates saliency-related cues from both modality-specific flows. Leveraging the proposed parallel triple-flow framework, we further refine saliency maps derived from different flows through a flow-cooperative fusion strategy, yielding a high-quality, full-resolution saliency map for the final prediction. To evaluate the robustness and stability of our approach, we collect a comprehensive RGB-T SOD benchmark, VT-IMAG, covering various real-world challenging scenarios. Extensive experiments on public benchmarks and our VT-IMAG dataset demonstrate that ConTriNet consistently outperforms state-of-the-art competitors in both common and challenging scenarios, even when dealing with incomplete modality data. The code and VT-IMAG will be available at: https://cser-tang-hao.github.io/contrinet.html.

10. Song Z, Kang X, Wei X, Li S. Pixel-Centric Context Perception Network for Camouflaged Object Detection. IEEE Trans Neural Netw Learn Syst 2024;35:18576-18589. [PMID: 37819817] [DOI: 10.1109/tnnls.2023.3319323]
Abstract
Camouflaged object detection (COD) aims to identify object pixels visually embedded in the background environment. Existing deep learning methods fail to utilize the context information around different pixels adequately and efficiently. In order to solve this problem, a novel pixel-centric context perception network (PCPNet) is proposed, the core of which is to customize the personalized context of each pixel based on the automatic estimation of its surroundings. Specifically, PCPNet first employs an elegant encoder equipped with the designed vital component generation (VCG) module to obtain a set of compact features rich in low-level spatial and high-level semantic information across multiple subspaces. Then, we present a parameter-free pixel importance estimation (PIE) function based on multiwindow information fusion. Object pixels with complex backgrounds will be assigned with higher PIE values. Subsequently, PIE is utilized to regularize the optimization loss. In this way, the network can pay more attention to those pixels with higher PIE values in the decoding stage. Finally, a local continuity refinement module (LCRM) is used to refine the detection results. Extensive experiments on four COD benchmarks, five salient object detection (SOD) benchmarks, and five polyp segmentation benchmarks demonstrate the superiority of PCPNet with respect to other state-of-the-art methods.
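
A sketch of how a pixel-importance map can regularize the optimization loss is shown below: a per-pixel BCE is weighted so that pixels flagged as important contribute more. The importance map here is a stand-in; the paper's parameter-free PIE function is its own design.

```python
import torch
import torch.nn.functional as F

def weighted_bce(pred_logits, target, importance):
    """Per-pixel BCE weighted by an importance map (higher weight = more attention).
    pred_logits, target, importance: B x 1 x H x W tensors; importance >= 0."""
    bce = F.binary_cross_entropy_with_logits(pred_logits, target, reduction="none")
    weight = 1.0 + importance                # keep a base weight of 1 for every pixel
    return (weight * bce).sum() / weight.sum()

pred = torch.randn(2, 1, 64, 64)
gt = (torch.rand(2, 1, 64, 64) > 0.5).float()
imp = torch.rand(2, 1, 64, 64)               # stand-in for a PIE-style importance map
print(weighted_bce(pred, gt, imp).item())
```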

11. Tang Y, Li M. DMGNet: Depth mask guiding network for RGB-D salient object detection. Neural Netw 2024;180:106751. [PMID: 39332209] [DOI: 10.1016/j.neunet.2024.106751]
Abstract
Though depth images can provide supplementary spatial structural cues for the salient object detection (SOD) task, inappropriate utilization of depth features may introduce noisy or misleading features, which can severely degrade SOD performance. To address this issue, we propose a depth mask guiding network (DMGNet) for RGB-D SOD. In this network, a depth mask guidance module (DMGM) is designed to pre-segment the salient objects from depth images and then create masks from the pre-segmented objects to guide the RGB subnetwork to extract more discriminative features. Furthermore, a feature fusion pyramid module (FFPM) is employed to acquire more informative fused features using multi-branch convolutional channels with varying receptive fields, further enhancing the fusion of cross-modal features. Extensive experiments on nine benchmark datasets demonstrate the effectiveness of the proposed network.
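
A minimal, hypothetical sketch of depth-mask guidance is given below: a rough mask is derived from the depth map (here by a simple threshold) and used to gate RGB features. The paper's DMGM learns this pre-segmentation rather than thresholding, so this only illustrates the guiding idea.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthMaskGuidance(nn.Module):
    """Gate RGB features with a rough mask derived from depth.
    Here the mask is a simple normalized-depth threshold; a learned
    pre-segmentation (as in DMGM) would replace this step."""
    def __init__(self, threshold=0.5):
        super().__init__()
        self.threshold = threshold

    def forward(self, rgb_feat, depth):
        # Normalize depth to [0, 1] per image, then soft-threshold into a mask.
        d_min = depth.amin(dim=(2, 3), keepdim=True)
        d_max = depth.amax(dim=(2, 3), keepdim=True)
        d = (depth - d_min) / (d_max - d_min + 1e-6)
        mask = torch.sigmoid((self.threshold - d) * 10.0)   # assumes nearer = smaller depth
        mask = F.interpolate(mask, size=rgb_feat.shape[-2:], mode="bilinear",
                             align_corners=False)
        return rgb_feat * (1.0 + mask)        # residual gating keeps the original features

guide = DepthMaskGuidance()
out = guide(torch.rand(1, 64, 48, 48), torch.rand(1, 1, 384, 384))
print(out.shape)   # torch.Size([1, 64, 48, 48])
```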

Affiliation(s)
- Yinggan Tang
- School of Electrical Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China; Key Laboratory of Intelligent Rehabilitation and Neromodulation of Hebei Province, Yanshan University, Qinhuangdao, Hebei 066004, China; Key Laboratory of Industrial Computer Control Engineering of Hebei Province, Yanshan University, Qinhuangdao, Hebei 066004, China
- Mengyao Li
- School of Electrical Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China

12. Ma S, Zhu X, Xu L, Zhou L, Chen D. LRNet: lightweight attention-oriented residual fusion network for light field salient object detection. Sci Rep 2024;14:26030. [PMID: 39472603] [PMCID: PMC11522285] [DOI: 10.1038/s41598-024-76874-0]
Abstract
Light field imaging captures abundant scene structure information, which can improve the accuracy of salient object detection in challenging scenes, and has therefore received widespread attention. However, how to exploit this abundant information for salient object detection remains a major challenge. In this paper, a lightweight attention and residual convLSTM network (LRNet) is proposed to address this issue; it is mainly composed of a lightweight attention-based feature enhancement module (LFM) and a residual convLSTM-based feature integration module (RFM). The LFM provides an attention map for each focal slice through an attention mechanism to focus on features related to the object, thereby enhancing saliency features. The RFM leverages the residual mechanism and convLSTM to fully utilize the spatial structural information of the focal slices, thereby achieving high-precision feature fusion. Experimental results on three publicly available light field datasets show that the proposed method surpasses 17 existing state-of-the-art methods and achieves the highest score on five quantitative indicators.

Affiliation(s)
- Shuai Ma
- ChengDu Aircraft Industrial (Group) Co., Ltd., Qingyang, Chengdu, 610092, Sichuan, China
- Xusheng Zhu
- ChengDu Aircraft Industrial (Group) Co., Ltd., Qingyang, Chengdu, 610092, Sichuan, China
- Long Xu
- ChengDu Aircraft Industrial (Group) Co., Ltd., Qingyang, Chengdu, 610092, Sichuan, China
- Li Zhou
- ChengDu Aircraft Industrial (Group) Co., Ltd., Qingyang, Chengdu, 610092, Sichuan, China
- Daixin Chen
- ChengDu Aircraft Industrial (Group) Co., Ltd., Qingyang, Chengdu, 610092, Sichuan, China

13. Yue H, Guo J, Yin X, Zhang Y, Zheng S. Salient object detection in low-light RGB-T scene via spatial-frequency cues mining. Neural Netw 2024;178:106406. [PMID: 38838393] [DOI: 10.1016/j.neunet.2024.106406]
Abstract
Low-light conditions pose significant challenges to vision tasks, such as salient object detection (SOD), due to insufficient photons. Light-insensitive RGB-T SOD models mitigate the above problems to some extent, but they are limited in performance as they only focus on spatial feature fusion while ignoring the frequency discrepancy. To this end, we propose an RGB-T SOD model by mining spatial-frequency cues, called SFMNet, for low-light scenes. Our SFMNet consists of spatial-frequency feature exploration (SFFE) modules and spatial-frequency feature interaction (SFFI) modules. To be specific, the SFFE module aims to separate spatial-frequency features and adaptively extract high and low-frequency features. Moreover, the SFFI module integrates cross-modality and cross-domain information to capture effective feature representations. By deploying both modules in a top-down pathway, our method generates high-quality saliency predictions. Furthermore, we construct the first low-light RGB-T SOD dataset as a benchmark for evaluating performance. Extensive experiments demonstrate that our SFMNet can achieve higher accuracy than the existing models for low-light scenes.
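
A minimal sketch of separating a feature map (or image) into low- and high-frequency components with a Fourier-domain low-pass mask is shown below. The SFFE module's adaptive high/low-frequency extraction goes beyond this fixed-cutoff illustration.

```python
import torch

def frequency_split(x, cutoff=0.1):
    """Split x (B x C x H x W) into low- and high-frequency parts using a
    centered circular low-pass mask in the 2-D Fourier domain."""
    _, _, H, W = x.shape
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    dist = torch.sqrt((yy - H / 2) ** 2 + (xx - W / 2) ** 2)
    mask = (dist <= cutoff * min(H, W)).float().to(x.device)[None, None]  # 1 x 1 x H x W
    low = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1))).real
    high = x - low
    return low, high

low, high = frequency_split(torch.rand(1, 3, 128, 128))
print(low.shape, high.shape)
```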

Affiliation(s)
- Huihui Yue
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
- Jichang Guo
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
- Xiangjun Yin
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
- Yi Zhang
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
- Sida Zheng
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China

14. Zhong Z, Liu X, Jiang J, Zhao D, Ji X. Deep Attentional Guided Image Filtering. IEEE Trans Neural Netw Learn Syst 2024;35:12236-12250. [PMID: 37015130] [DOI: 10.1109/tnnls.2023.3253472]
Abstract
Guided filter is a fundamental tool in computer vision and computer graphics, which aims to transfer structure information from the guide image to the target image. Most existing methods construct filter kernels from the guidance itself without considering the mutual dependency between the guidance and the target. However, since there typically exist significantly different edges in two images, simply transferring all structural information from the guide to the target would result in various artifacts. To cope with this problem, we propose an effective framework named deep attentional guided image filtering, the filtering process of which can fully integrate the complementary information contained in both images. Specifically, we propose an attentional kernel learning module to generate dual sets of filter kernels from the guidance and the target and then adaptively combine them by modeling the pixelwise dependency between the two images. Meanwhile, we propose a multiscale guided image filtering module to progressively generate the filtering result with the constructed kernels in a coarse-to-fine manner. Correspondingly, a multiscale fusion strategy is introduced to reuse the intermediate results in the coarse-to-fine process. Extensive experiments show that the proposed framework compares favorably with the state-of-the-art methods in a wide range of guided image filtering applications, such as guided super-resolution (SR), cross-modality restoration, and semantic segmentation. Moreover, our scheme achieved the first place in the real depth map SR challenge held in ACM ICMR 2021. The codes can be found at https://github.com/zhwzhong/DAGF.
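
For background, the classical guided filter that such methods build on can be sketched as below: per-window linear models are computed from the guidance with box filtering. This is the standard formulation, not the paper's attentional kernel learning or multiscale scheme.

```python
import torch
import torch.nn.functional as F

def box_filter(x, r):
    """Local mean over a (2r+1) x (2r+1) window via average pooling."""
    return F.avg_pool2d(x, kernel_size=2 * r + 1, stride=1, padding=r,
                        count_include_pad=False)

def guided_filter(guide, target, r=4, eps=1e-3):
    """Classical guided filter: transfer structure from `guide` to `target`
    through per-window linear models. guide, target: B x 1 x H x W."""
    mean_g = box_filter(guide, r)
    mean_t = box_filter(target, r)
    cov_gt = box_filter(guide * target, r) - mean_g * mean_t
    var_g = box_filter(guide * guide, r) - mean_g * mean_g
    a = cov_gt / (var_g + eps)        # slope of the local linear model
    b = mean_t - a * mean_g           # intercept
    return box_filter(a, r) * guide + box_filter(b, r)

out = guided_filter(torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128))
print(out.shape)   # torch.Size([1, 1, 128, 128])
```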

15. Luo H, Lin G, Shen F, Huang X, Yao Y, Shen H. Robust-EQA: Robust Learning for Embodied Question Answering With Noisy Labels. IEEE Trans Neural Netw Learn Syst 2024;35:12083-12094. [PMID: 37028297] [DOI: 10.1109/tnnls.2023.3251984]
Abstract
Embodied question answering (EQA) is a recently emerged research field in which an agent is asked to answer the user's questions by exploring the environment and collecting visual information. Many researchers have turned their attention to EQA because of its broad potential applications, such as in-home robots, self-driving vehicles, and personal assistants. High-level visual tasks such as EQA are susceptible to noisy inputs because they involve complex reasoning processes. Before EQA methods can be applied in practice, they need to be equipped with good robustness against label noise. To tackle this problem, we propose a novel label-noise-robust learning algorithm for the EQA task. First, a joint-training co-regularization noise-robust learning method is proposed for noise filtering in the visual question answering (VQA) module, which trains two parallel network branches with one loss function. Then, a two-stage hierarchical robust learning algorithm is proposed to filter out noisy navigation labels at both the trajectory level and the action level. Finally, by taking the purified labels as inputs, a joint robust learning mechanism is given to coordinate the work of the whole EQA system. Empirical results demonstrate that, under extremely noisy environments (45% noisy labels) and low-level noisy environments (20% noisy labels), the robustness of deep learning models trained by our algorithm is superior to that of existing EQA models in noisy environments.
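
A generic sketch of the joint-training co-regularization idea is given below: two parallel branches share a single loss that combines their individual terms with an agreement penalty. The exact loss used in the paper may differ; this is only one common way to realize the principle.

```python
import torch
import torch.nn.functional as F

def co_regularized_loss(logits_a, logits_b, labels, alpha=1.0):
    """One loss for two parallel branches: individual cross-entropy terms plus a
    symmetric KL agreement term that penalizes predictions the branches disagree
    on (disagreement is often caused by noisy labels). logits_*: B x C, labels: B."""
    ce = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)
    log_pa = F.log_softmax(logits_a, dim=1)
    log_pb = F.log_softmax(logits_b, dim=1)
    agree = F.kl_div(log_pa, log_pb.exp(), reduction="batchmean") + \
            F.kl_div(log_pb, log_pa.exp(), reduction="batchmean")
    return ce + alpha * agree

loss = co_regularized_loss(torch.randn(16, 5), torch.randn(16, 5),
                           torch.randint(0, 5, (16,)))
print(loss.item())
```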

16. Pei J, Jiang T, Tang H, Liu N, Jin Y, Fan DP, Heng PA. CalibNet: Dual-Branch Cross-Modal Calibration for RGB-D Salient Instance Segmentation. IEEE Trans Image Process 2024;33:4348-4362. [PMID: 39074016] [DOI: 10.1109/tip.2024.3432328]
Abstract
In this study, we propose a novel approach for RGB-D salient instance segmentation using a dual-branch cross-modal feature calibration architecture called CalibNet. Our method simultaneously calibrates depth and RGB features in the kernel and mask branches to generate instance-aware kernels and mask features. CalibNet consists of three simple modules: a dynamic interactive kernel (DIK) module and a weight-sharing fusion (WSF) module, which work together to generate effective instance-aware kernels and integrate cross-modal features, and a depth similarity assessment (DSA) module placed before DIK and WSF to improve the quality of the depth features. In addition, we contribute a new DSIS dataset, which contains 1,940 images with elaborate instance-level annotations. Extensive experiments on three challenging benchmarks show that CalibNet yields a promising result, i.e., 58.0% AP with 320×480 input size on the COME15K-E test set, which significantly surpasses the alternative frameworks. Our code and dataset will be publicly available at: https://github.com/PJLallen/CalibNet.

17. Zhang Z, Chen S, Wang Z, Yang J. PlaneSeg: Building a Plug-In for Boosting Planar Region Segmentation. IEEE Trans Neural Netw Learn Syst 2024;35:11486-11500. [PMID: 37027268] [DOI: 10.1109/tnnls.2023.3262544]
Abstract
Existing methods for planar region segmentation suffer from vague boundaries and fail to detect small-sized regions. To address these issues, this study presents an end-to-end framework, named PlaneSeg, which can be easily integrated into various plane segmentation models. Specifically, PlaneSeg contains three modules, namely the edge feature extraction module, the multiscale module, and the resolution-adaptation module. First, the edge feature extraction module produces edge-aware feature maps for finer segmentation boundaries. The learned edge information acts as a constraint to mitigate inaccurate boundaries. Second, the multiscale module combines feature maps of different layers to harvest spatial and semantic information from planar objects. This diversity of object information helps recognize small-sized objects and produce more accurate segmentation results. Third, the resolution-adaptation module fuses the feature maps produced by the two aforementioned modules. For this module, a pairwise feature fusion is adopted to resample the dropped pixels and extract more detailed features. Extensive experiments demonstrate that PlaneSeg outperforms other state-of-the-art approaches on three downstream tasks, including plane segmentation, 3-D plane reconstruction, and depth prediction. Code is available at https://github.com/nku-zhichengzhang/PlaneSeg.

18. Peng Y, Zhai Z, Feng M. SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection. Sensors (Basel) 2024;24:1117. [PMID: 38400274] [PMCID: PMC10892948] [DOI: 10.3390/s24041117]
Abstract
Salient Object Detection (SOD) in RGB-D images plays a crucial role in the field of computer vision, with its central aim being to identify and segment the most visually striking objects within a scene. However, optimizing the fusion of multi-modal and multi-scale features to enhance detection performance remains a challenge. To address this issue, we propose a network model based on semantic localization and multi-scale fusion (SLMSF-Net), specifically designed for RGB-D SOD. Firstly, we designed a Deep Attention Module (DAM), which extracts valuable depth feature information from both channel and spatial perspectives and efficiently merges it with RGB features. Subsequently, a Semantic Localization Module (SLM) is introduced to enhance the top-level modality fusion features, enabling the precise localization of salient objects. Finally, a Multi-Scale Fusion Module (MSF) is employed to perform inverse decoding on the modality fusion features, thus restoring the detailed information of the objects and generating high-precision saliency maps. Our approach has been validated across six RGB-D salient object detection datasets. The experimental results indicate an improvement of 0.20~1.80%, 0.09~1.46%, 0.19~1.05%, and 0.0002~0.0062, respectively in maxF, maxE, S, and MAE metrics, compared to the best competing methods (AFNet, DCMF, and C2DFNet).
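
A minimal channel-then-spatial attention sketch (CBAM-style) applied to depth features before merging with the RGB stream is shown below. It only illustrates the general idea behind attention-based depth refinement from channel and spatial perspectives and is not the paper's DAM.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Refine depth features with channel attention followed by spatial attention,
    then merge them into the RGB features by addition. (CBAM-style sketch only.)"""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(1, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, rgb_feat, depth_feat):
        d = depth_feat * self.channel(depth_feat)          # channel attention
        d = d * self.spatial(d.mean(dim=1, keepdim=True))  # spatial attention
        return rgb_feat + d                                # merge into the RGB stream

att = ChannelSpatialAttention(64)
out = att(torch.rand(1, 64, 48, 48), torch.rand(1, 64, 48, 48))
print(out.shape)   # torch.Size([1, 64, 48, 48])
```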

Affiliation(s)
- Yanbin Peng
- School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China

19. Lv C, Wan B, Zhou X, Sun Y, Zhang J, Yan C. Lightweight Cross-Modal Information Mutual Reinforcement Network for RGB-T Salient Object Detection. Entropy (Basel) 2024;26:130. [PMID: 38392385] [PMCID: PMC10888287] [DOI: 10.3390/e26020130]
Abstract
RGB-T salient object detection (SOD) has made significant progress in recent years. However, most existing works are based on heavy models, which are not applicable to mobile devices. Additionally, there is still room for improvement in the design of cross-modal feature fusion and cross-level feature fusion. To address these issues, we propose a lightweight cross-modal information mutual reinforcement network for RGB-T SOD. Our network consists of a lightweight encoder, the cross-modal information mutual reinforcement (CMIMR) module, and the semantic-information-guided fusion (SIGF) module. To reduce the computational cost and the number of parameters, we employ lightweight modules in both the encoder and the decoder. Furthermore, to fuse the complementary information between the two modalities, we design the CMIMR module to enhance the two-modal features. This module effectively refines the two-modal features by absorbing previous-level semantic information and inter-modal complementary information. In addition, to fuse cross-level features and detect multiscale salient objects, we design the SIGF module, which effectively suppresses the background noise in low-level features and extracts multiscale information. We conduct extensive experiments on three RGB-T datasets, and our method achieves competitive performance compared to the other 15 state-of-the-art methods.

Affiliation(s)
- Chengtao Lv
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Bin Wan
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Xiaofei Zhou
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Yaoqi Sun
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Lishui Institute, Hangzhou Dianzi University, Lishui 323000, China
- Jiyong Zhang
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Chenggang Yan
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China

20. Wang S, Jiang F, Xu B. Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection. Sensors (Basel) 2023;23:8802. [PMID: 37960501] [PMCID: PMC10650861] [DOI: 10.3390/s23218802]
Abstract
Salient object detection (SOD), which is used to identify the most distinctive object in a given scene, plays an important role in computer vision tasks. Most existing RGB-D SOD methods employ a CNN-based network as the backbone to extract features from RGB and depth images; however, the inherent locality of a CNN-based network limits their performance. To tackle this issue, we propose a novel Swin Transformer-based edge guidance network (SwinEGNet) for RGB-D SOD, in which the Swin Transformer is employed as a powerful feature extractor to capture the global context. An edge-guided cross-modal interaction module is proposed to effectively enhance and fuse features. In particular, we employed the Swin Transformer as the backbone to extract features from RGB images and depth maps. Then, we introduced the edge extraction module (EEM) to extract edge features and the depth enhancement module (DEM) to enhance depth features. Additionally, a cross-modal interaction module (CIM) was used to integrate cross-modal features from global and local contexts. Finally, we employed a cascaded decoder to refine the prediction map in a coarse-to-fine manner. Extensive experiments demonstrated that our SwinEGNet achieved the best performance on the LFSD, NLPR, DES, and NJU2K datasets and comparable performance on the STEREO dataset compared to 14 state-of-the-art methods. Our model achieved better performance than SwinNet while using only 88.4% of its parameters and 77.2% of its FLOPs. Our code will be publicly available.

Affiliation(s)
- Boqian Xu
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; (S.W.); (F.J.)

21. Xu K, Guo J. RGB-D salient object detection via convolutional capsule network based on feature extraction and integration. Sci Rep 2023;13:17652. [PMID: 37848501] [PMCID: PMC10582015] [DOI: 10.1038/s41598-023-44698-z]
Abstract
Fully convolutional neural networks have shown advantages in salient object detection using RGB or RGB-D images. However, there is an object-part dilemma, since most fully convolutional networks inevitably lead to an incomplete segmentation of the salient object. Although the capsule network is capable of recognizing a complete object, it is highly computationally demanding and time-consuming. In this paper, we propose a novel convolutional capsule network based on feature extraction and integration for dealing with the object-part relationship at a lower computational cost. First, RGB features are extracted and integrated using the VGG backbone and a feature extraction module. Then, these features, integrated with depth images through a feature depth module, are progressively upsampled to produce a feature map. Next, the feature map is fed into the feature-integrated convolutional capsule network to explore the object-part relationship. The proposed capsule network extracts object-part information using convolutional capsules with locally connected routing and predicts the final saliency map with deconvolutional capsules. Experimental results on four RGB-D benchmark datasets show that our proposed method outperforms 23 state-of-the-art algorithms.

Affiliation(s)
- Kun Xu
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300000, People's Republic of China
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250014, People's Republic of China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Jichang Guo
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300000, People's Republic of China

22. Liu Z, Hayat M, Yang H, Peng D, Lei Y. Deep Hypersphere Feature Regularization for Weakly Supervised RGB-D Salient Object Detection. IEEE Trans Image Process 2023;32:5423-5437. [PMID: 37773910] [DOI: 10.1109/tip.2023.3318953]
Abstract
We propose a weakly supervised approach for salient object detection from multi-modal RGB-D data. Our approach only relies on labels from scribbles, which are much easier to annotate, compared with dense labels used in conventional fully supervised setting. In contrast to existing methods that employ supervision signals on the output space, our design regularizes the intermediate latent space to enhance discrimination between salient and non-salient objects. We further introduce a contour detection branch to implicitly constrain the semantic boundaries and achieve precise edges of detected salient objects. To enhance the long-range dependencies among local features, we introduce a Cross-Padding Attention Block (CPAB). Extensive experiments on seven benchmark datasets demonstrate that our method not only outperforms existing weakly supervised methods, but is also on par with several fully-supervised state-of-the-art models. Code is available at https://github.com/leolyj/DHFR-SOD.
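
As a simple illustration of scribble supervision, a partial binary cross-entropy that only evaluates annotated pixels is sketched below. The paper's latent-space regularization, contour branch, and CPAB go beyond this; the label encoding (1/0/ignore) is an assumption for the example.

```python
import torch
import torch.nn.functional as F

def partial_bce(pred_logits, scribble, ignore_value=255):
    """BCE evaluated only on scribble-annotated pixels.
    scribble: B x 1 x H x W with 1 = salient stroke, 0 = background stroke,
    ignore_value = unlabeled pixel."""
    labeled = scribble != ignore_value
    if labeled.sum() == 0:
        return pred_logits.sum() * 0.0       # no supervision in this batch
    return F.binary_cross_entropy_with_logits(
        pred_logits[labeled], scribble[labeled].float())

scr = torch.full((2, 1, 64, 64), 255)
scr[:, :, 30:34, 30:34] = 1                  # a few foreground scribble pixels
scr[:, :, 0:4, 0:4] = 0                      # a few background scribble pixels
print(partial_bce(torch.randn(2, 1, 64, 64), scr).item())
```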

23. Zhang Y, Chen F, Peng Z, Zou W, Zhang C. Exploring Focus and Depth-Induced Saliency Detection for Light Field. Entropy (Basel) 2023;25:1336. [PMID: 37761635] [PMCID: PMC10530224] [DOI: 10.3390/e25091336]
Abstract
An abundance of features in the light field has been demonstrated to be useful for saliency detection in complex scenes. However, bottom-up saliency detection models are limited in their ability to explore light field features. In this paper, we propose a light field saliency detection method that focuses on depth-induced saliency, which can more deeply explore the interactions between different cues. First, we localize a rough saliency region based on the compactness of color and depth. Then, the relationships among depth, focus, and salient objects are carefully investigated, and the focus cue of the focal stack is used to highlight the foreground objects. Meanwhile, the depth cue is utilized to refine the coarse salient objects. Furthermore, considering the consistency of color smoothing and depth space, an optimization model referred to as color and depth-induced cellular automata is improved to increase the accuracy of saliency maps. Finally, to avoid interference of redundant information, the mean absolute error is chosen as the indicator of the filter to obtain the best results. The experimental results on three public light field datasets show that the proposed method performs favorably against the state-of-the-art conventional light field saliency detection approaches and even light field saliency detection approaches based on deep learning.

Affiliation(s)
- Yani Zhang
- School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China; (Y.Z.); (Z.P.); (C.Z.)
- Fen Chen
- School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China; (Y.Z.); (Z.P.); (C.Z.)
- Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China
- Zongju Peng
- School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China; (Y.Z.); (Z.P.); (C.Z.)
- Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China
- Wenhui Zou
- Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China
- Changhe Zhang
- School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China; (Y.Z.); (Z.P.); (C.Z.)

24. Wang S, Jiang F, Xu B. Global Guided Cross-Modal Cross-Scale Network for RGB-D Salient Object Detection. Sensors (Basel) 2023;23:7221. [PMID: 37631757] [PMCID: PMC10459329] [DOI: 10.3390/s23167221]
Abstract
RGB-D saliency detection aims to accurately localize salient regions using the complementary information of a depth map. Global contexts carried by the deep layers are key to salient object detection, but they are diluted when transferred to shallower layers. Moreover, depth maps may contain misleading information due to imperfect depth sensors. To tackle these issues, in this paper we propose a new cross-modal cross-scale network for RGB-D salient object detection, in which global context information provides global guidance to boost performance in complex scenarios. First, we introduce a global guided cross-modal and cross-scale module named G2CMCSM to realize global guided cross-modal cross-scale fusion. Then, we employ feature refinement modules for progressive refinement in a coarse-to-fine manner. In addition, we adopt a hybrid loss function to supervise the training of G2CMCSNet over different scales. With all these modules working together, G2CMCSNet effectively enhances both salient object details and salient object localization. Extensive experiments on challenging benchmark datasets demonstrate that our G2CMCSNet outperforms existing state-of-the-art methods.

Affiliation(s)
- Boqian Xu
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; (S.W.); (F.J.)

25. Zhang X, Zhang Y, Liu H, Li S, Liu L. Dynamic Sweep Experiments on a Heterogeneous Phase Composite System Based on Branched-Preformed Particle Gel in High Water-Cut Reservoirs after Polymer Flooding. Gels 2023;9:364. [PMID: 37232956] [DOI: 10.3390/gels9050364]
Abstract
Heterogeneous phase composite (HPC) flooding technology that is based on branched-preformed particle gel (B-PPG) is an important technology for enhancing oil recovery in high water-cut reservoirs. In this paper, we conducted a series of visualization experiments under the condition of developed high-permeability channels after polymer flooding, with respect to well pattern densification and adjustment, and HPC flooding and its synergistic regulation. The experiments show that for polymer-flooded reservoirs, HPC flooding can significantly reduce the water cut and increase oil recovery, but that the injected HPC system mainly advances along the high-permeability channel with limited sweep expansion. Furthermore, well pattern densification and adjustment can divert the original mainstream direction, which has a positive effect on HPC flooding, and can effectively expand the sweeping range under the synergistic effect of residual polymers. Due to the synergistic effect of multiple chemical agents in the HPC system, after well pattern densification and adjustment, the production time for HPC flooding with the water cut lower than 95% was significantly prolonged. In addition, conversion schemes, in which the original production well is converted into the injection well, are better than non-conversion schemes in terms of expanding sweep efficiency and enhancing oil recovery. Therefore, for well groups with obvious high-water-consuming channels after polymer flooding, the implementation of HPC flooding can be combined with well pattern conversion and intensification in order to further improve oil displacement.

Affiliation(s)
- Xianmin Zhang
- Key Laboratory of Unconventional Oil & Gas Development, Ministry of Education, China University of Petroleum (East China), Qingdao 266580, China
- School of Petroleum Engineering, China University of Petroleum (East China), Qingdao 266580, China
- Yiming Zhang
- Key Laboratory of Unconventional Oil & Gas Development, Ministry of Education, China University of Petroleum (East China), Qingdao 266580, China
- School of Petroleum Engineering, China University of Petroleum (East China), Qingdao 266580, China
- Haicheng Liu
- Research Institute of Exploration and Development, Shengli Oilfield Company, SINOPEC, Dongying 257015, China
- Shanshan Li
- Key Laboratory of Unconventional Oil & Gas Development, Ministry of Education, China University of Petroleum (East China), Qingdao 266580, China
- School of Petroleum Engineering, China University of Petroleum (East China), Qingdao 266580, China
- Lijie Liu
- Research Institute of Exploration and Development, Shengli Oilfield Company, SINOPEC, Dongying 257015, China

26. Kong Y, Wang H, Kong L, Liu Y, Yao C, Yin B. Absolute and Relative Depth-Induced Network for RGB-D Salient Object Detection. Sensors (Basel) 2023;23:3611. [PMID: 37050670] [PMCID: PMC10098920] [DOI: 10.3390/s23073611]
Abstract
Detecting salient objects in complicated scenarios is a challenging problem. Besides semantic features from the RGB image, spatial information from the depth image also provides sufficient cues about the object. Therefore, it is crucial to rationally integrate RGB and depth features for the RGB-D salient object detection task. Most existing RGB-D saliency detectors modulate RGB semantic features with absolute depth values. However, they ignore the appearance contrast and structure knowledge indicated by the relative depth values between pixels. In this work, we propose a depth-induced network (DIN) for RGB-D salient object detection that takes full advantage of both absolute and relative depth information and further enforces the in-depth fusion of the RGB-D cross-modalities. Specifically, an absolute depth-induced module (ADIM) is proposed to hierarchically integrate absolute depth values and RGB features, allowing interaction between appearance and structural information in the encoding stage. A relative depth-induced module (RDIM) is designed to capture detailed saliency cues by exploring contrastive and structural information from relative depth values in the decoding stage. By combining the ADIM and RDIM, we can accurately locate salient objects with clear boundaries, even in complex scenes. The proposed DIN is a lightweight network, and its model size is much smaller than that of state-of-the-art algorithms. Extensive experiments on six challenging benchmarks show that our method outperforms most existing RGB-D salient object detection models.
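
A small sketch of deriving a relative-depth cue from absolute depth is given below: each pixel is compared with the mean depth of its local window, yielding a signed contrast map. The paper's RDIM is richer, so this is illustrative only.

```python
import torch
import torch.nn.functional as F

def relative_depth_contrast(depth, window=15):
    """Relative-depth cue: signed difference between each pixel's depth and the
    mean depth of its local window. Positive/negative values indicate pixels
    standing out in front of / behind their surroundings.
    depth: B x 1 x H x W."""
    pad = window // 2
    local_mean = F.avg_pool2d(depth, kernel_size=window, stride=1, padding=pad,
                              count_include_pad=False)
    return depth - local_mean

contrast = relative_depth_contrast(torch.rand(1, 1, 256, 256))
print(contrast.shape, contrast.abs().max().item())
```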
Collapse
Affiliation(s)
- Yuqiu Kong
- School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian 116024, China
| | - He Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Lingwei Kong
- School of Information and Communication Engineering, Dalian University of Technology, Dalian 116024, China
| | - Yang Liu
- School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian 116024, China
| | - Cuili Yao
- School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian 116024, China
| | - Baocai Yin
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| |
Collapse
|
27
|
Zhuge M, Fan DP, Liu N, Zhang D, Xu D, Shao L. Salient Object Detection via Integrity Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:3738-3752. [PMID: 35666793 DOI: 10.1109/tpami.2022.3179526] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Although current salient object detection (SOD) works have achieved significant progress, they are limited when it comes to the integrity of the predicted salient regions. We define the concept of integrity at both a micro and macro level. Specifically, at the micro level, the model should highlight all parts that belong to a certain salient object. Meanwhile, at the macro level, the model needs to discover all salient objects in a given image. To facilitate integrity learning for SOD, we design a novel Integrity Cognition Network (ICON), which explores three important components for learning strong integrity features. 1) Unlike existing models, which focus more on feature discriminability, we introduce a diverse feature aggregation (DFA) component to aggregate features with various receptive fields (i.e., kernel shape and context) and increase feature diversity. Such diversity is the foundation for mining the integral salient objects. 2) Based on the DFA features, we introduce an integrity channel enhancement (ICE) component with the goal of enhancing feature channels that highlight the integral salient objects, while suppressing the other distracting ones. 3) After extracting the enhanced features, the part-whole verification (PWV) method is employed to determine whether the part and whole object features have strong agreement. Such part-whole agreements can further improve the micro-level integrity for each salient object. To demonstrate the effectiveness of our ICON, comprehensive experiments are conducted on seven challenging benchmarks. Our ICON outperforms the baseline methods in terms of a wide range of metrics. Notably, our ICON achieves ∼ 10% relative improvement over the previous best model in terms of average false negative ratio (FNR), on six datasets. Codes and results are available at: https://github.com/mczhuge/ICON.
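As a rough illustration of aggregating features with diverse receptive fields, the PyTorch sketch below runs parallel square, strip-shaped, and dilated convolutions and fuses them. It is an assumption-laden simplification for intuition only, not the DFA component from ICON.
```python
import torch
import torch.nn as nn

class DiverseAggregation(nn.Module):
    """Aggregate parallel branches with different kernel shapes and dilations."""
    def __init__(self, channels):
        super().__init__()
        self.square = nn.Conv2d(channels, channels, 3, padding=1)
        self.strip_h = nn.Conv2d(channels, channels, (1, 5), padding=(0, 2))
        self.strip_v = nn.Conv2d(channels, channels, (5, 1), padding=(2, 0))
        self.dilated = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)
        self.fuse = nn.Conv2d(4 * channels, channels, 1)

    def forward(self, x):
        branches = [self.square(x), self.strip_h(x), self.strip_v(x), self.dilated(x)]
        return self.fuse(torch.cat(branches, dim=1)) + x  # residual keeps original cues

x = torch.randn(1, 32, 44, 44)
print(DiverseAggregation(32)(x).shape)  # torch.Size([1, 32, 44, 44])
```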
Collapse
|
28
|
Zhou W, Zhu Y, Lei J, Yang R, Yu L. LSNet: Lightweight Spatial Boosting Network for Detecting Salient Objects in RGB-Thermal Images. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:1329-1340. [PMID: 37022901 DOI: 10.1109/tip.2023.3242775] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Most recent methods for RGB (red-green-blue)-thermal salient object detection (SOD) involve a large number of floating-point operations and parameters, resulting in slow inference, especially on common processors, and impeding their deployment on mobile devices for practical applications. To address these problems, we propose a lightweight spatial boosting network (LSNet) for efficient RGB-thermal SOD with a lightweight MobileNetV2 backbone replacing a conventional backbone (e.g., VGG, ResNet). To improve feature extraction with a lightweight backbone, we propose a boundary boosting algorithm that optimizes the predicted saliency maps and reduces information collapse in low-dimensional features. The algorithm generates boundary maps from predicted saliency maps without incurring additional calculations or complexity. As multimodality processing is essential for high-performance SOD, we adopt attentive feature distillation and selection and propose semantic and geometric transfer learning to enhance the backbone without increasing complexity during testing. Experimental results demonstrate that the proposed LSNet achieves state-of-the-art performance compared with 14 RGB-thermal SOD methods on three datasets while requiring fewer floating-point operations (1.025G) and parameters (5.39M), a smaller model size (22.1 MB), and achieving faster inference (9.95 fps for PyTorch, batch size of 1, and Intel i5-7500 processor; 93.53 fps for PyTorch, batch size of 1, and NVIDIA TITAN V graphics processor; 936.68 fps for PyTorch, batch size of 20, and graphics processor; 538.01 fps for TensorRT and batch size of 1; and 903.01 fps for TensorRT/FP16 and batch size of 1). The code and results can be found at https://github.com/zyrant/LSNet.
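The "boundary map from a predicted saliency map, at no extra parameter cost" idea can be approximated, for illustration only, with a parameter-free morphological gradient. The sketch below is a minimal stand-in, not the paper's boundary boosting algorithm.
```python
import torch
import torch.nn.functional as F

def boundary_from_saliency(pred, kernel_size=3):
    """Parameter-free boundary map from a predicted saliency map in [0, 1]:
    a morphological gradient approximated with max pooling."""
    pad = kernel_size // 2
    dilated = F.max_pool2d(pred, kernel_size, stride=1, padding=pad)
    eroded = -F.max_pool2d(-pred, kernel_size, stride=1, padding=pad)
    return dilated - eroded  # high response only near salient-object boundaries

pred = torch.rand(2, 1, 96, 96)
print(boundary_from_saliency(pred).shape)  # torch.Size([2, 1, 96, 96])
```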
Collapse
|
29
|
Fan DP, Zhang J, Xu G, Cheng MM, Shao L. Salient Objects in Clutter. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:2344-2366. [PMID: 35404809 DOI: 10.1109/tpami.2022.3166451] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
In this paper, we identify and address a serious design bias of existing salient object detection (SOD) datasets, which unrealistically assume that each image should contain at least one clear and uncluttered salient object. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. However, these models are still far from satisfactory when applied to real-world scenes. Based on our analyses, we propose a new high-quality dataset and update the previous saliency benchmark. Specifically, our dataset, called Salient Objects in Clutter (SOC), includes images with both salient and non-salient objects from several common object categories. In addition to object category annotations, each salient image is accompanied by attributes that reflect common challenges in common scenes, which can help provide deeper insight into the SOD problem. Further, with a given saliency encoder, e.g., the backbone network, existing saliency models are designed to achieve mapping from the training image set to the training ground-truth set. We therefore argue that improving the dataset can yield higher performance gains than focusing only on the decoder design. With this in mind, we investigate several dataset-enhancement strategies, including label smoothing to implicitly emphasize salient boundaries, random image augmentation to adapt saliency models to various scenarios, and self-supervised learning as a regularization strategy to learn from small datasets. Our extensive results demonstrate the effectiveness of these tricks. We also provide a comprehensive benchmark for SOD, which can be found in our repository: https://github.com/DengPingFan/SODBenchmark.
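As one example of the label-smoothing trick mentioned above, here is a hedged sketch: blend the hard ground-truth mask with a locally averaged copy so that supervision softens near object boundaries. The SOC paper's exact scheme may differ from this simplification.
```python
import torch
import torch.nn.functional as F

def smooth_saliency_labels(gt, eps=0.1, kernel_size=5):
    """Soften binary saliency masks: mix the hard label with a local average,
    which mostly changes values near object boundaries."""
    pad = kernel_size // 2
    blurred = F.avg_pool2d(gt, kernel_size, stride=1, padding=pad)
    return (1.0 - eps) * gt + eps * blurred

gt = (torch.rand(4, 1, 64, 64) > 0.5).float()
soft = smooth_saliency_labels(gt)
print(soft.min().item(), soft.max().item())  # values stay within [0, 1]
```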
Collapse
|
30
|
Pang Y, Zhao X, Zhang L, Lu H. CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:892-904. [PMID: 37018701 DOI: 10.1109/tip.2023.3234702] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Most existing bi-modal (RGB-D and RGB-T) salient object detection methods utilize the convolution operation and construct complex interweave fusion structures to achieve cross-modal information integration. The inherent local connectivity of the convolution operation imposes a performance ceiling on convolution-based methods. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path. CAVER treats multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism. In addition, considering the quadratic complexity w.r.t. the number of input tokens, we design a parameter-free patch-wise token re-embedding strategy to simplify operations. Extensive experimental results on RGB-D and RGB-T SOD datasets demonstrate that such a simple two-stream encoder-decoder framework can surpass recent state-of-the-art methods when equipped with the proposed components.
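A parameter-free patch-wise token reduction can be sketched with average pooling over the spatial token grid, which shrinks the key/value length and hence the quadratic attention cost. The snippet below is an illustrative assumption about how such a re-embedding might look, not CAVER's implementation.
```python
import torch
import torch.nn.functional as F

def patchwise_token_reembed(tokens, hw, patch=2):
    """Reduce key/value tokens by pooling them patch-wise on the spatial grid,
    lowering attention cost from O(N^2) toward O(N * N / patch^2)."""
    b, n, c = tokens.shape
    h, w = hw
    assert h * w == n
    x = tokens.transpose(1, 2).reshape(b, c, h, w)
    x = F.avg_pool2d(x, kernel_size=patch, stride=patch)
    return x.flatten(2).transpose(1, 2)  # (b, n / patch**2, c)

tokens = torch.randn(2, 24 * 24, 64)
kv = patchwise_token_reembed(tokens, (24, 24), patch=2)
print(kv.shape)  # torch.Size([2, 144, 64])
```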
Collapse
|
31
|
Wu Z, Allibert G, Meriaudeau F, Ma C, Demonceaux C. HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:2160-2173. [PMID: 37027289 DOI: 10.1109/tip.2023.3263111] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
RGB-D saliency detection aims to fuse multi-modal cues to accurately localize salient regions. Existing works often adopt attention modules for feature modeling, with few methods explicitly leveraging fine-grained details to merge with semantic cues. Thus, despite the auxiliary depth information, it is still challenging for existing models to distinguish objects with similar appearances but at distinct camera distances. In this paper, from a new perspective, we propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection. Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies. To realize multi-modal and multi-level fusion, we first use a granularity-based attention scheme to strengthen the discriminatory power of RGB and depth features separately. Then we introduce a unified cross dual-attention module for multi-modal and multi-level fusion in a coarse-to-fine manner. The encoded multi-modal features are gradually aggregated into a shared decoder. Further, we exploit a multi-scale loss to take full advantage of the hierarchical information. Extensive experiments on challenging benchmark datasets demonstrate that our HiDAnet performs favorably over the state-of-the-art methods by large margins. The source code can be found in https://github.com/Zongwei97/HIDANet/.
Collapse
|
32
|
Wen H, Song K, Huang L, Wang H, Yan Y. Cross-modality salient object detection network with universality and anti-interference. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
|
33
|
Li G, Liu Z, Zeng D, Lin W, Ling H. Adjacent Context Coordination Network for Salient Object Detection in Optical Remote Sensing Images. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:526-538. [PMID: 35417367 DOI: 10.1109/tcyb.2022.3162945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Salient object detection (SOD) in optical remote sensing images (RSIs), or RSI-SOD, is an emerging topic in understanding optical RSIs. However, due to the difference between optical RSIs and natural scene images (NSIs), directly applying NSI-SOD methods to optical RSIs fails to achieve satisfactory results. In this article, we propose a novel adjacent context coordination network (ACCoNet) to explore the coordination of adjacent features in an encoder-decoder architecture for RSI-SOD. Specifically, ACCoNet consists of three parts: 1) an encoder; 2) adjacent context coordination modules (ACCoMs); and 3) a decoder. As the key component of ACCoNet, ACCoM activates the salient regions of output features of the encoder and transmits them to the decoder. ACCoM contains a local branch and two adjacent branches to coordinate the multilevel features simultaneously. The local branch highlights the salient regions in an adaptive way, while the adjacent branches introduce global information of adjacent levels to enhance salient regions. In addition, to extend the capabilities of the classic decoder block (i.e., several cascaded convolutional layers), we extend it with two bifurcations and propose a bifurcation-aggregation block (BAB) to capture the contextual information in the decoder. Extensive experiments on two benchmark datasets demonstrate that the proposed ACCoNet outperforms 22 state-of-the-art methods under nine evaluation metrics, and runs up to 81 fps on a single NVIDIA Titan X GPU. The code and results of our method are available at https://github.com/MathLee/ACCoNet.
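To illustrate the general idea of coordinating a feature level with its adjacent levels, here is a hypothetical PyTorch module that resizes the neighboring levels and uses them to re-weight the local branch. It assumes adjacent levels share the channel dimension and is not ACCoM itself.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentCoordination(nn.Module):
    """Re-weight the local branch of one level with cues from its adjacent levels."""
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate_low = nn.Conv2d(channels, channels, 1)
        self.gate_high = nn.Conv2d(channels, channels, 1)

    def forward(self, cur, lower, higher):
        size = cur.shape[-2:]
        lower = F.interpolate(lower, size=size, mode='bilinear', align_corners=False)
        higher = F.interpolate(higher, size=size, mode='bilinear', align_corners=False)
        gate = torch.sigmoid(self.gate_low(lower) + self.gate_high(higher))
        return self.local(cur) * gate + cur

cur = torch.randn(1, 64, 32, 32)
lower = torch.randn(1, 64, 64, 64)    # finer, lower-level feature
higher = torch.randn(1, 64, 16, 16)   # coarser, higher-level feature
print(AdjacentCoordination(64)(cur, lower, higher).shape)
```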
Collapse
|
34
|
Liu N, Zhang N, Shao L, Han J. Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:9026-9042. [PMID: 34699348 DOI: 10.1109/tpami.2021.3122139] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
How to effectively fuse cross-modal information is a key problem for RGB-D salient object detection. Early fusion and result fusion schemes fuse RGB and depth information at the input and output stages, respectively, and hence incur distribution gaps or information loss. Many models instead employ a feature fusion strategy, but they are limited by their use of low-order point-to-point fusion methods. In this paper, we propose a novel mutual attention model by fusing attention and context from different modalities. We use the non-local attention of one modality to propagate long-range contextual dependencies for the other, thus leveraging complementary attention cues to achieve high-order and trilinear cross-modal interaction. We also propose to induce contrast inference from the mutual attention and obtain a unified model. Considering that low-quality depth data may be detrimental to model performance, we further propose a selective attention to reweight the added depth cues. We embed the proposed modules in a two-stream CNN for RGB-D SOD. Experimental results demonstrate the effectiveness of our proposed model. Moreover, we also construct a new and challenging large-scale RGB-D SOD dataset of high-quality, which can promote both the training and evaluation of deep models.
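The mutual-attention idea, where the attention of one modality propagates context for the other, can be sketched as generic cross-modal non-local attention: queries come from the target modality, keys and values from the source modality. The module below is an illustrative simplification, not the paper's selective mutual attention.
```python
import torch
import torch.nn as nn

class CrossModalNonLocal(nn.Module):
    """Cross-modal non-local attention: target queries attend over source keys/values."""
    def __init__(self, channels, reduced=32):
        super().__init__()
        self.q = nn.Conv2d(channels, reduced, 1)
        self.k = nn.Conv2d(channels, reduced, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, target, source):
        b, c, h, w = target.shape
        q = self.q(target).flatten(2).transpose(1, 2)   # (b, hw, r)
        k = self.k(source).flatten(2)                   # (b, r, hw)
        v = self.v(source).flatten(2).transpose(1, 2)   # (b, hw, c)
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return target + out                             # residual connection

rgb = torch.randn(1, 64, 24, 24)
dep = torch.randn(1, 64, 24, 24)
fused_rgb = CrossModalNonLocal(64)(rgb, dep)  # depth context enhances RGB
fused_dep = CrossModalNonLocal(64)(dep, rgb)  # and vice versa
print(fused_rgb.shape, fused_dep.shape)
```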
Collapse
|
35
|
Wu YH, Liu Y, Xu J, Bian JW, Gu YC, Cheng MM. MobileSal: Extremely Efficient RGB-D Salient Object Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:10261-10269. [PMID: 34898430 DOI: 10.1109/tpami.2021.3134684] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
The high computational cost of neural networks has prevented recent successes in RGB-D salient object detection (SOD) from benefiting real-world applications. Hence, this article introduces a novel network, MobileSal, which focuses on efficient RGB-D SOD using mobile networks for deep feature extraction. However, mobile networks are less powerful in feature representation than cumbersome networks. To this end, we observe that the depth information of color images can strengthen the feature representation related to SOD if leveraged properly. Therefore, we propose an implicit depth restoration (IDR) technique to strengthen the mobile networks' feature representation capability for RGB-D SOD. IDR is only adopted in the training phase and is omitted during testing, so it is computationally free. Besides, we propose compact pyramid refinement (CPR) for efficient multi-level feature aggregation to derive salient objects with clear boundaries. With IDR and CPR incorporated, MobileSal performs favorably against state-of-the-art methods on six challenging RGB-D SOD datasets with much faster speed (450fps for the input size of 320×320) and fewer parameters (6.5M). The code is released at https://mmcheng.net/mobilesal.
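The training-only auxiliary supervision pattern behind IDR can be sketched as a depth-regression head attached to RGB features: it contributes a loss during training and is skipped at inference, so it adds no test-time cost. This is a generic sketch under that assumption, not MobileSal's actual IDR.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrainingOnlyDepthHead(nn.Module):
    """Auxiliary head that regresses the depth map from RGB features during training
    and is skipped at inference time."""
    def __init__(self, channels):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(channels, channels // 2, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, 1),
        )

    def forward(self, feat, depth_gt=None):
        if not self.training:
            return None  # computationally free at test time
        pred = torch.sigmoid(self.head(feat))
        pred = F.interpolate(pred, size=depth_gt.shape[-2:], mode='bilinear',
                             align_corners=False)
        return F.l1_loss(pred, depth_gt)

head = TrainingOnlyDepthHead(64)
feat, depth_gt = torch.randn(2, 64, 40, 40), torch.rand(2, 1, 320, 320)
print(head(feat, depth_gt))  # training mode: returns a scalar loss
head.eval()
print(head(feat))            # eval mode: None, no extra computation
```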
Collapse
|
36
|
Three-stream interaction decoder network for RGB-thermal salient object detection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
37
|
Chen T, Xiao J, Hu X, Zhang G, Wang S. Adaptive Fusion Network For RGB-D Salient Object Detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
|
38
|
Xu C, Li Q, Zhou Q, Jiang X, Yu D, Zhou Y. Asymmetric cross-modal activation network for RGB-T salient object detection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
39
|
Zhao X, Pang Y, Zhang L, Lu H. Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:7350-7362. [PMID: 36409818 DOI: 10.1109/tip.2022.3222641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Benefiting from color independence, illumination invariance, and location discrimination, the depth map can provide important supplemental information for extracting salient objects in complex environments. However, high-quality depth sensors are expensive and cannot be widely deployed, while common depth sensors produce noisy and sparse depth information that introduces irreversible interference into depth-based networks. In this paper, we propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD). Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks; in this way, the depth information can be completed and purified. Moreover, we introduce a multi-modal filtered transformer (MFT) module, which is equipped with three modality-specific filters to generate a transformer-enhanced feature for each modality. The proposed model works in a depth-free style during the testing phase. Experiments show that it not only significantly surpasses depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time. Moreover, the resulting depth map can help existing RGB-D SOD methods obtain significant performance gains.
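The multi-task layout, three lightweight heads over shared features so that depth is predicted rather than required at test time, can be sketched as below. The head design is assumed for illustration and is far simpler than the MMFT architecture.
```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Saliency, depth, and contour heads on top of shared features."""
    def __init__(self, channels):
        super().__init__()
        def head():
            return nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(channels, 1, 1))
        self.saliency, self.depth, self.contour = head(), head(), head()

    def forward(self, shared_feat):
        return {
            'saliency': torch.sigmoid(self.saliency(shared_feat)),
            'depth': torch.sigmoid(self.depth(shared_feat)),
            'contour': torch.sigmoid(self.contour(shared_feat)),
        }

outs = MultiTaskHeads(32)(torch.randn(1, 32, 96, 96))
print({k: v.shape for k, v in outs.items()})
```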
Collapse
|
40
|
Liao X, Li J, Li L, Shangguan C, Huang S. RGBD Salient Object Detection, Based on Specific Object Imaging. SENSORS (BASEL, SWITZERLAND) 2022; 22:8973. [PMID: 36433571 PMCID: PMC9696882 DOI: 10.3390/s22228973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/03/2022] [Revised: 11/16/2022] [Accepted: 11/16/2022] [Indexed: 06/16/2023]
Abstract
RGBD salient object detection based on convolutional neural networks has developed rapidly in recent years. However, existing models often focus on detecting salient object edges instead of the objects themselves, whereas detecting objects displays the complete information of the detection target more intuitively. To address this issue, we propose an RGBD salient object detection method based on specific object imaging, which can quickly capture and process important information on object features and effectively screen out the salient objects in the scene. The screened target objects include not only the edge of the object but also its complete feature information, realizing the detection and imaging of salient objects. We conduct experiments on benchmark datasets and validate with two common metrics, and the results show that our method reduces the error (MAE) by 0.003 and 0.201 on D3Net and JLDCF, respectively. In addition, our method still achieves very good detection and imaging performance even when the training data are greatly reduced.
Collapse
Affiliation(s)
- Xiaolian Liao
- School of Physics and Telecommunications Engineering, South China Normal University, Guangzhou 510006, China
| | - Jun Li
- School of Physics and Telecommunications Engineering, South China Normal University, Guangzhou 510006, China
- School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China
| | - Leyi Li
- School of Physics and Telecommunications Engineering, South China Normal University, Guangzhou 510006, China
| | - Caoxi Shangguan
- School of Physics and Telecommunications Engineering, South China Normal University, Guangzhou 510006, China
| | - Shaoyan Huang
- School of Physics and Telecommunications Engineering, South China Normal University, Guangzhou 510006, China
| |
Collapse
|
41
|
Li Z, Lang C, Li G, Wang T, Li Y. Depth Guided Feature Selection for RGBD Salient Object Detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
42
|
Transformers and CNNs Fusion Network for Salient Object Detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.10.081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
43
|
Zong G, Wei L, Guo S, Wang Y. A cascaded refined rgb-d salient object detection network based on the attention mechanism. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04186-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
|
44
|
Liang Z, Wang P, Xu K, Zhang P, Lau RWH. Weakly-Supervised Salient Object Detection on Light Fields. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:6295-6305. [PMID: 36149997 DOI: 10.1109/tip.2022.3207605] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Most existing salient object detection (SOD) methods are designed for RGB images and do not take advantage of the abundant information provided by light fields. Hence, they may fail to detect salient objects with complex structures and to delineate their boundaries. Although some methods have explored multi-view information of light field images for saliency detection, they require tedious pixel-level manual annotations of ground truths. In this paper, we propose a novel weakly-supervised learning framework for salient object detection on light field images based on bounding box annotations. Our method has two major novelties. First, given an input light field image and a bounding-box annotation indicating the salient object, we propose a ground truth label hallucination method to generate a pixel-level pseudo saliency map, avoiding the heavy cost of pixel-level annotations. This method generates high-quality pseudo ground truth saliency maps to help supervise the training, by exploiting information obtained from the light field (including depths and RGB images). Second, to exploit the multi-view nature of light field data in learning, we propose a fusion attention module to calibrate the spatial and channel-wise light field representations. It learns to focus on informative features and suppress redundant information from the multi-view inputs. Based on these two novelties, we are able to train a new salient object detector with two branches in a weakly-supervised manner. While the RGB branch focuses on modeling the color contrast in the all-in-focus image to locate salient objects, the Focal branch exploits the depth and the background spatial redundancy of focal slices to eliminate background distractions. Extensive experiments show that our method outperforms existing weakly-supervised methods and most fully supervised methods.
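To give a feel for how a bounding box plus depth could yield a pixel-level pseudo label, here is a deliberately crude sketch: keep pixels inside the box whose depth is near the box's median. The paper's label hallucination method is considerably more sophisticated; everything below is an illustrative assumption.
```python
import torch

def pseudo_saliency_from_box(depth, box, tol=0.1):
    """Rough pseudo ground truth: inside the annotated box, keep pixels whose depth
    is within `tol` of the box's median depth; outside the box is background."""
    x1, y1, x2, y2 = box
    pseudo = torch.zeros_like(depth)
    region = depth[..., y1:y2, x1:x2]
    median = region.median()
    pseudo[..., y1:y2, x1:x2] = (region - median).abs().lt(tol).float()
    return pseudo

depth = torch.rand(1, 1, 128, 128)
mask = pseudo_saliency_from_box(depth, box=(32, 40, 96, 100))
print(mask.sum().item() > 0, mask.shape)
```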
Collapse
|
45
|
Scribble-attention hierarchical network for weakly supervised salient object detection in optical remote sensing images. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04014-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
|
46
|
Bi H, Wu R, Liu Z, Zhang J, Zhang C, Xiang TZ, Wang X. PSNet: Parallel symmetric network for RGB-T salient object detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
|
47
|
Fan DP, Ji GP, Cheng MM, Shao L. Concealed Object Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:6024-6042. [PMID: 34061739 DOI: 10.1109/tpami.2021.3085766] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
We present the first systematic study on concealed object detection (COD), which aims to identify objects that are visually embedded in their background. The high intrinsic similarities between concealed objects and their background make COD far more challenging than traditional object detection/segmentation. To better understand this task, we collect a large-scale dataset, called COD10K, which consists of 10,000 images covering concealed objects in diverse real-world scenarios from 78 object categories. Further, we provide rich annotations including object categories, object boundaries, challenging attributes, object-level labels, and instance-level annotations. Our COD10K is the largest COD dataset to date, with the richest annotations, which enables comprehensive concealed object understanding and can even be used to help progress several other vision tasks, such as detection, segmentation, and classification. Motivated by how animals hunt in the wild, we also design a simple but strong baseline for COD, termed the Search Identification Network (SINet). Without any bells and whistles, SINet outperforms twelve cutting-edge baselines on all datasets tested, making it a robust, general architecture that could serve as a catalyst for future research in COD. Finally, we provide some interesting findings and highlight several potential applications and future directions. To spark research in this new field, our code, dataset, and online demo are available at our project page: http://mmcheng.net/cod.
Collapse
|
48
|
Song M, Song W, Yang G, Chen C. Improving RGB-D Salient Object Detection via Modality-Aware Decoder. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:6124-6138. [PMID: 36112559 DOI: 10.1109/tip.2022.3205747] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Most existing RGB-D salient object detection (SOD) methods focus primarily on cross-modal and cross-level saliency fusion, which has been proven efficient and effective. However, these methods still have a critical limitation: their fusion patterns, typically the combination of selective characteristics and its variations, are too dependent on the network's non-linear adaptability. In such methods, the balance between RGB and D (depth) is formulated individually for intermediate feature slices, but the relation at the modality level may not be learned properly. The optimal RGB-D combination differs across RGB-D scenarios, and the exact complementary status is frequently determined by multiple modality-level factors, such as the depth quality, the complexity of the RGB scene, and the degree of harmony between them. Therefore, it may be difficult for existing approaches to achieve further performance breakthroughs, as their methodologies are comparatively insensitive to such modality-level factors. To conquer this problem, this paper presents the Modality-aware Decoder (MaD). The key technical innovations include a series of feature embedding, modality reasoning, and feature back-projecting and collecting strategies, all of which upgrade the widely-used multi-scale and multi-level decoding process to be modality-aware. Our MaD achieves competitive performance over other state-of-the-art (SOTA) models without using any fancy tricks in the decoder's design. Codes and results will be publicly available at https://github.com/MengkeSong/MaD.
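A minimal way to express modality-level (rather than slice-level) balancing is to predict two scalar weights from globally pooled RGB and depth features. The module below is a hypothetical simplification of that idea, not the MaD decoder.
```python
import torch
import torch.nn as nn

class ModalityLevelGate(nn.Module):
    """Weight whole modalities with scalars predicted from globally pooled features."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2),
        )

    def forward(self, rgb_feat, depth_feat):
        g = torch.cat([rgb_feat.mean(dim=(2, 3)), depth_feat.mean(dim=(2, 3))], dim=1)
        w = torch.softmax(self.fc(g), dim=1)                 # (b, 2) modality weights
        w_rgb, w_dep = w[:, 0:1, None, None], w[:, 1:2, None, None]
        return w_rgb * rgb_feat + w_dep * depth_feat

rgb, dep = torch.randn(2, 64, 20, 20), torch.randn(2, 64, 20, 20)
print(ModalityLevelGate(64)(rgb, dep).shape)  # torch.Size([2, 64, 20, 20])
```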
Collapse
|
49
|
SL-Net: self-learning and mutual attention-based distinguished window for RGBD complex salient object detection. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07772-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
50
|
Sun P, Zhang W, Li S, Guo Y, Song C, Li X. Learnable Depth-Sensitive Attention for Deep RGB-D Saliency Detection with Multi-modal Fusion Architecture Search. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01646-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
|