1. Zhou Q, Shi H, Xiang W, Kang B, Latecki LJ. DPNet: Dual-Path Network for Real-Time Object Detection With Lightweight Attention. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4504-4518. [PMID: 38536700] [DOI: 10.1109/tnnls.2024.3376563]
Abstract
The recent advances in compressing high-accuracy convolutional neural networks (CNNs) have brought remarkable progress in real-time object detection. To accelerate detection speed, lightweight detectors typically use only a few convolution layers in a single-path backbone. A single-path architecture, however, involves continuous pooling and downsampling operations, often producing coarse and inaccurate feature maps that are disadvantageous for locating objects. On the other hand, due to limited network capacity, recent lightweight networks are often weak at representing large-scale visual data. To address these problems, we present a dual-path network, named DPNet, with a lightweight attention scheme for real-time object detection. The dual-path architecture enables us to extract high-level semantic features and low-level object details in parallel. Although DPNet has a nearly duplicated shape with respect to single-path detectors, its computational cost and model size are not significantly increased. To enhance representation capability, a lightweight self-correlation module (LSCM) is designed to capture global interactions with only a small computational overhead and few network parameters. In the neck, LSCM is extended into a lightweight cross-correlation module (LCCM), capturing mutual dependencies among neighboring scale features. We have conducted exhaustive experiments on the MS COCO, Pascal VOC 2007, and ImageNet datasets. The experimental results demonstrate that DPNet achieves a state-of-the-art trade-off between detection accuracy and implementation efficiency. More specifically, DPNet achieves 31.3% AP on MS COCO test-dev, 82.7% mAP on the Pascal VOC 2007 test set, and 41.6% mAP on the ImageNet validation set, with a model size of nearly 2.5M parameters, 1.04 GFLOPs, and 164 and 196 frames per second (FPS) for input images of these datasets.
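The paper specifies the exact LSCM design; purely as an illustration of the general idea, a lightweight, non-local-style self-correlation block that captures global spatial interactions through reduced-channel projections might be sketched as below. The class name, reduction ratio, and residual placement are assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LightweightSelfCorrelation(nn.Module):
    """Non-local-style self-correlation over spatial positions,
    with channel reduction to keep FLOPs and parameters small."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        inner = max(channels // reduction, 1)
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, inner, kernel_size=1)
        self.proj = nn.Conv2d(inner, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)    # B x HW x C'
        k = self.key(x).flatten(2)                       # B x C' x HW
        v = self.value(x).flatten(2).transpose(1, 2)     # B x HW x C'
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # B x HW x HW
        out = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.proj(out)                        # residual connection

if __name__ == "__main__":
    feat = torch.randn(1, 64, 20, 20)
    print(LightweightSelfCorrelation(64)(feat).shape)    # torch.Size([1, 64, 20, 20])
```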
2. Wen F, Wang Q, Zou R, Wang Y, Liu F, Chen Y, Yu L, Du S, Yuan C. A Salient Object Detection Method Based on Boundary Enhancement. Sensors (Basel) 2023; 23:7077. [PMID: 37631615] [PMCID: PMC10458911] [DOI: 10.3390/s23167077]
Abstract
Visual saliency refers to the human visual system's ability to quickly focus on important parts of the visual field, a crucial aspect of image processing, particularly in fields like medical imaging and robotics. Understanding and simulating this mechanism is crucial for solving complex visual problems. In this paper, we propose a salient object detection method based on boundary enhancement, which is applicable to both 2D and 3D sensor data. To address the large-scale variation of salient objects, our method introduces a multi-level feature aggregation module that enhances the expressive ability of fixed-resolution features by letting adjacent features complement each other. Additionally, we propose a multi-scale information extraction module to capture local contextual information at different scales for the back-propagated level-by-level features, which allows for a better measurement of the composition of the feature map after back-fusion. To tackle the low confidence of boundary pixels, we also introduce a boundary extraction module to extract the boundary information of salient regions; this information is then fused with the salient-object information to further refine the saliency prediction. During training, our method uses a mixed loss function to constrain the model at two levels: pixels and images. The experimental results demonstrate that our boundary-enhancement-based salient object detection method performs well on targets of different scales, multiple targets, linear targets, and targets in complex scenes. We compare our method with the best method on four conventional datasets and achieve an average improvement of 6.2% on the mean absolute error (MAE) indicator. Overall, our approach shows promise for improving the accuracy and efficiency of salient object detection in a variety of settings, including 2D/3D semantic analysis and reconstruction/inpainting of image/video/point cloud data.
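The boundary extraction module itself is detailed in the paper; as a minimal sketch of one common way to obtain a boundary map from a saliency mask, a morphological gradient built from max pooling could look like the following. The function name and kernel size are illustrative assumptions, not the authors' design.

```python
import torch
import torch.nn.functional as F

def boundary_map(mask: torch.Tensor, kernel_size: int = 3) -> torch.Tensor:
    """Morphological gradient: dilation minus erosion of a soft mask in [0, 1].
    mask: B x 1 x H x W. Returns a soft boundary map of the same shape."""
    pad = kernel_size // 2
    dilated = F.max_pool2d(mask, kernel_size, stride=1, padding=pad)
    eroded = -F.max_pool2d(-mask, kernel_size, stride=1, padding=pad)
    return (dilated - eroded).clamp(0, 1)

if __name__ == "__main__":
    gt = torch.zeros(1, 1, 32, 32)
    gt[..., 8:24, 8:24] = 1.0       # a square "salient object"
    edges = boundary_map(gt)
    print(edges.sum().item())        # non-zero only along the square's border
```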
Affiliation(s)
- Falin Wen, Qinghui Wang, Ruirui Zou, Ying Wang, Fenglin Liu, Yang Chen: School of Physics and Mechanical and Electrical Engineering, Longyan University, Longyan 364012, China
- Linghao Yu: School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Shaoyi Du: Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China
- Chengzhi Yuan: Department of Mechanical, Industrial and Systems Engineering, University of Rhode Island, Kingston, RI 02881, USA
3. Ndayikengurukiye D, Mignotte M. CoSOV1Net: A Cone- and Spatial-Opponent Primary Visual Cortex-Inspired Neural Network for Lightweight Salient Object Detection. Sensors (Basel) 2023; 23:6450. [PMID: 37514744] [PMCID: PMC10386563] [DOI: 10.3390/s23146450]
Abstract
Salient object-detection models attempt to mimic the human visual system's ability to select relevant objects in images. To this end, the development of deep neural networks on high-end computers has recently achieved high performance. However, developing deep neural network models with the same performance for resource-limited vision sensors or mobile devices remains a challenge. In this work, we propose CoSOV1Net, a novel lightweight salient object-detection neural network model inspired by the cone- and spatial-opponent processes of the primary visual cortex (V1), which inextricably link color and shape in human color perception. Our proposed model is trained from scratch, without using backbones from image classification or other tasks. Experiments on the most widely used and challenging datasets for salient object detection show that CoSOV1Net achieves competitive performance (e.g., Fβ = 0.931 on the ECSSD dataset) with state-of-the-art salient object-detection models while having a low number of parameters (1.14 M), low FLOPs (1.4 G), and a high frame rate (211.2 FPS on an Nvidia GeForce RTX 3090 Ti GPU) compared with the state of the art in lightweight and non-lightweight salient object detection. Thus, CoSOV1Net is a lightweight salient object-detection model that can be adapted to mobile environments and resource-constrained devices.
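The actual CoSOV1Net blocks are defined in the paper; the snippet below only illustrates the classic single-opponent color coding (red-green and blue-yellow channels) that such V1-inspired models build on. The specific channel weights are illustrative assumptions, not taken from the paper.

```python
import torch

def opponent_channels(rgb: torch.Tensor) -> torch.Tensor:
    """Compute simple single-opponent color channels from an RGB image.
    rgb: B x 3 x H x W in [0, 1]. Returns B x 3 x H x W containing
    (red-green, blue-yellow, luminance); the weights are illustrative only."""
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    rg = r - g                   # red-green opponency
    by = b - (r + g) / 2         # blue-yellow opponency
    lum = (r + g + b) / 3        # achromatic (luminance) channel
    return torch.cat([rg, by, lum], dim=1)

if __name__ == "__main__":
    img = torch.rand(2, 3, 64, 64)
    print(opponent_channels(img).shape)   # torch.Size([2, 3, 64, 64])
```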
Affiliation(s)
- Didier Ndayikengurukiye, Max Mignotte: Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montreal, QC H3C 3J7, Canada
4. Li S, Liu F, Jiao L, Liu X, Chen P. Learning Salient Feature for Salient Object Detection Without Labels. IEEE Transactions on Cybernetics 2023; 53:1012-1025. [PMID: 36227820] [DOI: 10.1109/tcyb.2022.3209978]
Abstract
Supervised salient object detection (SOD) methods achieve state-of-the-art performance by relying on human-annotated saliency maps, while unsupervised methods attempt to achieve SOD without using any annotations. In unsupervised SOD, obtaining saliency in a completely unsupervised manner is a major challenge. Existing unsupervised methods usually obtain saliency by introducing other handcrafted, feature-based saliency methods. In general, the location information of salient objects is contained in the feature maps. If the features belonging to salient objects are called salient features, and the features that do not belong to salient objects (such as background) are called nonsalient features, then by dividing the feature maps into salient and nonsalient features in an unsupervised way, the object at the location of the salient features is the salient object. Based on this motivation, a novel method called learning salient feature (LSF) is proposed, which achieves unsupervised SOD by learning salient features from the data itself. The method's objective is to enhance salient features and suppress nonsalient ones. Furthermore, a salient object localization method is proposed to roughly locate objects where the salient features lie, yielding a salient activation map. Usually, the object in the salient activation map is incomplete and contains a lot of noise. To address this issue, a saliency map update strategy is introduced to gradually remove noise and strengthen boundaries. The visualizations of images and their salient activation maps show that our method can effectively learn salient visual objects. Experiments show that we achieve superior unsupervised performance on a series of datasets.
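LSF's localization and update steps are specified in the paper; the snippet below is only a generic, hypothetical illustration of turning backbone feature maps into a coarse activation map (channel-energy averaging followed by min-max normalization), not the authors' procedure.

```python
import torch
import torch.nn.functional as F

def activation_map(features: torch.Tensor, out_size=(224, 224)) -> torch.Tensor:
    """Collapse a B x C x H x W feature tensor into a coarse B x 1 x H' x W'
    activation map by averaging channel-wise energy and min-max normalizing."""
    energy = features.pow(2).mean(dim=1, keepdim=True)          # B x 1 x H x W
    energy = F.interpolate(energy, size=out_size, mode="bilinear",
                           align_corners=False)
    flat = energy.flatten(1)
    mn = flat.min(dim=1, keepdim=True).values
    mx = flat.max(dim=1, keepdim=True).values
    norm = (flat - mn) / (mx - mn + 1e-8)
    return norm.view_as(energy)

if __name__ == "__main__":
    feats = torch.randn(1, 256, 14, 14)
    print(activation_map(feats).shape)    # torch.Size([1, 1, 224, 224])
```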
5. Fan DP, Zhang J, Xu G, Cheng MM, Shao L. Salient Objects in Clutter. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:2344-2366. [PMID: 35404809] [DOI: 10.1109/tpami.2022.3166451]
Abstract
In this paper, we identify and address a serious design bias of existing salient object detection (SOD) datasets, which unrealistically assume that each image should contain at least one clear and uncluttered salient object. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. However, these models are still far from satisfactory when applied to real-world scenes. Based on our analyses, we propose a new high-quality dataset and update the previous saliency benchmark. Specifically, our dataset, called Salient Objects in Clutter (SOC), includes images with both salient and non-salient objects from several common object categories. In addition to object category annotations, each salient image is accompanied by attributes that reflect common challenges in real-world scenes, which can help provide deeper insight into the SOD problem. Further, with a given saliency encoder, e.g., the backbone network, existing saliency models are designed to achieve a mapping from the training image set to the training ground-truth set. We therefore argue that improving the dataset can yield higher performance gains than focusing only on the decoder design. With this in mind, we investigate several dataset-enhancement strategies, including label smoothing to implicitly emphasize salient boundaries, random image augmentation to adapt saliency models to various scenarios, and self-supervised learning as a regularization strategy to learn from small datasets. Our extensive results demonstrate the effectiveness of these tricks. We also provide a comprehensive benchmark for SOD, which can be found in our repository: https://github.com/DengPingFan/SODBenchmark.
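Among the dataset-enhancement strategies, label smoothing to emphasize salient boundaries can be realized in several ways; a simple sketch (not necessarily the paper's exact formulation) blends the binary ground truth with a locally averaged copy so that pixels near boundaries receive soft targets. The kernel size and smoothing factor here are assumptions.

```python
import torch
import torch.nn.functional as F

def smooth_labels(gt: torch.Tensor, kernel_size: int = 7, eps: float = 0.1) -> torch.Tensor:
    """Soften a binary ground-truth mask (B x 1 x H x W) by local averaging,
    so targets near salient boundaries move away from hard 0/1 values."""
    pad = kernel_size // 2
    blurred = F.avg_pool2d(gt, kernel_size, stride=1, padding=pad)
    # Blend the hard mask with its blurred version (classic label smoothing).
    return (1.0 - eps) * gt + eps * blurred

if __name__ == "__main__":
    gt = torch.zeros(1, 1, 32, 32)
    gt[..., 10:22, 10:22] = 1.0
    soft = smooth_labels(gt)
    print(((soft > 0) & (soft < 1)).sum().item())  # boundary pixels now have soft targets
```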
6. Song M, Song W, Yang G, Chen C. Improving RGB-D Salient Object Detection via Modality-Aware Decoder. IEEE Transactions on Image Processing 2022; 31:6124-6138. [PMID: 36112559] [DOI: 10.1109/tip.2022.3205747]
Abstract
Most existing RGB-D salient object detection (SOD) methods focus primarily on cross-modal and cross-level saliency fusion, which has proved efficient and effective. However, these methods still have a critical limitation: their fusion patterns, typically a combination of selective characteristics and its variations, depend too heavily on the network's non-linear adaptability. In such methods, the balance between RGB and D (depth) is formulated individually over the intermediate feature slices, so the relation at the modality level may not be learned properly. The optimal RGB-D combination differs depending on the RGB-D scenario, and the exact complementary status is frequently determined by multiple modality-level factors, such as depth quality, the complexity of the RGB scene, and the degree of harmony between them. Therefore, it may be difficult for existing approaches to achieve further performance breakthroughs, as their methodologies are comparatively insensitive to the modality level. To address this problem, this paper presents the Modality-aware Decoder (MaD). The critical technical innovations include a series of feature embedding, modality reasoning, and feature back-projecting and collecting strategies, all of which upgrade the widely used multi-scale and multi-level decoding process to be modality-aware. Our MaD achieves competitive performance over other state-of-the-art (SOTA) models without using any fancy tricks in the decoder's design. Codes and results will be publicly available at https://github.com/MengkeSong/MaD.
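The concrete MaD components are given in the paper and repository; the sketch below is only a generic, hypothetical illustration of modality-level weighting, predicting one fusion weight per modality from global statistics rather than balancing individual feature slices. The class name and MLP shape are assumptions.

```python
import torch
import torch.nn as nn

class ModalityGate(nn.Module):
    """Predict per-modality fusion weights from global RGB/depth statistics,
    then fuse the two feature maps with those weights."""
    def __init__(self, channels):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2),
        )

    def forward(self, rgb_feat, depth_feat):
        # Global average pooling -> one descriptor per modality.
        g = torch.cat([rgb_feat.mean(dim=(2, 3)), depth_feat.mean(dim=(2, 3))], dim=1)
        w = torch.softmax(self.mlp(g), dim=1)            # B x 2 modality-level weights
        w_rgb, w_d = w[:, 0:1, None, None], w[:, 1:2, None, None]
        return w_rgb * rgb_feat + w_d * depth_feat

if __name__ == "__main__":
    rgb, depth = torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28)
    print(ModalityGate(64)(rgb, depth).shape)            # torch.Size([2, 64, 28, 28])
```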
7. Wu YH, Liu Y, Zhang L, Cheng MM, Ren B. EDN: Salient Object Detection via Extremely-Downsampled Network. IEEE Transactions on Image Processing 2022; 31:3125-3136. [PMID: 35412981] [DOI: 10.1109/tip.2022.3164550]
Abstract
Recent progress on salient object detection (SOD) mainly benefits from multi-scale learning, where high-level and low-level features collaborate in locating salient objects and discovering fine details, respectively. However, most efforts are devoted to low-level feature learning by fusing multi-scale features or enhancing boundary representations. High-level features, although long proven effective for many other tasks, have barely been studied for SOD. In this paper, we tap into this gap and show that enhancing high-level features is essential for SOD as well. To this end, we introduce an Extremely-Downsampled Network (EDN), which employs an extreme downsampling technique to effectively learn a global view of the whole image, leading to accurate salient object localization. To accomplish better multi-level feature fusion, we construct the Scale-Correlated Pyramid Convolution (SCPC) to build an elegant decoder for recovering object details from the extreme downsampling. Extensive experiments demonstrate that EDN achieves state-of-the-art performance at real-time speed. Our efficient EDN-Lite also achieves competitive performance at 316 FPS. Hence, this work is expected to spark new thinking in SOD. Code is available at https://github.com/yuhuan-wu/EDN.
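The published EDN and SCPC blocks are available in the linked repository; the rough sketch below only illustrates the underlying idea of pushing top-level features through extra downsampling stages to obtain a near-global view before injecting it back. Layer choices and stage count are assumptions, not the published design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExtremeDownsampleHead(nn.Module):
    """Downsample high-level features to a tiny spatial size for a global view,
    then upsample and merge back (a rough sketch, not the published EDN block)."""
    def __init__(self, channels, stages=3):
        super().__init__()
        self.down = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, stride=2, padding=1),
                          nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
            for _ in range(stages)
        ])
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        g = x
        for block in self.down:
            g = block(g)                                   # shrink spatial size stage by stage
        g = F.interpolate(g, size=x.shape[2:], mode="bilinear", align_corners=False)
        return self.fuse(x + g)                            # inject global context back

if __name__ == "__main__":
    top = torch.randn(1, 128, 28, 28)
    print(ExtremeDownsampleHead(128)(top).shape)           # torch.Size([1, 128, 28, 28])
```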
8. Revise-Net: Exploiting Reverse Attention Mechanism for Salient Object Detection. Remote Sensing 2021. [DOI: 10.3390/rs13234941]
Abstract
Recently, deep learning-based methods, especially those using fully convolutional neural networks, have shown extraordinary performance in salient object detection. Despite this success, clean boundary detection of salient objects remains a challenging task. Most contemporary methods rely on dedicated edge detection modules to avoid noisy boundaries. In this work, we propose extracting finer semantic features from multiple encoding layers and attentively re-utilizing them when generating the final segmentation result. The proposed Revise-Net model is divided into three parts: (a) the prediction module, (b) a residual enhancement module (REM), and (c) reverse attention modules. First, we generate the coarse saliency map through the prediction module, which is then refined in the enhancement module. Finally, multiple reverse attention modules at varying scales are cascaded between the two networks to guide the prediction module, employing the intermediate segmentation maps generated at each downsampling level of the REM. Our method efficiently classifies boundary pixels using a combination of binary cross-entropy, similarity index, and intersection-over-union losses at the pixel, patch, and map levels, thereby effectively segmenting the salient objects in an image. Compared with several state-of-the-art frameworks, our proposed Revise-Net model outperforms them by a significant margin on three publicly available datasets, DUTS-TE, ECSSD, and HKU-IS, on both regional and boundary estimation measures.
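Two ingredients named in the abstract, reverse attention on an intermediate prediction and a multi-level loss mix, can be illustrated roughly as follows; this is a sketch under assumptions (the patch-level similarity term, e.g., SSIM, is omitted for brevity), not the Revise-Net code.

```python
import torch
import torch.nn.functional as F

def reverse_attention(features: torch.Tensor, coarse_pred: torch.Tensor) -> torch.Tensor:
    """Weight encoder features by (1 - sigmoid(prediction)) so later stages
    focus on the regions the coarse prediction missed."""
    attn = 1.0 - torch.sigmoid(
        F.interpolate(coarse_pred, size=features.shape[2:], mode="bilinear",
                      align_corners=False))
    return features * attn

def bce_iou_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pixel-level BCE plus map-level soft IoU; the patch-level similarity term
    used by the paper is left out of this sketch."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(2, 3))
    union = (prob + target - prob * target).sum(dim=(2, 3))
    iou = 1.0 - (inter + 1.0) / (union + 1.0)
    return bce + iou.mean()

if __name__ == "__main__":
    feats = torch.randn(1, 64, 44, 44)
    coarse = torch.randn(1, 1, 11, 11)
    gt = (torch.rand(1, 1, 44, 44) > 0.5).float()
    refined = reverse_attention(feats, coarse)
    print(refined.shape, bce_iou_loss(coarse, F.interpolate(gt, size=(11, 11))).item())
```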