1. CRCNet: Few-Shot Segmentation with Cross-Reference and Region–Global Conditional Networks. Int J Comput Vis 2022. DOI: 10.1007/s11263-022-01677-7
2. Fan DP, Li T, Lin Z, Ji GP, Zhang D, Cheng MM, Fu H, Shen J. Re-Thinking Co-Salient Object Detection. IEEE Trans Pattern Anal Mach Intell 2022;44:4339-4354. PMID: 33600309. DOI: 10.1109/TPAMI.2021.3060412
Abstract
In this article, we conduct a comprehensive study on the co-salient object detection (CoSOD) problem for images. CoSOD is an emerging and rapidly growing extension of salient object detection (SOD), which aims to detect the co-occurring salient objects in a group of images. However, existing CoSOD datasets often have a serious data bias, assuming that each group of images contains salient objects of similar visual appearances. This bias can lead to overly idealized settings, and the effectiveness of models trained on existing datasets can be impaired in real-life situations, where similarities are usually semantic or conceptual. To tackle this issue, we first introduce a new benchmark, called CoSOD3k in the wild, which requires a large amount of semantic context, making it more challenging than existing CoSOD datasets. Our CoSOD3k consists of 3,316 high-quality, elaborately selected images divided into 160 groups with hierarchical annotations. The images span a wide range of categories, shapes, object sizes, and backgrounds. Second, we integrate the existing SOD techniques to build a unified, trainable CoSOD framework, which is long overdue in this field. Specifically, we propose a novel CoEG-Net that augments our prior model EGNet with a co-attention projection strategy to enable fast common information learning. CoEG-Net fully leverages previous large-scale SOD datasets and significantly improves the model scalability and stability. Third, we comprehensively summarize 40 cutting-edge algorithms, benchmarking 18 of them over three challenging CoSOD datasets (iCoSeg, CoSal2015, and our CoSOD3k), and reporting more detailed (i.e., group-level) performance analysis. Finally, we discuss the challenges and future work of CoSOD. We hope that our study will give a strong boost to growth in the CoSOD community. The benchmark toolbox and results are available on our project page at https://dpfan.net/CoSOD3K.
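The co-attention projection idea can be made concrete with a small sketch. The following PyTorch snippet only illustrates the general mechanism (pooling a group-level common descriptor and projecting each image's features onto it); it is not the actual CoEG-Net layer, and the feature shapes and cosine-similarity formulation are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def co_attention_projection(feats):
    """Toy sketch of group-wise co-attention: score every spatial location against a
    shared "common" descriptor pooled over the whole image group.
    feats: (N, C, H, W) backbone features for a group of N related images.
    Returns per-image co-attention maps of shape (N, 1, H, W)."""
    n, c, h, w = feats.shape
    # group-level common vector: feature averaged over all images and locations
    common = F.normalize(feats.mean(dim=(0, 2, 3)), dim=0)        # (C,)
    # cosine similarity of each location to the common vector
    flat = F.normalize(feats.reshape(n, c, h * w), dim=1)         # (N, C, HW)
    attn = torch.einsum("c,nck->nk", common, flat).view(n, 1, h, w)
    return attn.clamp(min=0)                                      # keep positive evidence only

# usage with dummy features from a group of 5 images
group_feats = torch.randn(5, 256, 28, 28)
maps = co_attention_projection(group_feats)
print(maps.shape)  # torch.Size([5, 1, 28, 28])
```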
3. Wang B, Tao D, Dong R, Tang Y, Gao X. A Contour Co-Tracking Method for Image Pairs. IEEE Trans Image Process 2021;30:5402-5412. PMID: 34003751. DOI: 10.1109/TIP.2021.3079798
Abstract
We propose a contour co-tracking method for co-segmentation of image pairs based on the active contour model. Our method comprehensively re-models objects and backgrounds signified by level set functions, and leverages the Hellinger distance to measure the similarity between image regions encoded by probability distributions. The main contributions are as follows. 1) The new energy functional, combining a rewarding term and a penalty term, relaxes the assumptions of co-segmentation methods. 2) The Hellinger distance, which fulfills the triangle inequality, ensures a coherent measurement between probability distributions in metric space and contributes to finding a unique solution to the energy functional. The proposed contour co-tracking method was carefully verified against five representative methods on four popular datasets, i.e., the image pairs dataset (105 pairs), the MSRC dataset (30 pairs), the iCoseg dataset (66 pairs), and the Coseg-rep dataset (25 pairs). The comparison experiments suggest that our method achieves competitive or even better performance compared with state-of-the-art co-segmentation methods.
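Since the Hellinger distance is central to the method above, here is a minimal, self-contained sketch of how two image regions can be compared with it; the histogram representation, bin count, and masks are assumptions for the example and are not the paper's exact region model.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.
    It satisfies the triangle inequality, unlike e.g. the KL divergence."""
    p = p / p.sum()
    q = q / q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def region_histogram(image, mask, bins=16):
    """Normalized joint RGB histogram of the pixels selected by a boolean mask."""
    pixels = image[mask]                                      # (N, 3) RGB values
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=[(0, 256)] * 3)
    return hist.ravel() + 1e-12                               # avoid empty bins

# toy usage: two random "images" compared over the same central region
img_a = np.random.randint(0, 256, (64, 64, 3))
img_b = np.random.randint(0, 256, (64, 64, 3))
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
d = hellinger(region_histogram(img_a, mask), region_histogram(img_b, mask))
print(f"Hellinger distance between the two regions: {d:.3f}")  # value in [0, 1]
```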
4. Ma G, Li S, Chen C, Hao A, Qin H. Rethinking Image Salient Object Detection: Object-Level Semantic Saliency Reranking First, Pixelwise Saliency Refinement Later. IEEE Trans Image Process 2021;30:4238-4252. PMID: 33819154. DOI: 10.1109/TIP.2021.3068649
Abstract
Human attention is an interactive activity between our visual system and our brain, using both low-level visual stimuli and high-level semantic information. Previous image salient object detection (SOD) studies conduct their saliency predictions via a multitask methodology in which pixelwise saliency regression and segmentation-like saliency refinement are conducted simultaneously. However, this multitask methodology has one critical limitation: the semantic information embedded in feature backbones might be degenerated during the training process. Our visual attention is determined mainly by semantic information, which is evidenced by our tendency to pay more attention to semantically salient regions even if these regions are not the most perceptually salient at first glance. This fact clearly contradicts the widely used multitask methodology mentioned above. To address this issue, this paper divides the SOD problem into two sequential steps. First, we devise a lightweight, weakly supervised deep network to coarsely locate the semantically salient regions. Next, as a postprocessing refinement, we selectively fuse multiple off-the-shelf deep models on the semantically salient regions identified by the previous step to formulate a pixelwise saliency map. Compared with the state-of-the-art (SOTA) models that focus on learning the pixelwise saliency in single images using only perceptual cues, our method aims at investigating the object-level semantic ranks between multiple images, a methodology that is more consistent with the human attention mechanism. Our method is simple yet effective, and it is the first attempt to consider salient object detection as mainly an object-level semantic reranking problem.
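The two sequential steps can be pictured with a toy pipeline. The snippet below is a hedged sketch of the idea only (rerank candidate objects by a semantic score, then fuse pixelwise predictions inside the kept regions); the proposal masks, semantic scores, and simple averaging fusion are stand-ins and not the paper's actual components.

```python
import numpy as np

def rerank_then_refine(saliency_maps, proposals, semantic_scores, keep=1):
    """Toy two-step pipeline in the spirit of "rerank first, refine later".
    saliency_maps: list of (H, W) maps from off-the-shelf SOD models (assumed given).
    proposals: list of (H, W) boolean masks for candidate objects (assumed given).
    semantic_scores: per-proposal semantic saliency scores from a coarse network.
    Returns a fused pixelwise saliency map restricted to the top-ranked proposals."""
    order = np.argsort(semantic_scores)[::-1][:keep]          # step 1: object-level reranking
    region = np.any([proposals[i] for i in order], axis=0)    # union of kept object regions
    fused = np.mean(saliency_maps, axis=0)                    # step 2: fuse pixelwise predictions
    return fused * region                                     # refine only inside selected regions

# dummy usage: two SOD maps, three proposals, proposal 1 is semantically strongest
H, W = 32, 32
maps = [np.random.rand(H, W) for _ in range(2)]
props = [np.zeros((H, W), bool) for _ in range(3)]
props[0][:10, :10] = props[1][10:25, 10:25] = props[2][25:, 25:] = True
out = rerank_then_refine(maps, props, semantic_scores=[0.2, 0.9, 0.4])
print(out.shape, out.max() > 0)
```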
5. Wang L, Zhai C, Zhang Q, Tang W, Zheng N, Hua G. Graph-based temporal action co-localization from an untrimmed video. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.12.126
6. Ren X, Li J, Hua Z, Jiang X. Consistent image processing based on co-saliency. CAAI Trans Intell Technol 2021. DOI: 10.1049/cit2.12020
Affiliation(s)
- Xiangnan Ren: School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China; Co-innovation Center of Shandong Colleges and Universities Future Intelligent Computing, Shandong Technology and Business University, Yantai, China
- Jinjiang Li: School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China; Co-innovation Center of Shandong Colleges and Universities Future Intelligent Computing, Shandong Technology and Business University, Yantai, China
- Zhen Hua: Co-innovation Center of Shandong Colleges and Universities Future Intelligent Computing, Shandong Technology and Business University, Yantai, China; School of Information and Electronic Engineering, Shandong Technology and Business University, Yantai, China
- Xinbo Jiang: School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China; Shandong Provincial Key Laboratory of Software Engineering, Shandong University, Jinan, China
7. Jerripothula KR, Cai J, Lu J, Yuan J. Image Co-Skeletonization via Co-Segmentation. IEEE Trans Image Process 2021;30:2784-2797. PMID: 33523810. DOI: 10.1109/TIP.2021.3054464
Abstract
Recent advances in the joint processing of a set of images have shown its advantages over individual processing. Unlike the existing works geared towards co-segmentation or co-localization, in this article, we explore a new joint processing topic: image co-skeletonization, which is defined as joint skeleton extraction of the foreground objects in an image collection. It is well known that object skeletonization in a single natural image is challenging, because there is hardly any prior knowledge available about the object present in the image. Therefore, we resort to the idea of image co-skeletonization, hoping that the commonness prior that exists across semantically similar images can be leveraged to provide such knowledge, similar to other joint processing problems such as co-segmentation. Moreover, earlier research has found that augmenting a skeletonization process with the object's shape information is highly beneficial in capturing the image context. Having made these two observations, we propose a coupled framework for the co-skeletonization and co-segmentation tasks to facilitate shape information discovery for our co-skeletonization process through the co-segmentation process. While image co-skeletonization is our primary goal, the co-segmentation process might also benefit, in turn, from exploiting the skeleton outputs of the co-skeletonization process as central object seeds through such a coupled framework. As a result, both can benefit from each other synergistically. For evaluating image co-skeletonization results, we also construct a novel benchmark dataset by annotating nearly 1.8K images and dividing them into 38 semantic categories. Although the proposed idea is essentially a weakly supervised method, it can also be employed in supervised and unsupervised scenarios. Extensive experiments demonstrate that the proposed method achieves promising results in all three scenarios.
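The back-and-forth between segmentation and skeletonization can be illustrated with a crude loop. The sketch below is not the paper's coupled optimization; it only shows the intuition of alternating the two steps, using scikit-image's skeletonize as a stand-in skeleton extractor and a simple saliency threshold as a stand-in segmenter.

```python
import numpy as np
from skimage.morphology import skeletonize, binary_dilation, disk

def coupled_iteration(mask, image_saliency, n_iters=3):
    """Crude stand-in for a coupled co-skeletonization / co-segmentation loop:
    the current mask yields a skeleton, and the (dilated) skeleton acts as a
    central object seed that re-thresholds the saliency into the next mask.
    mask: initial (H, W) boolean foreground estimate.
    image_saliency: (H, W) map in [0, 1] standing in for the commonness prior."""
    for _ in range(n_iters):
        skeleton = skeletonize(mask)                      # skeleton from current segmentation
        seed = binary_dilation(skeleton, disk(3))         # skeleton as central object seed
        mask = (image_saliency > 0.5) | seed              # seed-assisted re-segmentation
    return skeleton, mask

# dummy usage: a square object with a noisy saliency map
H, W = 64, 64
init = np.zeros((H, W), bool)
init[20:44, 20:44] = True
sal = np.clip(init.astype(float) + 0.2 * np.random.rand(H, W), 0, 1)
skel, seg = coupled_iteration(init, sal)
print(skel.sum(), seg.sum())
```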
8. Zheng Y, Yang B, Sarem M. Hierarchical Image Segmentation Based on Nonsymmetry and Anti-Packing Pattern Representation Model. IEEE Trans Image Process 2021;30:2408-2421. PMID: 33493116. DOI: 10.1109/TIP.2021.3052359
Abstract
Image segmentation is the foundation of high-level image analysis and image understanding. How to effectively segment an image into regions that are "meaningful" to human visual perception, while ensuring that the segmented regions are consistent at different resolutions, is still a very challenging issue. Inspired by the idea of the Nonsymmetry and Anti-packing pattern representation Model in the Lab color space (NAMLab) and the "global-first" invariant perceptual theory, in this paper, we propose a novel framework for hierarchical image segmentation. Firstly, by defining the dissimilarity between two pixels in the Lab color space, we propose an NAMLab-based color image representation approach that is more in line with human visual perception characteristics and can merge image pixels quickly and effectively into NAMLab blocks. Then, by defining the dissimilarity between two NAMLab-based regions and iteratively executing an NAMLab-based algorithm that merges adjacent regions into larger ones to progressively generate a segmentation dendrogram, we propose a fast NAMLab-based algorithm for hierarchical image segmentation. Finally, the complexities of our proposed NAMLab-based algorithm for hierarchical image segmentation are analyzed in detail. The experimental results presented in this paper show that, compared with state-of-the-art algorithms, our proposed algorithm not only preserves more details of object boundaries but also better identifies foreground objects with similar color distributions. Moreover, our proposed algorithm executes much faster and takes up less memory, and therefore it is a better algorithm for hierarchical image segmentation.
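The dendrogram-building step is essentially agglomerative merging of adjacent regions. The following toy sketch shows that general mechanism only; the region representation (area-weighted mean colors), the Euclidean dissimilarity, and the adjacency bookkeeping are simplifications and not NAMLab's actual block and region model.

```python
import numpy as np

def hierarchical_merge(region_colors, adjacency, n_final=1):
    """Toy agglomerative merging in the spirit of a hierarchical segmentation dendrogram:
    repeatedly fuse the pair of *adjacent* regions with the smallest color dissimilarity.
    region_colors: dict region_id -> (mean_color ndarray, pixel_count).
    adjacency: set of frozenset({a, b}) pairs of neighbouring region ids.
    Returns the list of merges (a, b, dissimilarity), i.e. the dendrogram."""
    merges = []
    while len(region_colors) > n_final and adjacency:
        # pick the cheapest adjacent pair (Euclidean distance between mean colors)
        a, b = min(adjacency, key=lambda p: np.linalg.norm(
            region_colors[tuple(p)[0]][0] - region_colors[tuple(p)[1]][0]))
        cost = np.linalg.norm(region_colors[a][0] - region_colors[b][0])
        (ca, na), (cb, nb) = region_colors[a], region_colors[b]
        region_colors[a] = ((ca * na + cb * nb) / (na + nb), na + nb)  # area-weighted mean
        del region_colors[b]
        # rewire b's neighbours to a and drop the merged pair itself
        adjacency = {frozenset({a if r == b else r for r in pair})
                     for pair in adjacency if pair != frozenset({a, b})}
        adjacency = {p for p in adjacency if len(p) == 2}
        merges.append((a, b, cost))
    return merges

# dummy usage: four regions on a 2x2 grid, two dark and two reddish
colors = {i: (np.array(c, float), 1) for i, c in
          enumerate([[10, 10, 10], [12, 11, 10], [200, 50, 50], [205, 48, 52]])}
adj = {frozenset(p) for p in [(0, 1), (0, 2), (1, 3), (2, 3)]}
print(hierarchical_merge(colors, adj, n_final=2))
```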
9. Zhang Z, Lin Z, Xu J, Jin WD, Lu SP, Fan DP. Bilateral Attention Network for RGB-D Salient Object Detection. IEEE Trans Image Process 2021;30:1949-1961. PMID: 33439842. DOI: 10.1109/TIP.2021.3049959
Abstract
RGB-D salient object detection (SOD) aims to segment the most attractive objects in a pair of cross-modal RGB and depth images. Currently, most existing RGB-D SOD methods focus on the foreground region when utilizing the depth images. However, the background also provides important information in traditional SOD methods for promising performance. To better explore salient information in both foreground and background regions, this paper proposes a Bilateral Attention Network (BiANet) for the RGB-D SOD task. Specifically, we introduce a Bilateral Attention Module (BAM) with a complementary attention mechanism: foreground-first (FF) attention and background-first (BF) attention. The FF attention focuses on the foreground region with a gradual refinement style, while the BF attention recovers potentially useful salient information in the background region. Benefiting from the proposed BAM, our BiANet can capture more meaningful foreground and background cues, and shift more attention to refining the uncertain details between foreground and background regions. Additionally, we extend our BAM by leveraging multi-scale techniques for better SOD performance. Extensive experiments on six benchmark datasets demonstrate that our BiANet outperforms other state-of-the-art RGB-D SOD methods in terms of objective metrics and subjective visual comparison. Our BiANet can run at up to 80 fps on 224×224 RGB-D images with an NVIDIA GeForce RTX 2080Ti GPU. Comprehensive ablation studies also validate our contributions.
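The complementary FF/BF attention idea is easy to sketch. The module below is only an illustrative toy, not BiANet's BAM: the layer sizes, the sigmoid gating of a coarse saliency map, and the concatenation-based fusion are assumptions chosen to show how a foreground-weighted branch and a background-weighted branch can be combined.

```python
import torch
import torch.nn as nn

class ToyBilateralAttention(nn.Module):
    """Minimal sketch of the foreground-first / background-first idea:
    one branch attends to features weighted by a coarse saliency map, the other
    to features weighted by its complement, and the two refinements are fused."""
    def __init__(self, channels=64):
        super().__init__()
        self.ff_conv = nn.Conv2d(channels, channels, 3, padding=1)  # foreground-first branch
        self.bf_conv = nn.Conv2d(channels, channels, 3, padding=1)  # background-first branch
        self.fuse = nn.Conv2d(2 * channels, 1, 1)                   # predict refined saliency

    def forward(self, feat, coarse_logits):
        attn = torch.sigmoid(coarse_logits)           # (N, 1, H, W) coarse saliency
        ff = self.ff_conv(feat * attn)                # focus on the foreground region
        bf = self.bf_conv(feat * (1.0 - attn))        # recover cues from the background
        return self.fuse(torch.cat([ff, bf], dim=1))  # refined saliency logits

# dummy usage
module = ToyBilateralAttention(64)
out = module(torch.randn(2, 64, 56, 56), torch.randn(2, 1, 56, 56))
print(out.shape)  # torch.Size([2, 1, 56, 56])
```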
10. Chen J, Chen Y, Li W, Ning G, Tong M, Hilton A. Channel and spatial attention based deep object co-segmentation. Knowl Based Syst 2021. DOI: 10.1016/j.knosys.2020.106550
11. Li R, Wu CH, Liu S, Wang J, Wang G, Liu G, Zeng B. SDP-GAN: Saliency Detail Preservation Generative Adversarial Networks for High Perceptual Quality Style Transfer. IEEE Trans Image Process 2020;30:374-385. PMID: 33186111. DOI: 10.1109/TIP.2020.3036754
Abstract
This paper proposes a solution to effectively handle salient regions for style transfer between unpaired datasets. Recently, Generative Adversarial Networks (GANs) have demonstrated their potential for translating images from a source domain X to a target domain Y in the absence of paired examples. However, such a translation cannot guarantee high perceptual quality results. While existing style transfer methods work well with relatively uniform content, they often fail to capture geometric or structural patterns that usually belong to salient regions. Detail losses in structured regions and undesired artifacts in smooth regions are unavoidable even if each individual region is correctly transferred into the target style. In this paper, we propose SDP-GAN, a GAN-based network for solving such problems while generating enjoyable style transfer results. We introduce a saliency network, which is trained together with the generator. The saliency network has two functions: (1) providing constraints for the content loss to increase the punishment on salient regions, and (2) supplying saliency features to the generator to produce coherent results. Moreover, two novel losses are proposed to optimize the generator and saliency networks. The proposed method preserves the details in important salient regions and improves the overall image perceptual quality. Qualitative and quantitative comparisons against several leading prior methods demonstrate the superiority of our method.
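A saliency-weighted content constraint of the kind described in function (1) can be sketched in a few lines. The snippet below is an assumed, simplified form (an L2 feature difference up-weighted where a saliency map is high), not the paper's exact loss; the weighting factor alpha and the mean-squared formulation are choices made for the example.

```python
import torch

def saliency_weighted_content_loss(generated, content, saliency, alpha=4.0):
    """Toy saliency-aware content constraint: per-pixel feature differences are
    punished more heavily where the saliency map is high.
    generated, content: (N, C, H, W) feature maps of the stylized and source images.
    saliency: (N, 1, H, W) map in [0, 1] from a saliency network.
    alpha: extra weight on salient regions (value is an assumption)."""
    weight = 1.0 + alpha * saliency                                  # background keeps weight 1
    per_pixel = (generated - content).pow(2).mean(dim=1, keepdim=True)
    return (weight * per_pixel).mean()

# dummy usage
g = torch.randn(2, 256, 32, 32, requires_grad=True)
c = torch.randn(2, 256, 32, 32)
s = torch.rand(2, 1, 32, 32)
loss = saliency_weighted_content_loss(g, c, s)
loss.backward()
print(float(loss))
```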
12. Recent Advances in Saliency Estimation for Omnidirectional Images, Image Groups, and Video Sequences. Appl Sci (Basel) 2020. DOI: 10.3390/app10155143
Abstract
We present a review of methods for automatic estimation of visual saliency: the perceptual property that makes specific elements in a scene stand out and grab the attention of the viewer. We focus on domains that are especially recent and relevant, as they make saliency estimation particularly useful and/or effective: omnidirectional images, image groups for co-saliency, and video sequences. For each domain, we perform a selection of recent methods, highlight their commonalities and differences, and describe their unique approaches. We also report and analyze the datasets involved in the development of such methods, in order to reveal additional peculiarities of each domain, such as the representation used for the ground truth saliency information (scanpaths, saliency maps, or salient object regions). We define domain-specific evaluation measures, and provide quantitative comparisons on the basis of common datasets and evaluation criteria, highlighting the different impact of existing approaches on each domain. We conclude by synthesizing the emerging directions for research in the specialized literature, which include novel representations for omnidirectional images, inter- and intra-image saliency decomposition for co-saliency, and saliency shift for video saliency estimation.
13. Kompella A, Kulkarni RV. Weakly supervised multi-scale recurrent convolutional neural network for co-saliency detection and co-segmentation. Neural Comput Appl 2019. DOI: 10.1007/s00521-019-04265-y
14. Li Z, Lang C, Feng J, Li Y, Wang T, Feng S. Co-saliency Detection with Graph Matching. ACM Trans Intell Syst Technol 2019. DOI: 10.1145/3313874
Abstract
Recently, co-saliency detection, which aims to automatically discover common and salient objects that appear in several relevant images, has attracted increased interest in the computer vision community. In this article, we present a novel graph-matching-based model for co-saliency detection in image pairs. A graph matching solution is proposed to integrate visual appearance, saliency coherence, and spatial structural continuity for detecting co-saliency collaboratively. Since the saliency and the visual similarity have been seamlessly integrated, such a joint inference scheme is able to produce more accurate and reliable results. More concretely, the proposed model first computes the intra-saliency for each image by aggregating multiple saliency cues. The common and salient regions across multiple images are then discovered via a graph matching procedure. Then, a graph reconstruction scheme is proposed to refine the intra-saliency iteratively. Compared to existing co-saliency detection methods that only utilize visual appearance cues, our proposed model can effectively exploit both visual appearance and structure information to better guide co-saliency detection. Extensive experiments on several challenging image pair databases demonstrate that our model outperforms state-of-the-art baselines significantly.
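To make the matching step concrete, the sketch below builds a region-to-region affinity from appearance similarity and intra-saliency and solves it as a linear assignment problem. This is a deliberate simplification of the paper's graph matching (which also exploits pairwise spatial structure); the descriptors, the mixing weight lam, and the Hungarian solver are assumptions for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_regions(feat_a, feat_b, sal_a, sal_b, lam=0.5):
    """Simplified stand-in for the matching step: build a region-to-region affinity
    that mixes appearance similarity with intra-image saliency, then solve it as a
    linear assignment (full graph matching would add pairwise structural terms).
    feat_a: (Na, D) region descriptors of image A; feat_b: (Nb, D) of image B.
    sal_a, sal_b: per-region intra-saliency scores in [0, 1]."""
    fa = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    fb = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    appearance = fa @ fb.T                              # cosine similarity, (Na, Nb)
    saliency = np.outer(sal_a, sal_b)                   # both regions should be salient
    affinity = (1 - lam) * appearance + lam * saliency
    rows, cols = linear_sum_assignment(-affinity)       # maximize total affinity
    return list(zip(rows, cols)), affinity[rows, cols]

# dummy usage: 4 regions in image A, 3 in image B
a, b = np.random.rand(4, 128), np.random.rand(3, 128)
pairs, scores = match_regions(a, b, sal_a=np.random.rand(4), sal_b=np.random.rand(3))
print(pairs, scores.round(2))
```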
Affiliation(s)
- Zun Li: Beijing Jiaotong University, Beijing, China
- Yidong Li: Beijing Jiaotong University, Beijing, China
- Tao Wang: Beijing Jiaotong University, Beijing, China