Chen H, Deng Y, Li Y, Hung TY, Lin G. RGBD Salient Object Detection via Disentangled Cross-modal Fusion. IEEE Transactions on Image Processing 2020;PP:8407-8416. [PMID: 32784141 DOI: 10.1109/TIP.2020.3014734]
Abstract
Depth is beneficial for salient object detection (SOD) because it provides additional saliency cues. Existing RGBD SOD methods focus on tailoring complicated cross-modal fusion topologies which, although they achieve encouraging performance, carry a high risk of over-fitting and remain ambiguous about how cross-modal complementarity is exploited. Unlike these conventional approaches, which combine cross-modal features wholesale without differentiation, we concentrate on decoupling the diverse cross-modal complements to simplify the fusion process and improve its sufficiency. We argue that if cross-modal heterogeneous representations can be disentangled explicitly, the cross-modal fusion process involves less uncertainty while enjoying better adaptability. To this end, we design a disentangled cross-modal fusion network that exposes structural and content representations from both modalities via cross-modal reconstruction. For different scenes, the disentangled representations allow the fusion module to easily identify and incorporate the desired complements for informative multi-modal fusion. Extensive experiments demonstrate the effectiveness of our designs and a large margin of improvement over state-of-the-art methods.
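The abstract's core idea, disentangling each modality into content and structure parts, constraining them with cross-modal reconstruction, and fusing the matched parts, can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration: the channel-split `disentangle`, the identity `decode`, and the scalar weight `alpha` stand in for the paper's learned encoders, decoders, and adaptive fusion module.

```python
import numpy as np

rng = np.random.default_rng(0)

def disentangle(feat, d_content):
    # Hypothetical split: first d_content channels as "content",
    # the rest as "structure" (the paper learns this with encoders).
    return feat[:d_content], feat[d_content:]

def decode(content, structure):
    # Stand-in decoder: the paper trains a real decoder so that each
    # modality is rebuilt from its own content plus the OTHER
    # modality's structure (the cross-modal reconstruction constraint).
    return np.concatenate([content, structure])

# Toy per-pixel feature vectors for the two modalities (4 content + 4 structure).
rgb_feat = rng.standard_normal(8)
depth_feat = rng.standard_normal(8)

rgb_content, rgb_structure = disentangle(rgb_feat, 4)
depth_content, depth_structure = disentangle(depth_feat, 4)

# Cross-modal reconstruction: RGB rebuilt with depth's structure (and vice versa).
rgb_recon = decode(rgb_content, depth_structure)
depth_recon = decode(depth_content, rgb_structure)

# Fusion over the disentangled parts: combine like with like, instead of
# fusing the entangled features wholesale.
alpha = 0.7  # hypothetical scene-dependent weight
fused = np.concatenate([
    alpha * rgb_content + (1 - alpha) * depth_content,
    alpha * rgb_structure + (1 - alpha) * depth_structure,
])
print(fused.shape)
```

The point of the sketch is the shape of the computation, not its fidelity: because content and structure are separated before fusion, the fusion step only has to weigh homogeneous parts against each other, which is the reduced-uncertainty, better-adaptability argument the abstract makes.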