Lv P, Ren J, Han G, Lu J, Xu M. Local Cross-Patch Activation From Multi-Direction for Weakly Supervised Object Localization.
IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2025; 34:2213-2227. [PMID: 40168203 DOI: 10.1109/tip.2025.3554398]
[Indexed: 04/03/2025]
Abstract
Weakly supervised object localization (WSOL) learns to localize objects using only image-level labels. Recent studies have applied transformers to WSOL to capture long-range feature dependencies and alleviate the partial activation issue of CNN-based methods. However, existing transformer-based methods still face two challenges. The first is over-activation of the background: object boundaries and background are often semantically similar, so localization models may misidentify background as part of the object. The second is incomplete activation of occluded objects: because the transformer architecture ignores semantic and spatial coherence between patches, it has difficulty capturing local features across patches. To address these issues, we propose LCA-MD, a novel transformer-based WSOL method using local cross-patch activation from multi-direction, which captures more local feature detail while inhibiting background over-activation. First, combining contrastive learning with the transformer, we propose a token feature contrast module (TCM) that maximizes the difference between foreground and background so that they can be separated more accurately. Second, we propose a semantic-spatial fusion module (SFM), which leverages multi-directional perception to capture local cross-patch features and diffuse activation across occlusions. Experimental results on the CUB-200-2011 and ILSVRC datasets demonstrate that LCA-MD achieves state-of-the-art WSOL results. The project code is available at https://github.com/rjy-fighting/LCA-MD.
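The idea of contrasting foreground and background tokens can be illustrated with a minimal sketch. This is not the paper's TCM: the function names, the fixed activation threshold, and the use of mean-token prototypes with cosine similarity are all assumptions made for illustration. Each patch token is assigned to foreground or background by its activation score, and the similarity between the two mean tokens is returned as a quantity that a contrastive objective would try to drive down.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fg_bg_similarity(tokens, activations, threshold=0.5):
    """Similarity between mean foreground and mean background tokens.

    Hypothetical helper (not from the paper): `tokens` is a list of patch
    feature vectors, `activations` holds one score per patch, and
    `threshold` is an assumed cut-off separating foreground from background.
    A lower return value means the two groups are better separated.
    """
    foreground = [t for t, a in zip(tokens, activations) if a >= threshold]
    background = [t for t, a in zip(tokens, activations) if a < threshold]
    if not foreground or not background:
        # Nothing to contrast when one group is empty.
        return 0.0
    return cosine(mean_vector(foreground), mean_vector(background))


# Four toy 2-D patch tokens: two object-like, two background-like.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
activations = [0.9, 0.8, 0.1, 0.2]
similarity = fg_bg_similarity(tokens, activations)
```

In a trained model the tokens would come from the transformer's patch embeddings and the similarity would enter a differentiable loss; the sketch only shows the grouping and comparison step.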