1. Ren P, Bai T, Sun F. Bio-inspired two-stage network for efficient RGB-D salient object detection. Neural Netw 2025;185:107244. [PMID: 39933318] [DOI: 10.1016/j.neunet.2025.107244]
Abstract
Recently, with the development of Convolutional Neural Networks and Vision Transformers, the detection accuracy of RGB-D salient object detection (SOD) models has improved greatly. However, most existing methods cannot balance computational efficiency and performance well. In this paper, inspired by the P and M visual pathways in the primate biological visual system, we propose a Bio-inspired Two-stage Network for efficient RGB-D SOD, named BTNet, which simulates the visual information processing of these two pathways. Specifically, BTNet contains two stages: region locking and object refinement. The region locking stage simulates the processing of the M visual pathway to obtain a coarse-grained visual representation, and the object refinement stage simulates the processing of the P visual pathway to obtain a fine-grained visual representation. Experimental results show that BTNet outperforms other state-of-the-art methods on six mainstream benchmark datasets while substantially reducing parameters and processing 384 × 384 resolution images at 175.4 Frames Per Second (FPS). Compared with the cutting-edge method CPNet, BTNet reduces parameters by 93.6% and is nearly 7.2 times faster. The source code is available at https://github.com/ROC-Star/BTNet.
Affiliation(s)
- Peng Ren
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Tian Bai
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Fuming Sun
- School of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China
2. Jiang Q, Cheng J, Wu Z, Cong R, Timofte R. High-Precision Dichotomous Image Segmentation With Frequency and Scale Awareness. IEEE Trans Neural Netw Learn Syst 2025;36:8619-8631. [PMID: 39150797] [DOI: 10.1109/tnnls.2024.3426529]
Abstract
Dichotomous image segmentation (DIS), which requires capturing rich fine-grained details within a single image, is a challenging task. Despite the plausible results achieved by deep learning-based methods, most of them fail to segment generic objects when the boundary is cluttered with the background. In fact, the gradual decrease in feature map resolution during the encoding stage and misleading texture clues may be the main issues. To handle these issues, we devise a novel frequency- and scale-aware deep neural network (FSANet) for high-precision DIS. The core of our proposed FSANet is twofold. First, a multimodality fusion (MF) module that integrates information in the spatial and frequency domains is adopted to enhance the representation capability of image features. Second, a collaborative scale fusion module (CSFM), which deviates from traditional serial structures, is introduced to maintain high resolution during the entire feature encoding stage. On the decoder side, we introduce hierarchical context fusion (HCF) and selective feature fusion (SFF) modules to infer the segmentation results from the output features of the CSFM module. We conduct extensive experiments on several benchmark datasets and compare our proposed method with existing state-of-the-art (SOTA) methods. The experimental results demonstrate that our FSANet achieves superior performance both qualitatively and quantitatively. The code will be made available at https://github.com/chasecjg/FSANet.
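As background for the frequency-and-spatial idea, a minimal sketch of a fusion block that processes a feature map in both domains (a convolutional spatial branch plus an FFT-filtered frequency branch) is shown below; the class name, layer sizes, and fusion choice are illustrative assumptions, not the authors' MF module.

```python
import torch
import torch.nn as nn


class SpatialFrequencyFusion(nn.Module):
    """Illustrative fusion of spatial-domain and frequency-domain cues (not the paper's MF module)."""

    def __init__(self, channels: int):
        super().__init__()
        # Spatial branch: plain 3x3 convolution.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Frequency branch: 1x1 convolution applied to real/imaginary parts of the spectrum.
        self.freq = nn.Conv2d(channels * 2, channels * 2, 1)
        self.fuse = nn.Conv2d(channels * 2, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.spatial(x)
        # 2-D FFT over the spatial dimensions; keep real and imaginary parts as channels.
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = self.freq(torch.cat([spec.real, spec.imag], dim=1))
        real, imag = spec.chunk(2, dim=1)
        # Back to the spatial domain, then fuse both branches.
        f = torch.fft.irfft2(torch.complex(real, imag), s=x.shape[-2:], norm="ortho")
        return self.fuse(torch.cat([s, f], dim=1))


if __name__ == "__main__":
    block = SpatialFrequencyFusion(64)
    print(block(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```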
3. Zheng C, Lu J, Hu K, Xiang Q, Miao L. A ternary encoding network fusing scale awareness and large kernel attention for camouflaged object detection. Sci Rep 2025;15:14345. [PMID: 40274958] [PMCID: PMC12022351] [DOI: 10.1038/s41598-025-97857-9]
Abstract
To address the issues of structural information loss and object occlusion that arise when existing camouflaged object detection methods handle complex scenes, we propose a novel network that integrates scale awareness and enhanced large kernel attention (SALK-Net). Specifically, our network takes ternary images as input to mine the additional information contained at different scales. Firstly, we use a shared feature encoder to extract features and align channels from the multi-scale input images. Secondly, enhanced large kernel attention is introduced to guide the fusion of scale features, aiming to fully perceive global semantic information and minimize the loss of valuable clues. Thirdly, in the designed mixed-scale decoder, we adopt a progressive structure to explore and gradually accumulate the clue information contained in the feature channels. Finally, a dynamic weighting strategy for boundary and structure is introduced into the loss constraints, together with prior knowledge, to help the model predict challenging pixels. We compared the proposed model with 12 state-of-the-art methods on 4 public datasets and assessed the results on 4 metrics. The structural similarity measure and enhanced alignment measure reached 0.861 and 0.927, respectively, on a large trained dataset, and 0.872 and 0.926, respectively, on a large untrained dataset, which demonstrates the competitiveness of our method against state-of-the-art methods.
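For context, the sketch below shows a decomposed large-kernel-attention block in the commonly used LKA style (depthwise 5×5 conv, dilated depthwise 7×7 conv, pointwise conv, used as a multiplicative attention map); it illustrates the general mechanism only and is not the paper's enhanced variant.

```python
import torch
import torch.nn as nn


class LargeKernelAttention(nn.Module):
    """Decomposed large-kernel attention: a 5x5 depthwise conv followed by a 7x7 depthwise conv
    with dilation 3 (together covering roughly a 21x21 receptive field) and a 1x1 conv produce
    an attention map that reweights the input feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv2d(channels, channels, 7, padding=9, dilation=3, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn


if __name__ == "__main__":
    lka = LargeKernelAttention(32)
    print(lka(torch.randn(1, 32, 24, 24)).shape)  # torch.Size([1, 32, 24, 24])
```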
Affiliation(s)
- Chaoquan Zheng
- School of Information and Control Engineering, Southwest University of Science and Technology, Mianyang, 621010, China
- Jinzheng Lu
- School of Information and Control Engineering, Southwest University of Science and Technology, Mianyang, 621010, China
- Kun Hu
- School of Information and Control Engineering, Southwest University of Science and Technology, Mianyang, 621010, China
- Qiang Xiang
- School of Information and Control Engineering, Southwest University of Science and Technology, Mianyang, 621010, China
- Ling Miao
- School of Information and Control Engineering, Southwest University of Science and Technology, Mianyang, 621010, China
4. Sun K, Chen Z, Lin X, Sun X, Liu H, Ji R. Conditional Diffusion Models for Camouflaged and Salient Object Detection. IEEE Trans Pattern Anal Mach Intell 2025;47:2833-2848. [PMID: 40030981] [DOI: 10.1109/tpami.2025.3527469]
Abstract
Camouflaged Object Detection (COD) poses a significant challenge in computer vision and plays a critical role in a wide range of applications. Existing COD methods often struggle to predict nuanced boundaries with high confidence. In this work, we introduce CamoDiffusion, a new learning method that employs a conditional diffusion model to generate masks that progressively refine the boundaries of camouflaged objects. In particular, we first design an adaptive transformer conditional network, specifically designed for integration into a denoising network, which facilitates iterative refinement of the saliency masks. Second, building on classical diffusion model training, we investigate a variance noise schedule and a structure corruption strategy, which aim to enhance the accuracy of our denoising model by effectively handling uncertain input. Third, we introduce a Consensus Time Ensemble technique, which integrates intermediate predictions using a sampling mechanism, thus reducing overconfidence and incorrect predictions. Finally, we conduct extensive experiments on three benchmark datasets, which show that: 1) the efficacy and universality of our method are demonstrated in both camouflaged and salient object detection tasks; 2) compared to existing state-of-the-art methods, CamoDiffusion achieves superior performance; and 3) CamoDiffusion offers flexible enhancements, such as an accelerated version based on the VQ-VAE model and a skip approach.
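To make the conditional-diffusion idea concrete, here is a minimal sketch of one DDPM-style training step in which a denoiser conditioned on the RGB image learns to predict the noise added to a ground-truth mask; the tiny denoiser, the linear schedule, and all sizes are placeholders rather than CamoDiffusion's architecture or its variance noise schedule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000                               # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)  # generic linear schedule (not the paper's)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)


class TinyConditionalDenoiser(nn.Module):
    """Placeholder denoiser: predicts the noise added to a mask, conditioned on the RGB image."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + 3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, noisy_mask, image, t):
        # A real model would also embed the timestep t; omitted here for brevity.
        return self.net(torch.cat([noisy_mask, image], dim=1))


def training_step(model, mask, image):
    """One DDPM-style step: corrupt the ground-truth mask, predict the noise, regress it."""
    b = mask.size(0)
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(mask)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    noisy_mask = a_bar.sqrt() * mask + (1.0 - a_bar).sqrt() * noise
    pred_noise = model(noisy_mask, image, t)
    return F.mse_loss(pred_noise, noise)


if __name__ == "__main__":
    model = TinyConditionalDenoiser()
    loss = training_step(model, torch.rand(2, 1, 64, 64), torch.rand(2, 3, 64, 64))
    print(loss.item())
```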
5. Li G, Wang J, Wei J, Xu Z. IRFNet: Cognitive-Inspired Iterative Refinement Fusion Network for Camouflaged Object Detection. Sensors (Basel) 2025;25:1555. [PMID: 40096411] [PMCID: PMC11902440] [DOI: 10.3390/s25051555]
Abstract
Camouflaged Object Detection (COD) aims to identify objects that are intentionally concealed within their surroundings through appearance, texture, or pattern adaptations. Despite recent advances, extreme object-background similarity causes existing methods to struggle with accurately capturing discriminative features and effectively modeling multiscale patterns while preserving fine details. To address these challenges, we propose the Iterative Refinement Fusion Network (IRFNet), a novel framework that mimics human visual cognition through progressive feature enhancement and iterative optimization. Our approach incorporates: (1) a Hierarchical Feature Enhancement Module (HFEM) coupled with a dynamic channel-spatial attention mechanism, which enriches multiscale feature representations through bilateral and trilateral fusion pathways; and (2) a Context-guided Iterative Optimization Framework (CIOF) that combines transformer-based global context modeling with iterative refinement through dual-branch supervision. Extensive experiments on three challenging benchmark datasets (CAMO, COD10K, and NC4K) demonstrate that IRFNet consistently outperforms fourteen state-of-the-art methods, achieving improvements of 0.9-13.7% across key metrics. Comprehensive ablation studies validate the effectiveness of each proposed component and demonstrate how our iterative refinement strategy enables progressive improvement in detection accuracy.
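As an illustration of the channel-spatial attention idea, the sketch below follows the familiar CBAM-style pattern (channel attention from pooled descriptors, then a spatial attention map); it is an assumption-level example, not IRFNet's dynamic mechanism.

```python
import torch
import torch.nn as nn


class ChannelSpatialAttention(nn.Module):
    """CBAM-style attention: channel weights from average/max pooled descriptors, followed by a
    spatial attention map built from channel-wise average/max. Illustrative only."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)            # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))                    # spatial attention


if __name__ == "__main__":
    attn = ChannelSpatialAttention(64)
    print(attn(torch.randn(2, 64, 16, 16)).shape)  # torch.Size([2, 64, 16, 16])
```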
Affiliation(s)
- Guohan Li
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China; (G.L.); (J.W.)
- School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- Jingxin Wang
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China; (G.L.); (J.W.)
- School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Jianming Wei
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China; (G.L.); (J.W.)
- Zhengyi Xu
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China; (G.L.); (J.W.)
6. Song J, Wang Z, Xue K, Chen Y, Guo G, Li M, Nandi AK. A texture enhanced attention model for defect detection in thermal protection materials. Sci Rep 2025;15:4864. [PMID: 39929950] [PMCID: PMC11811214] [DOI: 10.1038/s41598-025-89376-4]
Abstract
Thermal protection materials are widely used in the aerospace field, where detecting internal defects is crucial for ensuring spacecraft structural integrity and safety in extreme temperature environments. Existing detection models struggle with these materials due to challenges such as defect-background similarity, tiny defect size, and multi-scale characteristics. In addition, defect datasets collected from real-world scenarios are scarce. To address these issues, we first construct a thermal protection material digital radiographic (DR) image dataset (TPMDR-dataset), which contains 670 images from actual production and 6,269 defect instances annotated under expert guidance. We then propose an innovative texture-enhanced attention defect detection (TADD) model that enables accurate, efficient, and real-time defect detection. To implement the TADD model, we design a texture enhancement module that enhances concealed defect textures and features. We then develop a non-local dual attention module to address the severe feature loss of tiny defects. Moreover, we improve the model's ability to detect multi-scale defects through a path aggregation network. Evaluation on the TPMDR-dataset and a public dataset shows that the TADD model achieves a higher mean Average Precision (mAP) than other methods while maintaining 25 frames per second, exceeding the baseline model by 11.05%.
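For reference, a standard embedded-Gaussian non-local block, the kind of global self-attention that such non-local modules typically build on, can be sketched as follows; it is illustrative only and not the paper's dual attention design.

```python
import torch
import torch.nn as nn


class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local block: every position attends to every other position,
    which helps keep information about tiny structures that pooling would wash out."""

    def __init__(self, channels: int):
        super().__init__()
        inter = max(channels // 2, 1)
        self.theta = nn.Conv2d(channels, inter, 1)
        self.phi = nn.Conv2d(channels, inter, 1)
        self.g = nn.Conv2d(channels, inter, 1)
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, inter)
        k = self.phi(x).flatten(2)                     # (b, inter, hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, inter)
        attn = torch.softmax(q @ k / (q.size(-1) ** 0.5), dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                          # residual connection


if __name__ == "__main__":
    block = NonLocalBlock(32)
    print(block(torch.randn(1, 32, 20, 20)).shape)  # torch.Size([1, 32, 20, 20])
```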
Affiliation(s)
- Jialin Song
- School of Information and Communication Engineering, North University of China, Taiyuan, 030051, China
- Zhaoba Wang
- School of Information and Communication Engineering, North University of China, Taiyuan, 030051, China
- National Key Laboratory of Dynamic Measurement Technology, North University of China, Taiyuan, 030051, China
- Kailiang Xue
- School of Information and Communication Engineering, North University of China, Taiyuan, 030051, China
- Youxing Chen
- School of Information and Communication Engineering, North University of China, Taiyuan, 030051, China
- National Key Laboratory of Dynamic Measurement Technology, North University of China, Taiyuan, 030051, China
- Guodong Guo
- School of Information and Communication Engineering, North University of China, Taiyuan, 030051, China
- Maozhen Li
- Department of Electronic and Electrical Engineering, Brunel University London, Uxbridge, UB8 3PH, UK
- Asoke K Nandi
- Department of Electronic and Electrical Engineering, Brunel University London, Uxbridge, UB8 3PH, UK
7. Hao C, Yu Z, Liu X, Xu J, Yue H, Yang J. A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection. IEEE Trans Image Process 2025;PP:608-622. [PMID: 40030902] [DOI: 10.1109/tip.2025.3528347]
Abstract
Camouflaged object detection (COD) and salient object detection (SOD) are two distinct yet closely related computer vision tasks that have been widely studied during the past decades. Although both aim to segment an image into binary foreground and background regions, COD focuses on concealed objects hidden in the image, whereas SOD concentrates on the most prominent objects in the image. Building universal segmentation models is currently a hot topic in the community. Previous works achieved good performance on a specific task by stacking various hand-designed modules and multi-scale features, but these careful task-specific designs also limit their potential as general-purpose architectures. We therefore aim to build general architectures that can be applied to both tasks. In this work, we propose a simple yet effective network (SENet) based on the vision Transformer (ViT); by employing a simple asymmetric ViT-based encoder-decoder structure, we obtain competitive results on both tasks, exhibiting greater versatility than meticulously crafted models. To further enhance the performance of universal architectures on both tasks, we propose several general methods targeting their common difficulties. First, we use image reconstruction as an auxiliary task during training to increase the difficulty of training, forcing the network to perceive the image as a whole and thereby helping the segmentation task. In addition, we propose a local information capture module (LICM) to compensate for the limitations of patch-level attention in pixel-level COD and SOD tasks, and a dynamic weighted loss (DW loss) to address the fact that small targets are harder to locate and segment in both tasks. Finally, we conduct a preliminary exploration of joint training, attempting to use one model to complete both tasks simultaneously. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our method. The code is available at https://github.com/linuxsino/SENet.
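To illustrate how an auxiliary reconstruction task and a size-aware weighting can be combined in one training objective, a rough sketch follows; the weighting rule and the recon_weight value are assumptions for illustration, not SENet's DW loss.

```python
import torch
import torch.nn.functional as F


def joint_loss(pred_mask, recon_image, gt_mask, image, recon_weight: float = 0.3):
    """Segmentation loss plus an auxiliary image-reconstruction loss.
    The per-sample weight grows as the target gets smaller, a rough stand-in for a
    dynamically weighted loss; the exact weighting rule and recon_weight are assumptions."""
    bce = F.binary_cross_entropy_with_logits(pred_mask, gt_mask, reduction="none")
    bce = bce.mean(dim=(1, 2, 3))                                   # per-sample segmentation loss
    area = gt_mask.mean(dim=(1, 2, 3)).clamp(min=1e-3)              # fraction of object pixels
    seg = (bce * (1.0 / area).sqrt()).mean()                        # smaller objects weigh more
    recon = F.l1_loss(recon_image, image)                           # auxiliary reconstruction task
    return seg + recon_weight * recon


if __name__ == "__main__":
    loss = joint_loss(torch.randn(2, 1, 64, 64), torch.rand(2, 3, 64, 64),
                      (torch.rand(2, 1, 64, 64) > 0.7).float(), torch.rand(2, 3, 64, 64))
    print(loss.item())
```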
8. Jha D, Sharma V, Banik D, Bhattacharya D, Roy K, Hicks SA, Tomar NK, Thambawita V, Krenzer A, Ji GP, Poudel S, Batchkala G, Alam S, Ahmed AMA, Trinh QH, Khan Z, Nguyen TP, Shrestha S, Nathan S, Gwak J, Jha RK, Zhang Z, Schlaefer A, Bhattacharjee D, Bhuyan MK, Das PK, Fan DP, Parasa S, Ali S, Riegler MA, Halvorsen P, de Lange T, Bagci U. Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges. Med Image Anal 2025;99:103307. [PMID: 39303447] [DOI: 10.1016/j.media.2024.103307]
Abstract
Automatic analysis of colonoscopy images has been an active field of research motivated by the importance of early detection of precancerous polyps. However, detecting polyps during the live examination can be challenging due to various factors such as variation of skills and experience among the endoscopists, lack of attentiveness, and fatigue, leading to a high polyp miss-rate. Therefore, there is a need for an automated system that can flag missed polyps during the examination and improve patient care. Deep learning has emerged as a promising solution to this challenge as it can assist endoscopists in detecting and classifying overlooked polyps and abnormalities in real time, improving the accuracy of diagnosis and enhancing treatment. In addition to the algorithm's accuracy, transparency and interpretability are crucial to explaining the whys and hows of the algorithm's prediction. Further, conclusions based on incorrect decisions may be fatal, especially in medicine. Despite these pitfalls, most algorithms are developed on private data, as closed-source or proprietary software, and lack reproducibility. Therefore, to promote the development of efficient and transparent methods, we organized the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image Segmentation (MedAI 2021)" competitions. The Medico 2020 challenge received submissions from 17 teams, while the MedAI 2021 challenge gathered submissions from another 17 distinct teams in the following year. We present a comprehensive summary, analyze each contribution, highlight the strengths of the best-performing methods, and discuss the possibility of translating such methods into the clinic. Our analysis revealed that the participants improved the Dice coefficient from 0.8607 in 2020 to 0.8993 in 2021, despite the addition of diverse and challenging frames (containing irregular, smaller, sessile, or flat polyps), which are frequently missed during routine clinical examination. For the instrument segmentation task, the best team obtained a mean Intersection over Union of 0.9364. For the transparency task, a multi-disciplinary team including expert gastroenterologists assessed each submission and evaluated the teams based on open-source practices, failure-case analysis, ablation studies, and the usability and understandability of their evaluations, to gain a deeper understanding of the models' credibility for clinical deployment. The best team obtained a final transparency score of 21 out of 25. Through the comprehensive analysis of the challenge, we not only highlight the advancements in polyp and surgical instrument segmentation but also encourage subjective evaluation for building more transparent and understandable AI-based colonoscopy systems. Moreover, we discuss the need for multi-center and out-of-distribution testing to address the current limitations of the methods, reduce the cancer burden, and improve patient care.
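The Dice coefficient and Intersection over Union quoted in these results are standard overlap metrics; a reference implementation is sketched below (the 0.5 threshold and smoothing constant are conventional choices, not challenge-specific).

```python
import numpy as np


def dice_and_iou(pred: np.ndarray, gt: np.ndarray, threshold: float = 0.5, eps: float = 1e-7):
    """Dice = 2|P∩G| / (|P| + |G|); IoU = |P∩G| / |P∪G| for binary masks."""
    p = (pred >= threshold).astype(np.float64)
    g = (gt >= threshold).astype(np.float64)
    inter = (p * g).sum()
    dice = (2.0 * inter + eps) / (p.sum() + g.sum() + eps)
    iou = (inter + eps) / (p.sum() + g.sum() - inter + eps)
    return dice, iou


if __name__ == "__main__":
    pred = np.random.rand(256, 256)                       # predicted probability map
    gt = (np.random.rand(256, 256) > 0.5).astype(float)   # binary ground-truth mask
    print(dice_and_iou(pred, gt))
```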
Affiliation(s)
- Debesh Jha
- Machine & Hybrid Intelligence Lab, Department of Radiology, Northwestern University, Chicago, USA
- Debayan Bhattacharya
- Institute of Medical Technology and Intelligent Systems, Technische Universität Hamburg, Germany
- Nikhil Kumar Tomar
- Machine & Hybrid Intelligence Lab, Department of Radiology, Northwestern University, Chicago, USA
- Ge-Peng Ji
- College of Engineering, Australian National University, Canberra, Australia
- Sahadev Poudel
- Department of IT Convergence Engineering, Gachon University, Seongnam 13120, South Korea
- George Batchkala
- Department of Engineering Science, University of Oxford, Oxford, UK
- Quoc-Huy Trinh
- Faculty of Information Technology, University of Science, VNU-HCM, Viet Nam
- Zeshan Khan
- National University of Computer and Emerging Sciences, Karachi Campus, Pakistan
- Tien-Phat Nguyen
- Faculty of Information Technology, University of Science, VNU-HCM, Viet Nam
- Shruti Shrestha
- NepAL Applied Mathematics and Informatics Institute for Research (NAAMII), Kathmandu, Nepal
- Jeonghwan Gwak
- Department of Software, Korea National University of Transportation, Chungju-si, South Korea
- Ritika K Jha
- Machine & Hybrid Intelligence Lab, Department of Radiology, Northwestern University, Chicago, USA
- Zheyuan Zhang
- Machine & Hybrid Intelligence Lab, Department of Radiology, Northwestern University, Chicago, USA
- Alexander Schlaefer
- Institute of Medical Technology and Intelligent Systems, Technische Universität Hamburg, Germany
- M K Bhuyan
- Indian Institute of Technology, Guwahati, India
- Deng-Ping Fan
- Computer Vision Lab (CVL), ETH Zurich, Zurich, Switzerland
- Sharib Ali
- School of Computing, University of Leeds, LS2 9JT, Leeds, United Kingdom
- Michael A Riegler
- SimulaMet, Oslo, Norway; Oslo Metropolitan University, Oslo, Norway
- Pål Halvorsen
- SimulaMet, Oslo, Norway; Oslo Metropolitan University, Oslo, Norway
- Thomas de Lange
- Department of Medicine and Emergencies - Mölndal, Sahlgrenska University Hospital, Region Västra Götaland, Sweden; Department of Molecular and Clinical Medicine, Sahlgrenska Academy, University of Gothenburg, Sweden
- Ulas Bagci
- Machine & Hybrid Intelligence Lab, Department of Radiology, Northwestern University, Chicago, USA
9. Song Z, Kang X, Wei X, Li S. Pixel-Centric Context Perception Network for Camouflaged Object Detection. IEEE Trans Neural Netw Learn Syst 2024;35:18576-18589. [PMID: 37819817] [DOI: 10.1109/tnnls.2023.3319323]
Abstract
Camouflaged object detection (COD) aims to identify object pixels visually embedded in the background environment. Existing deep learning methods fail to utilize the context information around different pixels adequately and efficiently. To solve this problem, a novel pixel-centric context perception network (PCPNet) is proposed, the core of which is to customize a personalized context for each pixel based on the automatic estimation of its surroundings. Specifically, PCPNet first employs an elegant encoder equipped with the designed vital component generation (VCG) module to obtain a set of compact features rich in low-level spatial and high-level semantic information across multiple subspaces. Then, we present a parameter-free pixel importance estimation (PIE) function based on multiwindow information fusion. Object pixels with complex backgrounds are assigned higher PIE values. Subsequently, PIE is utilized to regularize the optimization loss. In this way, the network can pay more attention to those pixels with higher PIE values in the decoding stage. Finally, a local continuity refinement module (LCRM) is used to refine the detection results. Extensive experiments on four COD benchmarks, five salient object detection (SOD) benchmarks, and five polyp segmentation benchmarks demonstrate the superiority of PCPNet with respect to other state-of-the-art methods.
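To illustrate how a per-pixel importance map can regularize the loss, here is a rough sketch in which a simple multi-window local-contrast proxy reweights a BCE loss; the proxy is an assumption for illustration only and is not the paper's parameter-free PIE function.

```python
import torch
import torch.nn.functional as F


def pixel_importance(gt_mask: torch.Tensor, windows=(7, 15, 31)) -> torch.Tensor:
    """Rough importance proxy: pixels whose label differs from their local neighbourhood mean
    (averaged over several window sizes) are treated as harder and receive larger weights."""
    maps = []
    for k in windows:
        local_mean = F.avg_pool2d(gt_mask, k, stride=1, padding=k // 2)
        maps.append((gt_mask - local_mean).abs())
    return 1.0 + torch.stack(maps, dim=0).mean(dim=0)   # weights roughly in [1, 2]


def weighted_bce(pred_logits: torch.Tensor, gt_mask: torch.Tensor) -> torch.Tensor:
    w = pixel_importance(gt_mask)
    loss = F.binary_cross_entropy_with_logits(pred_logits, gt_mask, reduction="none")
    return (w * loss).sum() / w.sum()


if __name__ == "__main__":
    gt = (torch.rand(2, 1, 64, 64) > 0.6).float()
    print(weighted_bce(torch.randn(2, 1, 64, 64), gt).item())
```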
10. Zhang C, Bi H, Xiang TZ, Wu R, Tong J, Wang X. Collaborative Camouflaged Object Detection: A Large-Scale Dataset and Benchmark. IEEE Trans Neural Netw Learn Syst 2024;35:18470-18484. [PMID: 37889825] [DOI: 10.1109/tnnls.2023.3317091]
Abstract
In this article, we provide a comprehensive study of a new task called collaborative camouflaged object detection (CoCOD), which aims to simultaneously detect camouflaged objects with the same properties from a group of relevant images. To this end, we meticulously construct the first large-scale dataset, termed CoCOD8K, which consists of 8528 high-quality and elaborately selected images with object mask annotations, covering five superclasses and 70 subclasses. The dataset spans a wide range of natural and artificial camouflage scenes with diverse object appearances and backgrounds, making it a very challenging dataset for CoCOD. Besides, we propose the first baseline model for CoCOD, named bilateral-branch network (BBNet), which explores and aggregates co-camouflaged cues within a single image and between images within a group, respectively, for accurate camouflaged object detection (COD) in given images. This is implemented by an interimage collaborative feature exploration (CFE) module, an intraimage object feature search (OFS) module, and a local-global refinement (LGR) module. We benchmark 18 state-of-the-art (SOTA) models, including 12 COD algorithms and six CoSOD algorithms, on the proposed CoCOD8K dataset under five widely used evaluation metrics. Extensive experiments demonstrate the effectiveness of the proposed method and the significantly superior performance compared to other competitors. We hope that our proposed dataset and model will boost growth in the COD community. The dataset, model, and results will be available at: https://github.com/zc199823/BBNet-CoCOD.
11. Ju J, Qiu D, Lei H, Ren S, Zhao W, Xu P, Zhao X, Guan Z. CDI-NSTSEG: A Clinical Diagnosis-Inspired Effective and Efficient Framework for Non-Salient Small Tumor Segmentation. IEEE J Biomed Health Inform 2024;28:7469-7479. [PMID: 39120985] [DOI: 10.1109/jbhi.2024.3440925]
Abstract
Accurately segmenting various clinical lesions from computed tomography (CT) images is a critical task for the diagnosis and treatment of many diseases. However, current segmentation frameworks are tailored to specific diseases, and few frameworks can detect and segment different types of lesions. Segmenting visually inconspicuous and small-scale tumors (such as small intestinal stromal tumors and pancreatic tumors) is another challenging problem for current frameworks. Our proposed framework, CDI-NSTSEG, efficiently segments small non-salient tumors using multi-scale visual information and non-local target mining. CDI-NSTSEG follows the diagnostic process of clinicians, including preliminary screening, localization, refinement, and segmentation. Specifically, we first extract features at three different scales (1×, 0.5×, and 1.5×) based on scale space theory. Our proposed scale fusion module (SFM) hierarchically fuses these features to obtain a comprehensive representation, similar to preliminary screening in clinical diagnosis. The global localization module (GLM) is designed with a non-local attention mechanism. It captures the long-range semantic dependencies of channels and spatial locations from the fused features, enabling us to locate the tumor from a global perspective and output initial prediction results. Finally, we design the layer focusing module (LFM) to gradually refine the initial results. LFM mainly conducts context exploration based on foreground and background features, focuses on suspicious areas layer by layer, and performs element-wise addition and subtraction to eliminate errors. Our framework achieves state-of-the-art segmentation performance on small intestinal stromal tumor and pancreatic tumor datasets.
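A minimal sketch of running a shared encoder on 1×, 0.5×, and 1.5× versions of the input and fusing the results is given below; the encoder, fusion layer, and single-channel input are placeholders, not the CDI-NSTSEG modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleEncoder(nn.Module):
    """Runs a shared encoder on 1.0x, 0.5x and 1.5x versions of the input and fuses the
    resulting feature maps at the original resolution. Encoder and fusion are placeholders."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(channels * 3, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = []
        for scale in (1.0, 0.5, 1.5):
            xi = x if scale == 1.0 else F.interpolate(
                x, scale_factor=scale, mode="bilinear", align_corners=False)
            fi = self.encoder(xi)
            # Bring every scale back to the original spatial size before fusion.
            feats.append(F.interpolate(fi, size=x.shape[-2:], mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))


if __name__ == "__main__":
    net = MultiScaleEncoder()
    print(net(torch.randn(1, 1, 96, 96)).shape)  # torch.Size([1, 32, 96, 96])
```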
12. Xu W, Xu R, Wang C, Li X, Xu S, Guo L. PSTNet: Enhanced Polyp Segmentation With Multi-Scale Alignment and Frequency Domain Integration. IEEE J Biomed Health Inform 2024;28:6042-6053. [PMID: 38954569] [DOI: 10.1109/jbhi.2024.3421550]
Abstract
Accurate segmentation of colorectal polyps in colonoscopy images is crucial for effective diagnosis and management of colorectal cancer (CRC). However, current deep learning-based methods primarily rely on fusing RGB information across multiple scales, leading to limitations in accurately identifying polyps due to restricted RGB domain information and challenges in feature misalignment during multi-scale aggregation. To address these limitations, we propose the Polyp Segmentation Network with Shunted Transformer (PSTNet), a novel approach that integrates both RGB and frequency domain cues present in the images. PSTNet comprises three key modules: the Frequency Characterization Attention Module (FCAM) for extracting frequency cues and capturing polyp characteristics, the Feature Supplementary Alignment Module (FSAM) for aligning semantic information and reducing misalignment noise, and the Cross Perception localization Module (CPM) for synergizing frequency cues with high-level semantics to achieve efficient polyp segmentation. Extensive experiments on challenging datasets demonstrate PSTNet's significant improvement in polyp segmentation accuracy across various metrics, consistently outperforming state-of-the-art methods. The integration of frequency domain cues and the novel architectural design of PSTNet contribute to advancing computer-assisted polyp segmentation, facilitating more accurate diagnosis and management of CRC.
13. Hao Q, Ren R, Niu S, Wang K, Wang M, Zhang J. UGEE-Net: Uncertainty-guided and edge-enhanced network for image splicing localization. Neural Netw 2024;178:106430. [PMID: 38870563] [DOI: 10.1016/j.neunet.2024.106430]
Abstract
Image splicing, a prevalent method for image tampering, has significantly undermined image authenticity. Existing methods for Image Splicing Localization (ISL) struggle with challenges like limited accuracy and subpar performance when dealing with imperceptible tampering and multiple tampered regions. We introduce an Uncertainty-Guided and Edge-Enhanced Network (UGEE-Net) for ISL to tackle these issues. UGEE-Net consists of two core tasks: uncertainty guidance and edge enhancement. We employ Bayesian learning to model uncertainty maps of tampered regions, directing the model's focus to challenging pixels. Simultaneously, we employ a frequency domain-auxiliary edge enhancement strategy to imbue localization features with global contour information and fine-grained local details. These mechanisms work in parallel, synergistically boosting performance. Additionally, we introduce a cross-level fusion and propagation mechanism that effectively utilizes contextual information for cross-layer feature integration and leverages channel-level correlations for cross-layer feature propagation, gradually enhancing the localization feature's details. Experimental results affirm UGEE-Net's superiority in terms of detection accuracy, robustness, and generalization capabilities. Furthermore, to meet the growing demand for high-quality datasets in image forensics, we present the HTSI12K dataset, which includes 12,000 spliced images with imperceptible tampering traces and diverse categories, rendering it suitable for real-world auxiliary model training.
Affiliation(s)
- Qixian Hao
- Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Ruyong Ren
- Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Shaozhang Niu
- Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China; Southeast Digital Economy Development Institute, Quzhou 324000, China
- Kai Wang
- Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Maosen Wang
- Southeast Digital Economy Development Institute, Quzhou 324000, China
- Jiwei Zhang
- Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China
14. Wu C, Li S, Xie T, Wang X, Zhou J. WoodenCube: An Innovative Dataset for Object Detection in Concealed Industrial Environments. Sensors (Basel) 2024;24:5903. [PMID: 39338650] [PMCID: PMC11435468] [DOI: 10.3390/s24185903]
Abstract
With the rapid advancement of intelligent manufacturing technologies, the operating environments of modern robotic arms are becoming increasingly complex. In addition to the diversity of objects, there is often a high degree of similarity between the foreground and the background. Although traditional RGB-based object-detection models have achieved remarkable success in many fields, they still face the challenge of effectively detecting targets with textures similar to the background. To address this issue, we introduce the WoodenCube dataset, which contains over 5000 images of 10 different types of blocks. All images are densely annotated with object-level categories, bounding boxes, and rotation angles. Additionally, a new evaluation metric, Cube-mAP, is proposed to more accurately assess the detection performance of cube-like objects. In addition, we have developed a simple, yet effective, framework for WoodenCube, termed CS-SKNet, which captures strong texture features in the scene by enlarging the network's receptive field. The experimental results indicate that our CS-SKNet achieves the best performance on the WoodenCube dataset, as evaluated by the Cube-mAP metric. We further evaluate the CS-SKNet on the challenging DOTAv1.0 dataset, with the consistent enhancement demonstrating its strong generalization capability.
Affiliation(s)
- Chao Wu
- School of Mathematical Sciences, Zhejiang University of Technology, Hangzhou 310023, China
- Shilong Li
- School of Mathematical Sciences, Zhejiang University of Technology, Hangzhou 310023, China
- Tao Xie
- School of Mathematical Sciences, Zhejiang University of Technology, Hangzhou 310023, China
- Xiangdong Wang
- School of Mathematical Sciences, Zhejiang University of Technology, Hangzhou 310023, China
- Jiali Zhou
- School of Mathematical Sciences, Zhejiang University of Technology, Hangzhou 310023, China
15. Tang S, Ran H, Yang S, Wang Z, Li W, Li H, Meng Z. A frequency selection network for medical image segmentation. Heliyon 2024;10:e35698. [PMID: 39220902] [PMCID: PMC11365330] [DOI: 10.1016/j.heliyon.2024.e35698]
Abstract
Existing medical image segmentation methods may consider feature extraction and information processing only in the spatial domain, lack a well-designed interaction between frequency and spatial information, or ignore the semantic gaps between shallow and deep features, which leads to inaccurate segmentation results. Therefore, in this paper, we propose a novel frequency selection segmentation network (FSSN), which achieves more accurate lesion segmentation by fusing local spatial features with global frequency information, designing better feature interactions, and suppressing low-correlation frequency components to mitigate semantic gaps. Firstly, we propose a global-local feature aggregation module (GLAM) that simultaneously captures multi-scale local features in the spatial domain and exploits global frequency information in the frequency domain, achieving a complementary fusion of local detail features and global frequency information. Secondly, we propose a feature filter module (FFM) to mitigate semantic gaps during cross-level feature fusion, which lets FSSN discriminatively determine which frequency information should be preserved for accurate lesion segmentation. Finally, to make better use of local information, especially the boundary of the lesion region, we employ deformable convolution (DC) to extract pertinent features in the local range, allowing FSSN to focus better on relevant image content. Extensive experiments on two public benchmark datasets show that, compared with representative medical image segmentation methods, our FSSN obtains more accurate lesion segmentation results in terms of both objective evaluation indicators and subjective visual effects, with fewer parameters and lower computational complexity.
Affiliation(s)
- Shu Tang
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Haiheng Ran
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Shuli Yang
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Zhaoxia Wang
- Chongqing Emergency Medical Center, Chongqing University Central Hospital, School of Medicine, Chongqing University, Chongqing, China
- Wei Li
- Children’s Hospital of Chongqing Medical University, China
- Haorong Li
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
- Zihao Meng
- Chongqing University of Posts and Telecommunications, No.2 Road of Chongwen, Nanan District, 400000, Chongqing, China
16. Lang L, Chen XQ, Zhou Q. Enhancing tunnel crack detection with linear seam using mixed stride convolution and attention mechanism. Sci Rep 2024;14:14997. [PMID: 38951575] [PMCID: PMC11217461] [DOI: 10.1038/s41598-024-65909-1]
Abstract
Cracks in tunnel lining structures constitute a common and serious problem that jeopardizes traffic safety and the durability of the tunnel. The similarity between lining seams and cracks in terms of strength and morphological characteristics makes the detection of cracks in tunnel lining structures challenging. To address this issue, a new deep learning-based method for crack detection in tunnel lining structures is proposed. First, an improved attention mechanism tailored to the morphological features of lining seams is introduced, which aggregates not only global spatial information but also features along the height and width dimensions to mine more long-distance feature information. Furthermore, a mixed strip convolution module leveraging strip convolutions in four different directions is proposed. This module captures remote contextual information from various angles to avoid interference from background pixels. To evaluate the proposed approach, the two modules are integrated into a U-shaped network, and experiments are conducted on Tunnel200, a tunnel lining crack dataset, as well as the publicly available crack datasets Crack500 and DeepCrack. The results show that the approach outperforms existing methods and achieves superior performance on these datasets.
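As a pointer to how strip convolutions capture long, thin structures, the sketch below combines horizontal (1×k) and vertical (k×1) depthwise strip convolutions; only two of the four directions are shown (diagonal strips would need rotated or deformable sampling), and the kernel length is an assumed value.

```python
import torch
import torch.nn as nn


class StripConv(nn.Module):
    """Horizontal (1xk) and vertical (kx1) strip convolutions capture long, thin structures
    such as cracks and lining seams; their outputs are summed and projected pointwise."""

    def __init__(self, channels: int, k: int = 11):
        super().__init__()
        self.horizontal = nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels)
        self.vertical = nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels)
        self.project = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.project(self.horizontal(x) + self.vertical(x))


if __name__ == "__main__":
    sc = StripConv(16)
    print(sc(torch.randn(1, 16, 48, 48)).shape)  # torch.Size([1, 16, 48, 48])
```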
Affiliation(s)
- Lang Lang
- School of Intelligent Manufacturing, Chongqing Three Gorges Vocational College, Chongqing, 404155, China
- Xiao-Qin Chen
- School of Intelligent Manufacturing, Chongqing Three Gorges Vocational College, Chongqing, 404155, China
- Qiang Zhou
- School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
17. Tran XT, Do T, Pal NR, Jung TP, Lin CT. Multimodal fusion for anticipating human decision performance. Sci Rep 2024;14:13217. [PMID: 38851836] [PMCID: PMC11162455] [DOI: 10.1038/s41598-024-63651-2]
Abstract
Anticipating human decisions while performing complex tasks remains a formidable challenge. This study proposes a multimodal machine-learning approach that leverages image features and electroencephalography (EEG) data to predict human response correctness in a demanding visual searching task. Notably, we extract a novel set of image features pertaining to object relationships using the Segment Anything Model (SAM), which enhances prediction accuracy compared to traditional features. Additionally, our approach effectively utilizes a combination of EEG signals and image features to streamline the feature set required for the Random Forest Classifier (RFC) while maintaining high accuracy. The findings of this research hold substantial potential for developing advanced fault alert systems, particularly in critical decision-making environments such as the medical and defence sectors.
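A minimal sketch of this kind of early fusion, concatenating EEG-derived and image-derived feature vectors before a Random Forest classifier, is shown below with synthetic placeholder data; the feature dimensions and split are assumptions, not those used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder features: per-trial EEG descriptors and image/object-relationship descriptors.
n_trials = 400
eeg_features = rng.normal(size=(n_trials, 64))     # e.g. band-power features per channel
image_features = rng.normal(size=(n_trials, 32))   # e.g. SAM-derived object-relationship features
labels = rng.integers(0, 2, size=n_trials)         # correct vs. incorrect response

# Early fusion: simple concatenation of the two modalities.
fused = np.concatenate([eeg_features, image_features], axis=1)

x_tr, x_te, y_tr, y_te = train_test_split(fused, labels, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(x_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(x_te)))
```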
Affiliation(s)
- Xuan-The Tran
- GrapheneX-UTS HAI Centre, Australian AI Institute, Faculty of Engineering and Information Technology (FEIT), University of Technology Sydney (UTS), Sydney, NSW, 2007, Australia
- Thomas Do
- GrapheneX-UTS HAI Centre, Australian AI Institute, Faculty of Engineering and Information Technology (FEIT), University of Technology Sydney (UTS), Sydney, NSW, 2007, Australia
- Nikhil R Pal
- Electronics and Communication Sciences Unit, Indian Statistical Institute, Calcutta, West Bengal, 700108, India
- Tzyy-Ping Jung
- Institute for Neural Computation and Institute of Engineering in Medicine, University of California, San Diego (UCSD), La Jolla, CA, 92093, USA
- Chin-Teng Lin
- GrapheneX-UTS HAI Centre, Australian AI Institute, Faculty of Engineering and Information Technology (FEIT), University of Technology Sydney (UTS), Sydney, NSW, 2007, Australia
18. Wei T, Wang Y, Zhang Y, Wang Y, Zhao L. Boundary-Sensitive Segmentation of Small Liver Lesions. IEEE J Biomed Health Inform 2024;28:2991-3002. [PMID: 38466585] [DOI: 10.1109/jbhi.2024.3375609]
Abstract
Early diagnosis plays a pivotal role in handling the global health challenge posed by liver diseases. However, early-stage lesions are typically quite small, presenting significant difficulties due to insufficient regions for developing effective features, indistinguishable boundaries of small lesions, and a lack of tiny liver lesion masks. To address these issues, our solution is twofold: an efficient model and a high-quality dataset. The model is built upon the advantages of the path signature and camouflaged object detection. The path signature narrows down the ambiguous boundaries between lesions and other tissues, while camouflaged object detection achieves high accuracy in detecting inconspicuous lesions. The two are seamlessly integrated to ensure high accuracy and fidelity. For the dataset, we collect more than ten thousand liver images with over four thousand lesions, approximately half of which are small. Experiments on both an established dataset and our newly constructed one show that the proposed model outperforms state-of-the-art semantic segmentation and camouflaged object detection models, particularly in detecting small lesions. Moreover, the decisive and faithful salience maps generated by the model at the boundary regions demonstrate its strong robustness.
19. Yu H, Weng L, Wu S, He J, Yuan Y, Wang J, Xu X, Feng X. Time-Series Field Phenotyping of Soybean Growth Analysis by Combining Multimodal Deep Learning and Dynamic Modeling. Plant Phenomics 2024;6:0158. [PMID: 38524738] [PMCID: PMC10959008] [DOI: 10.34133/plantphenomics.0158]
Abstract
The rate of soybean canopy establishment largely determines photoperiodic sensitivity, subsequently influencing yield potential. However, assessing the rate of soybean canopy development in large-scale field breeding trials is both laborious and time-consuming. High-throughput phenotyping methods based on unmanned aerial vehicle (UAV) systems can be used to monitor and quantitatively describe the development of soybean canopies for different genotypes. In this study, high-resolution and time-series raw data from field soybean populations were collected using UAVs. The RGB (red, green, and blue) and infrared images are used as inputs to construct the multimodal image segmentation model: the RGB & Infrared Feature Fusion Segmentation Network (RIFSeg-Net). Subsequently, the Segment Anything Model was employed to extract complete individual leaves from the segmentation results obtained from RIFSeg-Net. These leaf aspect ratios facilitated the accurate categorization of soybean populations into 2 distinct varieties: oval leaf type variety and lanceolate leaf type variety. Finally, dynamic modeling was conducted to identify 5 phenotypic traits associated with the canopy development rate that differed substantially among the classified soybean varieties. The results showed that the developed multimodal image segmentation model RIFSeg-Net for extracting soybean canopy cover from UAV images outperformed traditional deep learning image segmentation networks (precision = 0.94, recall = 0.93, F1-score = 0.93). The proposed method has high practical value in the field of germplasm resource identification. This approach could lead to the use of a practical tool for further genotypic differentiation analysis and the selection of target genes.
Affiliation(s)
- Hui Yu
- Key Laboratory of Soybean Molecular Design Breeding, State Key Laboratory of Black Soils Conservation and Utilization, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China
- Zhejiang Lab, Hangzhou 310012, China
- Lin Weng
- Zhejiang Lab, Hangzhou 310012, China
- Jun Wang
- Zhejiang Lab, Hangzhou 310012, China
- Xianzhong Feng
- Key Laboratory of Soybean Molecular Design Breeding, State Key Laboratory of Black Soils Conservation and Utilization, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China
- Zhejiang Lab, Hangzhou 310012, China
20. Zhang Z, Wang T, Wang J, Sun Y. Features Split and Aggregation Network for Camouflaged Object Detection. J Imaging 2024;10:24. [PMID: 38249009] [PMCID: PMC11154448] [DOI: 10.3390/jimaging10010024]
Abstract
Camouflaged objects are not distinct from their surroundings, so the difference between foreground and background is easily overlooked, which places higher demands on detection systems. In this paper, we present a new framework for Camouflaged Object Detection (COD) named FSANet, which consists mainly of three operations: spatial detail mining (SDM), cross-scale feature combination (CFC), and a hierarchical feature aggregation decoder (HFAD). The framework simulates the three-stage detection process of the human visual mechanism when observing a camouflaged scene. Specifically, we extract five feature layers using the backbone and divide them into two parts, with the second layer as the boundary. The SDM module simulates the human's cursory inspection of camouflaged objects to gather spatial details (such as edges and textures) and fuses the features to create a cursory impression. The CFC module is used to observe high-level features from various viewing angles and extracts the same features by thoroughly filtering features of various levels. We also design a side-join multiplication in the CFC module to avoid detail distortion and use element-wise feature multiplication to filter out noise. Finally, we construct an HFAD module to deeply mine effective features from these two stages, direct the fusion of low-level features using high-level semantic knowledge, and improve the camouflage map using hierarchical cascade technology. Compared with nineteen deep-learning-based methods in terms of seven widely used metrics, our proposed framework has clear advantages on four public COD datasets, demonstrating the effectiveness and superiority of our model.
Affiliation(s)
- Zejin Zhang
- HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China; (Z.Z.); (T.W.); (Y.S.)
- Tao Wang
- HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China; (Z.Z.); (T.W.); (Y.S.)
- Jian Wang
- HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China; (Z.Z.); (T.W.); (Y.S.)
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
- Yao Sun
- HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China; (Z.Z.); (T.W.); (Y.S.)
- School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
21. Li H, Feng CM, Xu Y, Zhou T, Yao L, Chang X. Zero-Shot Camouflaged Object Detection. IEEE Trans Image Process 2023;32:5126-5137. [PMID: 37643103] [DOI: 10.1109/tip.2023.3308295]
Abstract
The goal of camouflaged object detection (COD) is to detect objects that are visually embedded in their surroundings. Existing COD methods only focus on detecting camouflaged objects from seen classes, and they suffer from performance degradation when detecting unseen classes. However, in real-world scenarios, collecting sufficient data for seen classes is extremely difficult and labeling them requires high professional skills, making these COD methods inapplicable. In this paper, we propose a new zero-shot COD framework (termed ZSCOD), which can effectively detect unseen classes. Specifically, our framework includes a Dynamic Graph Searching Network (DGSNet) and a Camouflaged Visual Reasoning Generator (CVRG). In detail, DGSNet is proposed to adaptively capture more edge details for boosting COD performance. CVRG is utilized to produce pseudo-features that are closer to the real features of the seen camouflaged objects, which can transfer knowledge from seen classes to unseen classes to help detect unseen objects. Besides, our graph reasoning is built on a dynamic searching strategy, which pays more attention to the boundaries of objects to reduce the influence of the background. More importantly, we construct the first zero-shot COD benchmark based on the COD10K dataset. Experimental results on public datasets show that our ZSCOD not only detects camouflaged objects of unseen classes but also achieves state-of-the-art performance in detecting seen classes.
22. Tran XT, Do TTT, Lin CT. Early Detection of Human Decision-Making in Concealed Object Visual Searching Tasks: An EEG-BiLSTM Study. Annu Int Conf IEEE Eng Med Biol Soc 2023;2023:1-4. [PMID: 38082585] [DOI: 10.1109/embc40787.2023.10340547]
Abstract
Detecting concealed objects presents a significant challenge for both human and artificial intelligent systems. The concealed object detection task requires a high level of human attention and cognitive effort to complete successfully. Thus, in this study, we use concealed objects as stimuli in our decision-making experimental paradigms to quantify participants' decision-making performance. We applied a deep learning model, Bi-directional Long Short-Term Memory (BiLSTM), to predict the participants' decision accuracy using their electroencephalogram (EEG) signals as input. The classifier achieved high accuracy, reaching 96.1% with an epoching time range of 500 ms following the stimulus onset. The results revealed that the parietal-occipital brain region provides highly informative signals for the classifier in concealed visual searching tasks. Furthermore, the neural mechanism underlying the concealed visual searching and decision-making process was examined by analyzing serial EEG components. The findings of this study could contribute to the development of a fault alert system, which has the potential to improve human decision-making performance.
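For orientation, a BiLSTM classifier over an epoched EEG segment can be sketched as follows; the channel count, epoch length, and layer sizes are assumptions rather than the study's configuration.

```python
import torch
import torch.nn as nn


class EEGBiLSTM(nn.Module):
    """Bidirectional LSTM over an EEG epoch: input (batch, time, channels);
    the last hidden states of both directions feed a binary classifier."""

    def __init__(self, n_channels: int = 32, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden,
                            num_layers=1, batch_first=True, bidirectional=True)
        self.head = nn.Linear(hidden * 2, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, (h, _) = self.lstm(x)            # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=1)  # concatenate forward and backward states
        return self.head(h)


if __name__ == "__main__":
    # e.g. a 500 ms epoch at 256 Hz -> 128 time samples, 32 channels (assumed values).
    model = EEGBiLSTM()
    logits = model(torch.randn(8, 128, 32))
    print(logits.shape)  # torch.Size([8, 2])
```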
23. Liu K, Qiu T, Yu Y, Li S, Li X. Edge-Guided Camouflaged Object Detection via Multi-Level Feature Integration. Sensors (Basel) 2023;23:5789. [PMID: 37447638] [DOI: 10.3390/s23135789]
Abstract
Camouflaged object detection (COD) aims to segment camouflaged objects that blend perfectly into their surroundings. Due to the low boundary contrast between camouflaged objects and their surroundings, their detection poses a significant challenge. Despite the numerous excellent camouflaged object detection methods developed in recent years, issues such as boundary refinement and multi-level feature extraction and fusion still need further exploration. In this paper, we propose a novel multi-level feature integration network (MFNet) for camouflaged object detection. First, we design an edge guidance module (EGM) that combines high-level semantic information with low-level spatial details to model the edges of camouflaged objects, providing additional boundary semantics that improve COD performance. Additionally, we propose a multi-level feature integration module (MFIM), which leverages the fine local information of low-level features and the rich global information of high-level features across three adjacent feature levels to provide a supplementary representation for the current level, effectively integrating the full contextual semantic information. Finally, we propose a context aggregation refinement module (CARM) to efficiently aggregate and refine the cross-level features and obtain clear prediction maps. Extensive experiments on three benchmark datasets show that MFNet is an effective COD model and outperforms other state-of-the-art models on all four evaluation metrics (Sα, Eϕ, Fβw, and MAE).
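A hedged sketch of an edge-guidance-style block in PyTorch, assuming it fuses one low-level and one high-level backbone feature and predicts a single-channel edge map; the module and argument names are placeholders, not the paper's EGM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeGuidance(nn.Module):
    """Fuses low-level spatial detail with upsampled high-level semantics
    and predicts an edge map that can be supervised with edge ground truth."""
    def __init__(self, low_ch, high_ch, mid_ch=64):
        super().__init__()
        self.reduce_low = nn.Conv2d(low_ch, mid_ch, 1)
        self.reduce_high = nn.Conv2d(high_ch, mid_ch, 1)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * mid_ch, mid_ch, 3, padding=1),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.edge_head = nn.Conv2d(mid_ch, 1, 1)

    def forward(self, f_low, f_high):
        f_high = F.interpolate(self.reduce_high(f_high),
                               size=f_low.shape[-2:], mode='bilinear',
                               align_corners=False)
        fused = self.fuse(torch.cat([self.reduce_low(f_low), f_high], dim=1))
        return fused, self.edge_head(fused)  # (guided feature, edge logits)
```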
Collapse
Affiliation(s)
- Kangwei Liu
- Key Laboratory of Signal Detection and Processing, Department of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
| | - Tianchi Qiu
- Key Laboratory of Signal Detection and Processing, Department of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
| | - Yinfeng Yu
- Key Laboratory of Signal Detection and Processing, Department of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
| | - Songlin Li
- Key Laboratory of Signal Detection and Processing, Department of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
| | - Xiuhong Li
- Key Laboratory of Signal Detection and Processing, Department of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
| |
Collapse
|
24
|
Song Z, Kang X, Wei X, Liu H, Dian R, Li S. FSNet: Focus Scanning Network for Camouflaged Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:2267-2278. [PMID: 37067971 DOI: 10.1109/tip.2023.3266659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Camouflaged object detection (COD) aims to discover objects that blend in with the background due to similar colors, textures, etc. Existing deep learning methods do not systematically illustrate the key tasks in COD, which seriously hinders improvement of its performance. In this paper, we introduce the concept of focus areas, which represent regions containing discernible colors or textures, and develop a two-stage focus scanning network for camouflaged object detection. Specifically, a novel encoder-decoder module is first designed to determine a region where the focus areas may appear. In this process, a multi-layer Swin transformer is deployed to encode global context information between the object and the background, and a novel cross-connection decoder is proposed to fuse cross-layer textures and semantics. Then, we utilize multi-scale dilated convolution to obtain discriminative features at different scales within the focus areas. Meanwhile, a dynamic difficulty-aware loss is designed to guide the network to pay more attention to structural details. Extensive experimental results on the benchmarks, including CAMO, CHAMELEON, COD10K, and NC4K, illustrate that the proposed method performs favorably against other state-of-the-art methods.
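The multi-scale dilated convolution mentioned above can be sketched as a simple parallel-branch block; the dilation rates and channel handling below are illustrative assumptions, not FSNet's exact design.

```python
import torch
import torch.nn as nn

class MultiScaleDilated(nn.Module):
    """Parallel 3x3 convolutions with increasing dilation capture context at
    several receptive-field sizes; branch outputs are concatenated and merged."""
    def __init__(self, ch=64, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates)
        self.merge = nn.Conv2d(len(rates) * ch, ch, 1)

    def forward(self, x):
        return self.merge(torch.cat([b(x) for b in self.branches], dim=1))

feats = MultiScaleDilated()(torch.randn(1, 64, 48, 48))  # same spatial size out
```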
Collapse
|
25
|
Xiao J, Chen T, Hu X, Zhang G, Wang S. Boundary-guided context-aware network for camouflaged object detection. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08502-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/08/2023]
|
26
|
Zhai Q, Li X, Yang F, Jiao Z, Luo P, Cheng H, Liu Z. MGL: Mutual Graph Learning for Camouflaged Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:1897-1910. [PMID: 36417725 DOI: 10.1109/tip.2022.3223216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Camouflaged object detection, which aims to detect/segment the object(s) that blend in with their surroundings, remains challenging for deep models due to the intrinsic similarities between foreground objects and background surroundings. Ideally, an effective model should be capable of finding valuable clues in the given scene and integrating them into a joint learning framework to co-enhance the representation. Inspired by this observation, we propose a novel Mutual Graph Learning (MGL) model by shifting the conventional perspective of mutual learning from regular grids to the graph domain. Specifically, an image is decoupled by MGL into two task-specific feature maps - one for finding the rough location of the target and the other for capturing its accurate boundary details. The mutual benefits are then fully exploited by recurrently reasoning their high-order relations through graphs. It should be noted that our method differs from most mutual learning models, which model all between-task interactions with a shared function. To increase information interactions, MGL is built with typed functions for dealing with different complementary relations. To overcome the accuracy loss caused by interpolation to higher resolution and the computational redundancy resulting from recurrent learning, the S-MGL is equipped with a multi-source attention contextual recovery module, called R-MGL_v2, which uses the pixel feature information iteratively. Experiments on challenging datasets, including CHAMELEON, CAMO, COD10K, and NC4K, demonstrate the effectiveness of our MGL, with superior performance to existing state-of-the-art methods. The code can be found at https://github.com/fanyang587/MGL.
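A generic graph-reasoning unit (project pixels onto a few nodes, one GCN-style update, project back) conveys the idea of reasoning over graphs rather than grids; this is a stand-in sketch, not the paper's typed mutual-graph functions.

```python
import torch
import torch.nn as nn

class GraphReasoning(nn.Module):
    """Soft-assigns pixels to n_nodes graph nodes, applies one fully connected
    GCN-style update on the node features, then projects back to the grid."""
    def __init__(self, ch, n_nodes=16, node_dim=64):
        super().__init__()
        self.assign = nn.Conv2d(ch, n_nodes, 1)     # pixel-to-node assignment
        self.embed = nn.Conv2d(ch, node_dim, 1)
        self.gcn = nn.Linear(node_dim, node_dim)
        self.re_embed = nn.Conv2d(node_dim, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        a = self.assign(x).flatten(2).softmax(-1)                 # (b, N, hw)
        nodes = a @ self.embed(x).flatten(2).transpose(1, 2)      # (b, N, d)
        nodes = torch.relu(self.gcn(nodes))                       # node interaction
        out = (a.transpose(1, 2) @ nodes).transpose(1, 2)         # (b, d, hw)
        return x + self.re_embed(out.reshape(b, -1, h, w))        # residual fusion
```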
Collapse
|
27
|
Fan DP, Zhang J, Xu G, Cheng MM, Shao L. Salient Objects in Clutter. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:2344-2366. [PMID: 35404809 DOI: 10.1109/tpami.2022.3166451] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
In this paper, we identify and address a serious design bias of existing salient object detection (SOD) datasets, which unrealistically assume that each image should contain at least one clear and uncluttered salient object. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. However, these models are still far from satisfactory when applied to real-world scenes. Based on our analyses, we propose a new high-quality dataset and update the previous saliency benchmark. Specifically, our dataset, called Salient Objects in Clutter (SOC), includes images with both salient and non-salient objects from several common object categories. In addition to object category annotations, each salient image is accompanied by attributes that reflect common challenges in common scenes, which can help provide deeper insight into the SOD problem. Further, with a given saliency encoder, e.g., the backbone network, existing saliency models are designed to achieve mapping from the training image set to the training ground-truth set. We therefore argue that improving the dataset can yield higher performance gains than focusing only on the decoder design. With this in mind, we investigate several dataset-enhancement strategies, including label smoothing to implicitly emphasize salient boundaries, random image augmentation to adapt saliency models to various scenarios, and self-supervised learning as a regularization strategy to learn from small datasets. Our extensive results demonstrate the effectiveness of these tricks. We also provide a comprehensive benchmark for SOD, which can be found in our repository: https://github.com/DengPingFan/SODBenchmark.
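One plausible reading of "label smoothing to implicitly emphasize salient boundaries" is to soften the binary mask only near object edges, e.g. with a small blur; the sketch below reflects that assumption and is not necessarily the paper's exact strategy.

```python
import torch
import torch.nn.functional as F

def smooth_mask_labels(mask, kernel=5):
    """mask: (batch, 1, H, W) binary ground truth in {0, 1}.
    A box blur leaves object interiors and far background at hard 0/1 values
    but assigns soft targets to pixels near object boundaries, so the loss
    implicitly puts extra emphasis on boundary regions."""
    return F.avg_pool2d(mask, kernel_size=kernel, stride=1, padding=kernel // 2)

soft_targets = smooth_mask_labels(torch.zeros(2, 1, 64, 64))
```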
Collapse
|
28
|
Ji GP, Fan DP, Chou YC, Dai D, Liniger A, Van Gool L. Deep Gradient Learning for Efficient Camouflaged Object Detection. MACHINE INTELLIGENCE RESEARCH 2023. [PMCID: PMC9831373 DOI: 10.1007/s11633-022-1365-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Abstract
This paper introduces the deep gradient network (DGNet), a novel deep framework that exploits object gradient supervision for camouflaged object detection (COD). It decouples the task into two connected branches, i.e., a context and a texture encoder. The essential connection is the gradient-induced transition, representing a soft grouping between context and texture features. Benefiting from the simple but efficient framework, DGNet outperforms existing state-of-the-art COD models by a large margin. Notably, our efficient version, DGNet-S, runs in real-time (80 fps) and achieves comparable results to the cutting-edge model JCSOD-CVPR21 with only 6.82% of the parameters. The application results also show that the proposed DGNet performs well in the polyp segmentation, defect detection, and transparent object segmentation tasks. The code will be made available at https://github.com/GewelsJI/DGNet.
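Object gradient supervision can be approximated by deriving a gradient/boundary target from the ground-truth mask with a Sobel operator, which an auxiliary branch then regresses; this is an assumed simplification, not DGNet's gradient-induced transition.

```python
import torch
import torch.nn.functional as F

def object_gradient_map(mask):
    """mask: (batch, 1, H, W) ground-truth object mask in [0, 1].
    Sobel gradients of the mask give a boundary/gradient target that an
    auxiliary texture branch can be trained to predict."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                       # vertical Sobel kernel
    gx = F.conv2d(mask, kx.to(mask.device), padding=1)
    gy = F.conv2d(mask, ky.to(mask.device), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)   # gradient magnitude map

grad_target = object_gradient_map(torch.rand(2, 1, 64, 64))
```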
Collapse
|
29
|
Pang Y, Zhao X, Zhang L, Lu H. CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:892-904. [PMID: 37018701 DOI: 10.1109/tip.2023.3234702] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Most existing bi-modal (RGB-D and RGB-T) salient object detection methods use convolution operations and construct complex interweaved fusion structures to achieve cross-modal information integration. The inherent local connectivity of the convolution operation imposes a performance ceiling on convolution-based methods. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path. CAVER treats multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism. Besides, considering the quadratic complexity w.r.t. the number of input tokens, we design a parameter-free patch-wise token re-embedding strategy to simplify operations. Extensive experimental results on RGB-D and RGB-T SOD datasets demonstrate that such a simple two-stream encoder-decoder framework can surpass recent state-of-the-art methods when equipped with the proposed components.
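A bare-bones cross-attention block in which RGB tokens attend to depth tokens conveys the sequence-to-sequence fusion idea; it is a generic stand-in, not CAVER's view-mixed attention or its patch-wise token re-embedding.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """RGB tokens query depth tokens; the symmetric direction can be added
    the same way. Token dimension and head count are illustrative."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, depth_tokens):  # each (batch, n_tokens, dim)
        fused, _ = self.attn(rgb_tokens, depth_tokens, depth_tokens)
        return self.norm(rgb_tokens + fused)      # residual + normalization

out = CrossModalAttention()(torch.randn(2, 196, 256), torch.randn(2, 196, 256))
```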
Collapse
|
30
|
Jiang X, Cai W, Zhang Z, Jiang B, Yang Z, Wang X. MAGNet: A Camouflaged Object Detection Network Simulating the Observation Effect of a Magnifier. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1804. [PMID: 36554209 PMCID: PMC9778132 DOI: 10.3390/e24121804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/19/2022] [Revised: 12/01/2022] [Accepted: 12/06/2022] [Indexed: 06/17/2023]
Abstract
In recent years, protecting important objects by simulating animal camouflage has been widely employed in many fields, giving rise to camouflaged object detection (COD) technology. COD is more difficult than traditional object detection because camouflaged objects are highly fused with the background. In this paper, we strive to identify camouflaged objects more accurately and efficiently. Inspired by the use of magnifiers to search for hidden objects in pictures, we propose a COD network, called the MAGnifier Network (MAGNet), that simulates the observation effect of a magnifier. Specifically, MAGNet contains two parallel modules: the ergodic magnification module (EMM) and the attention focus module (AFM). The EMM is designed to mimic the process of a magnifier enlarging an image, and the AFM is used to simulate the observation process in which human attention is highly focused on a particular region. The two sets of output camouflaged object maps are merged to simulate the observation of an object through a magnifier. In addition, a weighted key point area perception loss function, which is more applicable to COD, is designed based on the two modules to give greater attention to the camouflaged object. Extensive experiments demonstrate that, compared with 19 cutting-edge detection models, MAGNet achieves the best overall performance on eight evaluation metrics on the public COD dataset. Additionally, compared to other COD methods, MAGNet has lower computational complexity and faster segmentation. We also validated the model's generalization ability on a military camouflaged object dataset constructed in-house. Finally, we experimentally explored some extended applications of COD.
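The "magnifier" intuition can be mimicked at inference time by upsampling the input, running the detector, and resampling the prediction back; the helper below assumes a generic callable detector and is not MAGNet's EMM/AFM design.

```python
import torch
import torch.nn.functional as F

def magnify_and_score(detector, image, scale=2.0):
    """image: (batch, 3, H, W); detector: any callable returning a
    (batch, 1, h, w) prediction map. Upsampling the input acts like a
    magnifier; the prediction is resampled to the original resolution."""
    big = F.interpolate(image, scale_factor=scale,
                        mode='bilinear', align_corners=False)
    pred = detector(big)
    return F.interpolate(pred, size=image.shape[-2:],
                         mode='bilinear', align_corners=False)

# Example with a dummy detector that averages channels:
dummy = lambda x: x.mean(dim=1, keepdim=True)
pred = magnify_and_score(dummy, torch.rand(1, 3, 128, 128))
```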
Collapse
Affiliation(s)
| | - Wei Cai
- Xi’an Research Institute of High Technology, Xi’an 710064, China
| | | | | | | | | |
Collapse
|
31
|
ERINet: efficient and robust identification network for image copy-move forgery detection and localization. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04104-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022]
|
32
|
Fu B, Cao T, Zheng Y, Fang Z, Chen L, Wang Y, Wang Y, Wang Y. Polarization-driven camouflaged object segmentation via gated fusion. APPLIED OPTICS 2022; 61:8017-8027. [PMID: 36255923 DOI: 10.1364/ao.466339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/10/2022] [Accepted: 08/21/2022] [Indexed: 06/16/2023]
Abstract
Recently, polarization-based models for camouflaged object segmentation have attracted research attention. However, the main challenge in constructing such a camouflaged object segmentation model is to effectively fuse polarization and light intensity features. Therefore, we propose a multi-modal camouflaged object segmentation method via gated fusion. First, a spatial positioning module is designed to perform channel calibration and global spatial attention alignment between the polarization and light intensity modalities on high-level feature representations to locate objects accurately. Then, a gated fusion module (GFM) is designed to selectively fuse the object information contained in the polarization and light intensity features. In particular, the semantic information of the location features is introduced into the GFM to guide each modality to aggregate its dominant features. Finally, the features of each layer are aggregated to obtain an accurate segmentation map. At the same time, considering the lack of public evaluation and training data for light intensity-polarization (I-P) camouflaged detection, we build an I-P camouflaged detection dataset. Experimental results demonstrate that our proposed method outperforms other typical multi-modal segmentation methods on this dataset.
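A minimal gated fusion layer, in which a pixel-wise sigmoid gate arbitrates between polarization and intensity features, illustrates the selective-fusion idea; the channel sizes and single-gate form are assumptions, not the paper's GFM.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Predicts a pixel-wise gate from both modalities and blends them:
    fused = g * polarization + (1 - g) * intensity."""
    def __init__(self, ch=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.Sigmoid())

    def forward(self, f_pol, f_int):          # each (batch, ch, H, W)
        g = self.gate(torch.cat([f_pol, f_int], dim=1))
        return g * f_pol + (1 - g) * f_int

fused = GatedFusion()(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```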
Collapse
|
33
|
Yue G, Han W, Li S, Zhou T, Lv J, Wang T. Automated polyp segmentation in colonoscopy images via deep network with lesion-aware feature selection and refinement. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
34
|
Spatiotemporal context-aware network for video salient object detection. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07330-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
35
|
Selecting Post-Processing Schemes for Accurate Detection of Small Objects in Low-Resolution Wide-Area Aerial Imagery. REMOTE SENSING 2022. [DOI: 10.3390/rs14020255] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
In low-resolution wide-area aerial imagery, object detection algorithms fall into feature extraction and machine learning approaches, where the former often requires a post-processing scheme to reduce false detections and the latter demands multi-stage learning followed by post-processing. In this paper, we present an approach for selecting post-processing schemes for aerial object detection. We evaluated combinations of ten vehicle detection algorithms with seven post-processing schemes, where the best three schemes for each algorithm were determined using the average F-score metric. The performance improvement is quantified using basic information retrieval metrics as well as the classification of events, activities and relationships (CLEAR) metrics. We also implemented a two-stage learning algorithm using a hundred-layer densely connected convolutional neural network for small object detection and evaluated its degree of improvement when combined with the various post-processing schemes. The highest average F-scores after post-processing are 0.902, 0.704 and 0.891 for the Tucson, Phoenix and online VEDAI datasets, respectively. The combined results show that our enhanced three-stage post-processing scheme achieves a mean average precision (mAP) of 63.9% for feature extraction methods and 82.8% for the machine learning approach.
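Selecting a post-processing scheme by mean F-score can be sketched with plain Python; the dictionary layout and beta parameter below are illustrative assumptions, not the paper's evaluation pipeline.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def best_scheme(results):
    """results: {scheme_name: [(tp, fp, fn) per frame, ...]}.
    Returns the scheme with the highest mean F-score."""
    means = {s: sum(f_score(*c) for c in counts) / len(counts)
             for s, counts in results.items()}
    return max(means, key=means.get)

print(best_scheme({"morphology": [(90, 10, 20)], "nms": [(85, 5, 25)]}))
```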
Collapse
|
36
|
AGFNet: Attention Guided Fusion Network for Camouflaged Object Detection. ARTIF INTELL 2022. [DOI: 10.1007/978-3-031-20497-5_39] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
|