1. Wu Z, Liu C, Wen J, Xu Y, Yang J, Li X. Spatial Continuity and Nonequal Importance in Salient Object Detection With Image-Category Supervision. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:8565-8576. [PMID: 39231056 DOI: 10.1109/tnnls.2024.3436519]
Abstract
Due to the inefficiency of pixel-level annotations, weakly supervised salient object detection with image-category labels (WSSOD) has been receiving increasing attention. Previous works usually endeavor to generate high-quality pseudolabels to train the detectors in a fully supervised manner. However, we find that the detection performance is often limited by two types of noise contained in pseudolabels: 1) holes inside the object or at the edge and outliers in the background and 2) missing object portions and redundant surrounding regions. To mitigate the adverse effects caused by them, we propose local pixel correction (LPC) and key pixel attention (KPA), respectively, based on two key properties of desirable pseudolabels: 1) spatial continuity, meaning an object region consists of a cluster of adjacent points; and 2) nonequal importance, meaning pixels have different importance for training. Specifically, LPC fills holes and filters out outliers based on summary statistics of the neighborhood as well as its size. KPA directs the focus of training toward ambiguous pixels in multiple pseudolabels to discover more accurate saliency cues. To evaluate the effectiveness of our method, we design a simple yet strong baseline we call weakly supervised saliency detector with Transformer (WSSDT) and unify the proposed modules into WSSDT. Extensive experiments on five datasets demonstrate that our method significantly improves the baseline and outperforms all existing congeneric methods. Moreover, we establish the first benchmark to evaluate WSSOD robustness. The results show that our method can improve detection robustness as well. The code and robustness benchmark are available at https://github.com/Horatio9702/SCNI.
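As a concrete illustration of the spatial-continuity idea behind LPC, the following minimal sketch flips a pseudolabel pixel when it disagrees with almost all of its neighborhood. This is a generic neighborhood-statistics filter written for illustration; the window size, threshold, and function name are assumptions, not the authors' LPC module.

```python
import torch
import torch.nn.functional as F

def neighborhood_correction(pseudolabel: torch.Tensor, k: int = 7, tau: float = 0.8) -> torch.Tensor:
    """Fill small holes and suppress isolated outliers in a binary pseudolabel.

    pseudolabel: (B, 1, H, W) tensor with values in {0, 1}.
    k:   neighborhood (window) size used for the local summary statistic (assumed).
    tau: agreement threshold; a pixel is flipped when at least `tau` of its
         neighbors carry the opposite label (assumed).
    """
    # Local mean = fraction of foreground pixels in the k x k neighborhood.
    local_mean = F.avg_pool2d(pseudolabel, kernel_size=k, stride=1,
                              padding=k // 2, count_include_pad=False)
    corrected = pseudolabel.clone()
    corrected[(pseudolabel == 0) & (local_mean >= tau)] = 1.0      # fill holes
    corrected[(pseudolabel == 1) & (local_mean <= 1 - tau)] = 0.0  # remove outliers
    return corrected

# Example: a foreground blob with a one-pixel hole gets filled.
label = torch.ones(1, 1, 16, 16)
label[0, 0, 8, 8] = 0.0
print(neighborhood_correction(label)[0, 0, 8, 8])  # tensor(1.)
```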
2. Zhuge Y, Gu H, Zhang L, Qi J, Lu H. Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:9084-9097. [PMID: 38976474 DOI: 10.1109/tnnls.2024.3418980]
Abstract
In this article, we address the challenges in unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. Unlike previous methods that focus solely on integrating appearance with motion or on modeling temporal relations, our method combines both aspects by integrating them within a unified framework. MTNet is devised by effectively merging appearance and motion features during the feature extraction process within encoders, promoting a more complementary representation. To capture the intricate long-range contextual dynamics and information embedded within videos, a temporal transformer module is introduced, facilitating efficacious interframe interactions throughout a video clip. Furthermore, we employ a cascade of decoders across all feature levels to optimally exploit the derived features, aiming to generate increasingly precise segmentation masks. As a result, MTNet provides a strong and compact framework that explores both temporal and cross-modality knowledge to robustly and efficiently localize and track the primary object across a variety of challenging scenarios. Extensive experiments across diverse benchmarks conclusively show that our method not only attains state-of-the-art performance in UVOS but also delivers competitive results in video salient object detection (VSOD). These findings highlight the method's robust versatility and its adeptness in adapting to a range of segmentation tasks. The source code is available at https://github.com/hy0523/MTNet.
3. Mi K, Lin Z. Chemical risk assessment in food animals via physiologically based pharmacokinetic modeling - Part II: Environmental pollutants on animal and human health assessments. ENVIRONMENT INTERNATIONAL 2025; 198:109372. [PMID: 40106874 DOI: 10.1016/j.envint.2025.109372]
Abstract
Human activities generate a large amount of environmental pollutants, including drugs and agricultural and industrial chemicals that are released into the air, water, and soil. Environmental pollutants can enter food animals through contaminated feed and water, posing risks to human health via the food chain. Physiologically based pharmacokinetic (PBPK) modeling is used to predict the target organ dosimetry informing human health risk assessment. However, there is a lack of critical reviews concerning PBPK models for environmental pollutants in food animals in the last several years (2020-2024). This review is part of a series of reviews focusing on applications of PBPK models for drugs and environmental chemicals in food animals to inform human health and food safety assessments. Part I is focused on veterinary drugs. The present article is Part II and focuses on environmental chemicals, including pesticides, polychlorinated biphenyls (PCBs), bisphenols, and per- and polyfluoroalkyl substances (PFAS). This article discusses the existing challenges in developing PBPK models for environmental pollutants and shares our perspectives on future directions, including the combinations of in vitro to in vivo extrapolation (IVIVE), machine learning and artificial intelligence, read-across approaches, and quantitative pharmacodynamic modeling to enhance the potential applications of PBPK models in assessing human health and food safety.
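To make the modeling formalism concrete, the sketch below implements a deliberately minimal flow-limited PBPK model with only a liver and a lumped rest-of-body compartment. All parameter values and the constant dosing input are hypothetical placeholders; the models reviewed here involve many more tissues and chemical-specific parameters.

```python
import numpy as np
from scipy.integrate import odeint

# Hypothetical parameters for a generic chemical in a generic animal (illustrative only).
Q_c, Q_liv = 5.0, 1.5          # cardiac output and liver blood flow (L/h)
V_liv, V_rest = 1.0, 40.0      # tissue volumes (L)
P_liv, P_rest = 3.0, 1.0       # tissue:blood partition coefficients
CL_int = 2.0                   # hepatic intrinsic clearance (L/h)

def pbpk(y, t, dose_rate):
    A_liv, A_rest = y                      # amounts in liver and rest of body (mg)
    C_liv, C_rest = A_liv / V_liv, A_rest / V_rest
    # Mixed venous concentration; arterial blood is assumed equal to it (no lung compartment).
    C_art = (Q_liv * C_liv / P_liv + (Q_c - Q_liv) * C_rest / P_rest) / Q_c
    dA_liv = Q_liv * (C_art - C_liv / P_liv) - CL_int * C_liv / P_liv + dose_rate
    dA_rest = (Q_c - Q_liv) * (C_art - C_rest / P_rest)
    return [dA_liv, dA_rest]

t = np.linspace(0, 48, 200)                          # 48-hour simulation
sol = odeint(pbpk, y0=[0.0, 0.0], t=t, args=(0.5,))  # constant input of 0.5 mg/h (assumed)
print("Liver concentration at 48 h (mg/L):", sol[-1, 0] / V_liv)
```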
Affiliation(s)
- Kun Mi
- Department of Environmental and Global Health, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32611, USA; Center for Environmental and Human Toxicology, University of Florida, Gainesville, FL 32611, USA.
- Zhoumeng Lin
- Department of Environmental and Global Health, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32611, USA; Center for Environmental and Human Toxicology, University of Florida, Gainesville, FL 32611, USA.
4. Cao Y, Xu X, Cheng Y, Sun C, Du Z, Gao L, Shen W. Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detection. IEEE TRANSACTIONS ON CYBERNETICS 2025; 55:1917-1929. [PMID: 40031813 DOI: 10.1109/tcyb.2025.3536165]
Abstract
Zero-shot anomaly detection (ZSAD) aims to develop a foundational model capable of detecting anomalies across arbitrary categories without relying on reference images. However, since "abnormality" is inherently defined in relation to "normality" within specific categories, detecting anomalies without reference images describing the corresponding normal context remains a significant challenge. As an alternative to reference images, this study explores the use of widely available product standards to characterize normal contexts and potential abnormal states. Specifically, this study introduces AnomalyVLM, which leverages generalized pretrained vision-language models (VLMs) to interpret these standards and detect anomalies. Given the current limitations of VLMs in comprehending complex textual information, AnomalyVLM generates hybrid prompts (comprising prompts for abnormal regions, symbolic rules, and region numbers) from the standards to facilitate more effective understanding. These hybrid prompts are incorporated into various stages of the anomaly detection process within the selected VLMs, including an anomaly region generator and an anomaly region refiner. By utilizing hybrid prompts, VLMs are personalized as anomaly detectors for specific categories, offering users flexibility and control in detecting anomalies across novel categories without the need for training data. Experimental results on four public industrial anomaly detection datasets, as well as a practical automotive part inspection task, highlight the superior performance and enhanced generalization capability of AnomalyVLM, especially in texture categories. An online demo of AnomalyVLM is available at https://github.com/caoyunkang/Segment-Any-Anomaly.
5. Hong L, Wang X, Zhang G, Zhao M. USOD10K: A New Benchmark Dataset for Underwater Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING 2025; 34:1602-1615. [PMID: 37058379 DOI: 10.1109/tip.2023.3266163]
Abstract
Underwater salient object detection (USOD) is an emerging research area that has great potential for various underwater visual tasks. However, USOD research is still in its early stage due to the lack of large-scale datasets within which salient objects are well-defined and pixel-wise annotated. To address this issue, this paper first introduces a new dataset named USOD10K. It contains 10,255 underwater images, covering 70 categories of salient objects in 12 different underwater scenes. Moreover, the USOD10K provides salient object boundaries and depth maps of all images. The USOD10K is the first large-scale dataset in the USOD community, making a significant leap in diversity, complexity, and scalability. Secondly, a simple but strong baseline termed TC-USOD is proposed for the USOD10K. The TC-USOD adopts a hybrid architecture based on an encoder-decoder design that leverages transformer and convolution as the basic computational building blocks of the encoder and decoder, respectively. Thirdly, we provide a comprehensive summary of 35 state-of-the-art SOD/USOD methods and benchmark them on the existing USOD dataset and the USOD10K. The results show that our TC-USOD achieves superior performance on all datasets tested. Finally, several other use cases of the USOD10K are discussed, and future directions of USOD research are pointed out. This work will promote the development of USOD research and facilitate further research on underwater visual tasks and visually-guided underwater robots. To pave the road in the USOD research field, the dataset, code, and benchmark results are publicly available: https://github.com/Underwater-Robotic-Lab/USOD10K.
6. Su Z, Zhang J, Liu T, Liu Z, Zhang S, Pietikainen M, Liu L. Boosting Convolutional Neural Networks With Middle Spectrum Grouped Convolution. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:3436-3449. [PMID: 38329861 DOI: 10.1109/tnnls.2024.3355489]
Abstract
This article proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: groupwise, layerwise, samplewise, and attentionwise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on the ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With a 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on the MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.
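For readers unfamiliar with the grouped-convolution mechanism that MSGC generalizes, the short snippet below contrasts a standard convolution with a conventional grouped convolution in PyTorch. It only shows the parameter (and MAC) savings of plain grouped convolution; it is not the MSGC module itself, whose group topology is learned.

```python
import torch
import torch.nn as nn

c_in, c_out, k, groups = 256, 256, 3, 4  # assumed sizes for illustration

standard = nn.Conv2d(c_in, c_out, k, padding=1, bias=False)
grouped = nn.Conv2d(c_in, c_out, k, padding=1, groups=groups, bias=False)

n_std = sum(p.numel() for p in standard.parameters())
n_grp = sum(p.numel() for p in grouped.parameters())
print(f"standard: {n_std} params, grouped (g={groups}): {n_grp} params")  # 589824 vs 147456

# Both map a feature tensor to the same output shape; MACs scale with parameters here.
x = torch.randn(1, c_in, 56, 56)
assert standard(x).shape == grouped(x).shape == (1, c_out, 56, 56)
```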
7. Wu Z, Wang W, Wang L, Li Y, Lv F, Xia Q, Chen C, Hao A, Li S. Pixel is All You Need: Adversarial Spatio-Temporal Ensemble Active Learning for Salient Object Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:858-877. [PMID: 39383082 DOI: 10.1109/tpami.2024.3476683]
Abstract
Although weakly-supervised techniques can reduce the labeling effort, it is unclear whether a saliency model trained with weakly-supervised data (e.g., point annotation) can achieve performance equivalent to its fully-supervised version. This paper attempts to answer this unexplored question by proving a hypothesis: there exists a point-labeled dataset on which trained saliency models can achieve performance equivalent to models trained on the densely annotated dataset. To prove this conjecture, we propose a novel yet effective adversarial spatio-temporal ensemble active learning approach. Our contributions are four-fold: 1) Our proposed adversarial attack triggering uncertainty can conquer the overconfidence of existing active learning methods and accurately locate these uncertain pixels. 2) Our proposed spatio-temporal ensemble strategy not only achieves outstanding performance but also significantly reduces the model's computational cost. 3) Our proposed relationship-aware diversity sampling can conquer oversampling while boosting model performance. 4) We provide theoretical proof for the existence of such a point-labeled dataset. Experimental results show that our approach can find such a point-labeled dataset, where a saliency model trained on it obtains 98%-99% of the performance of its fully-supervised version with only ten annotated points per image.
8. Tang H, Li Z, Zhang D, He S, Tang J. Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; PP:1958-1974. [PMID: 40030445 DOI: 10.1109/tpami.2024.3511621]
Abstract
RGB-Thermal Salient Object Detection (RGB-T SOD) aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images. A key challenge lies in bridging the inherent disparities between RGB and Thermal modalities for effective saliency map prediction. Traditional encoder-decoder architectures, while designed for cross-modality feature interactions, may not have adequately considered the robustness against noise originating from defective modalities, thereby leading to suboptimal performance in complex scenarios. Inspired by hierarchical human visual systems, we propose the ConTriNet, a robust Confluent Triple-Flow Network employing a "Divide-and-Conquer" strategy. This framework utilizes a unified encoder with specialized decoders, each addressing different subtasks of exploring modality-specific and modality-complementary information for RGB-T SOD, thereby enhancing the final saliency map prediction. Specifically, ConTriNet comprises three flows: two modality-specific flows explore cues from RGB and Thermal modalities, and a third modality-complementary flow integrates cues from both modalities. ConTriNet presents several notable advantages. It incorporates a Modality-induced Feature Modulator (MFM) in the modality-shared union encoder to minimize inter-modality discrepancies and mitigate the impact of defective samples. Additionally, a foundational Residual Atrous Spatial Pyramid Module (RASPM) in the separated flows enlarges the receptive field, allowing for the capture of multi-scale contextual information. Furthermore, a Modality-aware Dynamic Aggregation Module (MDAM) in the modality-complementary flow dynamically aggregates saliency-related cues from both modality-specific flows. Leveraging the proposed parallel triple-flow framework, we further refine saliency maps derived from different flows through a flow-cooperative fusion strategy, yielding a high-quality, full-resolution saliency map for the final prediction. To evaluate the robustness and stability of our approach, we collect a comprehensive RGB-T SOD benchmark, VT-IMAG, covering various real-world challenging scenarios. Extensive experiments on public benchmarks and our VT-IMAG dataset demonstrate that ConTriNet consistently outperforms state-of-the-art competitors in both common and challenging scenarios, even when dealing with incomplete modality data. The code and VT-IMAG will be available at: https://cser-tang-hao.github.io/contrinet.html.
9. Song Z, Kang X, Wei X, Li S. Pixel-Centric Context Perception Network for Camouflaged Object Detection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:18576-18589. [PMID: 37819817 DOI: 10.1109/tnnls.2023.3319323]
Abstract
Camouflaged object detection (COD) aims to identify object pixels visually embedded in the background environment. Existing deep learning methods fail to utilize the context information around different pixels adequately and efficiently. In order to solve this problem, a novel pixel-centric context perception network (PCPNet) is proposed, the core of which is to customize the personalized context of each pixel based on the automatic estimation of its surroundings. Specifically, PCPNet first employs an elegant encoder equipped with the designed vital component generation (VCG) module to obtain a set of compact features rich in low-level spatial and high-level semantic information across multiple subspaces. Then, we present a parameter-free pixel importance estimation (PIE) function based on multiwindow information fusion. Object pixels with complex backgrounds will be assigned with higher PIE values. Subsequently, PIE is utilized to regularize the optimization loss. In this way, the network can pay more attention to those pixels with higher PIE values in the decoding stage. Finally, a local continuity refinement module (LCRM) is used to refine the detection results. Extensive experiments on four COD benchmarks, five salient object detection (SOD) benchmarks, and five polyp segmentation benchmarks demonstrate the superiority of PCPNet with respect to other state-of-the-art methods.
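The nonuniform treatment of pixels during training can be sketched as a generic importance-weighted binary cross-entropy. The boundary-based importance map below is only a stand-in for illustration; the paper's PIE values are computed from multi-window statistics.

```python
import torch
import torch.nn.functional as F

def importance_weighted_bce(logits: torch.Tensor,
                            target: torch.Tensor,
                            importance: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy where each pixel's contribution is scaled by an
    importance map (higher values = harder / more ambiguous pixels)."""
    per_pixel = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    weighted = per_pixel * importance
    return weighted.sum() / importance.sum().clamp(min=1e-6)

# Stand-in importance map: pixels near the object boundary get larger weights.
logits = torch.randn(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
dilation = F.max_pool2d(target, 3, 1, 1)
erosion = -F.max_pool2d(-target, 3, 1, 1)
importance = 1.0 + 4.0 * (dilation - erosion)  # morphological gradient marks boundaries
print(importance_weighted_bce(logits, target, importance))
```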
10. Tang Y, Li M. DMGNet: Depth mask guiding network for RGB-D salient object detection. Neural Netw 2024; 180:106751. [PMID: 39332209 DOI: 10.1016/j.neunet.2024.106751]
Abstract
Though depth images can provide supplementary spatial structural cues for the salient object detection (SOD) task, inappropriate utilization of depth features may introduce noisy or misleading features, which can severely degrade SOD performance. To address this issue, we propose a depth mask guiding network (DMGNet) for RGB-D SOD. In this network, a depth mask guidance module (DMGM) is designed to pre-segment the salient objects from depth images and then create masks from the pre-segmented objects to guide the RGB subnetwork in extracting more discriminative features. Furthermore, a feature fusion pyramid module (FFPM) is employed to acquire more informative fused features using multi-branch convolutional channels with varying receptive fields, further enhancing the fusion of cross-modal features. Extensive experiments on nine benchmark datasets demonstrate the effectiveness of the proposed network.
Affiliation(s)
- Yinggan Tang
- School of Electrical Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China; Key Laboratory of Intelligent Rehabilitation and Neromodulation of Hebei Province, Yanshan University, Qinhuangdao, Hebei 066004, China; Key Laboratory of Industrial Computer Control Engineering of Hebei Province, Yanshan University, Qinhuangdao, Hebei 066004, China.
- Mengyao Li
- School of Electrical Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China.
11. Zhang Z, Wang W. Enhancing dance education through convolutional neural networks and blended learning. PeerJ Comput Sci 2024; 10:e2342. [PMID: 39650395 PMCID: PMC11622838 DOI: 10.7717/peerj-cs.2342]
Abstract
This article explores the evolving landscape of dance teaching, acknowledging the transformative impact of the internet and technology. With the emergence of online platforms, dance education is no longer confined to physical classrooms but can extend to virtual spaces, facilitating a more flexible and accessible learning experience. Blended learning, integrating traditional offline methods and online resources, offers a versatile approach that transcends geographical and temporal constraints. The article highlights the utilization of the dual-wing harmonium (DWH) multi-view metric learning (MVML) algorithm for facial emotion recognition, enhancing the assessment of students' emotional expression in dance performances. Moreover, the integration of motion capture technology with convolutional neural networks (CNNs) facilitates a precise analysis of students' dance movements, offering detailed feedback and recommendations for improvement. A holistic assessment of students' performance is attained by combining the evaluation of emotional expression with the analysis of dance movements. Experimental findings support the efficacy of this approach, demonstrating high recognition accuracy and offering valuable insights into the effectiveness of dance teaching. By embracing technological advancements, this method introduces novel ideas and methodologies for objective evaluation in dance education, paving the way for enhanced learning outcomes and pedagogical practices in the future.
Affiliation(s)
- Zhiping Zhang
- College of Education, HanJiang Normal University, Shiyan, Hubei, China
- Wei Wang
- Dancing College, Sichuan Normal University, Chengdu, Sichuan, China
12. Li G, Chen Z, Mao M, Lin L, Fang C. Uncertainty-Aware Active Domain Adaptive Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING 2024; 33:5510-5524. [PMID: 38889015 DOI: 10.1109/tip.2024.3413598]
Abstract
Due to the advancement of deep learning, the performance of salient object detection (SOD) has been significantly improved. However, deep learning-based techniques require a sizable amount of pixel-wise annotations. To relieve the burden of data annotation, a variety of deep weakly-supervised and unsupervised SOD methods have been proposed, yet the performance gap between them and fully supervised methods remains significant. In this paper, we propose a novel, cost-efficient salient object detection framework, which can adapt models from synthetic data to real-world data with the help of a limited number of actively selected annotations. Specifically, we first construct a synthetic SOD dataset by copying and pasting foreground objects into pure background images. With the masks of foreground objects taken as the ground-truth saliency maps, this dataset can be used for training the SOD model initially. However, due to the large domain gap between synthetic images and real-world images, the performance of the initially trained model on the real-world images is deficient. To transfer the model from the synthetic dataset to the real-world datasets, we further design an uncertainty-aware active domain adaptive algorithm to generate labels for the real-world target images. The prediction variances against data augmentations are utilized to calculate the superpixel-level uncertainty values. For those superpixels with relatively low uncertainty, we directly generate pseudo labels according to the network predictions. Meanwhile, we select a few superpixels with high uncertainty scores and assign labels to them manually. This labeling strategy is capable of generating high-quality labels without incurring too much annotation cost. Experimental results on six benchmark SOD datasets demonstrate that our method outperforms the existing state-of-the-art weakly-supervised and unsupervised SOD methods and is even comparable to the fully supervised ones. Code will be released at: https://github.com/czh-3/UADA.
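A minimal sketch of the uncertainty signal driving the active selection is shown below: per-pixel variance of predictions under simple augmentations. The flips and the quantile threshold are illustrative stand-ins; the paper aggregates uncertainty at the superpixel level before choosing regions to annotate.

```python
import torch

@torch.no_grad()
def augmentation_variance(model, image: torch.Tensor) -> torch.Tensor:
    """Per-pixel variance of saliency predictions across augmented views.

    image: (1, 3, H, W); the model is assumed to return a (1, 1, H, W) logit map.
    """
    views, inverses = [], []
    views.append(image);                  inverses.append(lambda p: p)
    views.append(torch.flip(image, [3])); inverses.append(lambda p: torch.flip(p, [3]))
    views.append(torch.flip(image, [2])); inverses.append(lambda p: torch.flip(p, [2]))

    # Predict on each view, undo the augmentation, and measure disagreement.
    preds = [inv(torch.sigmoid(model(v))) for v, inv in zip(views, inverses)]
    return torch.stack(preds, 0).var(dim=0)  # (1, 1, H, W), high = uncertain

# Usage sketch (any saliency network with the assumed interface works):
# uncertainty = augmentation_variance(saliency_net, img)
# query_pixels = uncertainty > uncertainty.quantile(0.98)
```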
13. Liu N, Nan K, Zhao W, Yao X, Han J. Learning Complementary Spatial-Temporal Transformer for Video Salient Object Detection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:10663-10673. [PMID: 37027778 DOI: 10.1109/tnnls.2023.3243246]
Abstract
Besides combining appearance and motion information, another crucial factor for video salient object detection (VSOD) is to mine spatial-temporal (ST) knowledge, including complementary long-short temporal cues and global-local spatial context from neighboring frames. However, the existing methods only explored part of them and ignored their complementarity. In this article, we propose a novel complementary ST transformer (CoSTFormer) for VSOD, which has a short-global branch and a long-local branch to aggregate complementary ST contexts. The former integrates the global context from the neighboring two frames using dense pairwise attention, while the latter is designed to fuse long-term temporal information from more consecutive frames with local attention windows. In this way, we decompose the ST context into a short-global part and a long-local part and leverage the powerful transformer to model the context relationship and learn their complementarity. To solve the contradiction between local window attention and object motion, we propose a novel flow-guided window attention (FGWA) mechanism to align the attention windows with object and camera movements. Furthermore, we deploy CoSTFormer on fused appearance and motion features, thus enabling the effective combination of all three VSOD factors. Besides, we present a pseudo video generation method to synthesize sufficient video clips from static images for training ST saliency models. Extensive experiments have verified the effectiveness of our method and illustrated that we achieve new state-of-the-art results on several benchmark datasets.
14. Su Z, Wu Y, Cao K, Du J, Cao L, Wu Z, Wu X, Wang X, Song Y, Wang X, Duan H. APEX-pHLA: A novel method for accurate prediction of the binding between exogenous short peptides and HLA class I molecules. Methods 2024; 228:38-47. [PMID: 38772499 DOI: 10.1016/j.ymeth.2024.05.013]
Abstract
Human leukocyte antigen (HLA) molecules play a critical role in immunotherapy due to their capacity to recognize and bind exogenous antigens such as peptides, subsequently delivering them to immune cells. Predicting the binding between peptides and HLA molecules (pHLA) can expedite the screening of immunogenic peptides and facilitate vaccine design. However, traditional experimental methods are time-consuming and inefficient. In this study, an efficient deep learning-based method was developed for predicting peptide-HLA binding, treating peptide sequences as linguistic entities. It combined the architectures of textCNN and BiLSTM to create a deep neural network model called APEX-pHLA. This model operated without limitations related to HLA class I allele variants and peptide segment lengths, enabling efficient encoding of sequence features for both HLA and peptide segments. On the independent test set, the model achieved an Accuracy, ROC_AUC, F1, and MCC of 0.9449, 0.9850, 0.9453, and 0.8899, respectively. Similarly, on an external test set, the results were 0.9803, 0.9574, 0.8835, and 0.7863, respectively. These findings outperformed fifteen methods previously reported in the literature. The accurate prediction capability of the APEX-pHLA model for peptide-HLA binding might provide valuable insights for future HLA vaccine design.
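To make the textCNN-plus-BiLSTM combination concrete, here is a compact PyTorch sketch of a sequence classifier over integer-encoded amino acids. The vocabulary, layer sizes, and pooling choices are illustrative assumptions rather than the published APEX-pHLA configuration.

```python
import torch
import torch.nn as nn

AA_VOCAB = 21  # 20 amino acids + padding token (assumed encoding)

class CnnBiLstmClassifier(nn.Module):
    """Embeds a concatenated peptide+HLA sequence, extracts local motifs with
    1-D convolutions (textCNN style) and long-range context with a BiLSTM."""
    def __init__(self, embed_dim=32, conv_channels=64, lstm_hidden=64):
        super().__init__()
        self.embed = nn.Embedding(AA_VOCAB, embed_dim, padding_idx=0)
        self.convs = nn.ModuleList([
            nn.Conv1d(embed_dim, conv_channels, k, padding=k // 2) for k in (3, 5, 7)
        ])
        self.lstm = nn.LSTM(3 * conv_channels, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * lstm_hidden, 1)  # binding logit

    def forward(self, tokens):                       # tokens: (B, L) integer codes
        x = self.embed(tokens).transpose(1, 2)       # (B, E, L)
        x = torch.cat([torch.relu(c(x)) for c in self.convs], dim=1)  # (B, 3C, L)
        x, _ = self.lstm(x.transpose(1, 2))          # (B, L, 2H)
        return self.head(x.mean(dim=1)).squeeze(-1)  # (B,) binding logits

model = CnnBiLstmClassifier()
print(model(torch.randint(1, AA_VOCAB, (4, 45))).shape)  # torch.Size([4])
```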
Affiliation(s)
- Zhihao Su
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China.
- Yejian Wu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Kaiqiang Cao
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China.
- Jie Du
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China.
- Lujing Cao
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Zhipeng Wu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Xinyi Wu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Xinqiao Wang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Ying Song
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China.
- Xudong Wang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China.
- Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China.
15. Guo Z, Cai D, Zhou Y, Xu T, Yu F. Identifying rice field weeds from unmanned aerial vehicle remote sensing imagery using deep learning. PLANT METHODS 2024; 20:105. [PMID: 39014411 PMCID: PMC11253438 DOI: 10.1186/s13007-024-01232-0]
Abstract
BACKGROUND Rice field weed object detection can provide key information on weed species and locations for precise spraying, which is of great significance in actual agricultural production. However, facing the complex and changing real farm environments, traditional object detection methods still have difficulties in identifying small-sized, occluded and densely distributed weed instances. To address these problems, this paper proposes a multi-scale feature enhanced DETR network, named RMS-DETR. By adding multi-scale feature extraction branches on top of DETR, this model fully utilizes the information from different semantic feature layers to improve recognition capability for rice field weeds in real-world scenarios. METHODS Introducing multi-scale feature layers on the basis of the DETR model, we conduct a differentiated design for different semantic feature layers. The high-level semantic feature layer adopts Transformer structure to extract contextual information between barnyard grass and rice plants. The low-level semantic feature layer uses CNN structure to extract local detail features of barnyard grass. Introducing multi-scale feature layers inevitably leads to increased model computation, thus lowering model inference speed. Therefore, we employ a new type of Pconv (Partial convolution) to replace traditional standard convolutions in the model. RESULTS Compared to the original DETR model, our proposed RMS-DETR model achieved an average recognition accuracy improvement of 3.6% and 4.4% on our constructed rice field weeds dataset and the DOTA public dataset, respectively. The average recognition accuracies reached 0.792 and 0.851, respectively. The RMS-DETR model size is 40.8 M with inference time of 0.0081 s. Compared with three classical DETR models (Deformable DETR, Anchor DETR and DAB-DETR), the RMS-DETR model respectively improved average precision by 2.1%, 4.9% and 2.4%. DISCUSSION This model is capable of accurately identifying rice field weeds in complex real-world scenarios, thus providing key technical support for precision spraying and management of variable-rate spraying systems.
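The Pconv (partial convolution) mentioned above can be sketched as follows: only a fraction of the channels is convolved while the remainder passes through unchanged, which cuts the MACs relative to a full convolution. The channel count and split ratio below are assumptions for illustration, not the exact RMS-DETR block.

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Convolve only the first 1/ratio of the channels; pass the rest through."""
    def __init__(self, channels: int, ratio: int = 4, kernel_size: int = 3):
        super().__init__()
        self.conv_ch = channels // ratio
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        x_conv, x_id = torch.split(x, [self.conv_ch, x.size(1) - self.conv_ch], dim=1)
        return torch.cat([self.conv(x_conv), x_id], dim=1)

x = torch.randn(1, 128, 40, 40)
block = PartialConv(128, ratio=4)
print(block(x).shape)  # torch.Size([1, 128, 40, 40])

# Rough MAC comparison against a full 3x3 convolution on all 128 channels:
full_macs = 128 * 128 * 9 * 40 * 40
pconv_macs = 32 * 32 * 9 * 40 * 40
print(f"partial conv uses ~{pconv_macs / full_macs:.1%} of the MACs")  # ~6.2%
```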
Affiliation(s)
- Zhonghui Guo
- School of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang, 110866, China
- National Digital Agriculture Regional Innovation Center (Northeast), Shenyang, 110866, China
- Key Laboratory of Smart Agriculture Technology in Liaoning Province, Shenyang, 110866, China
- Dongdong Cai
- School of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang, 110866, China
- National Digital Agriculture Regional Innovation Center (Northeast), Shenyang, 110866, China
- Key Laboratory of Smart Agriculture Technology in Liaoning Province, Shenyang, 110866, China
- Yunyi Zhou
- School of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang, 110866, China
- National Digital Agriculture Regional Innovation Center (Northeast), Shenyang, 110866, China
- Key Laboratory of Smart Agriculture Technology in Liaoning Province, Shenyang, 110866, China
- Tongyu Xu
- School of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang, 110866, China
- National Digital Agriculture Regional Innovation Center (Northeast), Shenyang, 110866, China
- Key Laboratory of Smart Agriculture Technology in Liaoning Province, Shenyang, 110866, China
- Fenghua Yu
- School of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang, 110866, China.
- National Digital Agriculture Regional Innovation Center (Northeast), Shenyang, 110866, China.
- Key Laboratory of Smart Agriculture Technology in Liaoning Province, Shenyang, 110866, China.
- Key Laboratory of Smart Agriculture in the South China Tropical Region, Ministry of Agriculture and Rural Affairs, Guangzhou, 510640, China.
16. Liu J, Zhang W, Liu Y, Zhang Q. Polyp segmentation based on implicit edge-guided cross-layer fusion networks. Sci Rep 2024; 14:11678. [PMID: 38778219 PMCID: PMC11111678 DOI: 10.1038/s41598-024-62331-5]
Abstract
Polyps are abnormal tissue clumps growing primarily on the inner linings of the gastrointestinal tract. While such clumps are generally harmless, they can potentially evolve into pathological tumors, and thus require long-term observation and monitoring. Polyp segmentation in gastrointestinal endoscopy images is an important stage for polyp monitoring and subsequent treatment. However, this segmentation task faces multiple challenges: the low contrast of the polyp boundaries, the varied polyp appearance, and the co-occurrence of multiple polyps. To address these challenges, this paper proposes an implicit edge-guided cross-layer fusion network (IECFNet) for polyp segmentation. The encoder-decoder pair is used to generate an initial saliency map, the implicit edge-enhanced context attention module aggregates the feature maps output by the encoder and decoder to generate a rough prediction, and the multi-scale feature reasoning module is used to generate the final predictions. Polyp segmentation experiments have been conducted on five popular polyp image datasets (Kvasir, CVC-ClinicDB, ETIS, CVC-ColonDB, and CVC-300), and the experimental results show that the proposed method significantly outperforms a conventional method, especially with an accuracy margin of 7.9% on the ETIS dataset.
Affiliation(s)
- Junqing Liu
- Hubei Engineering and Technology Research Center for Construction Quality Inspection Equipment, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- Weiwei Zhang
- Hubei Engineering and Technology Research Center for Construction Quality Inspection Equipment, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China.
- College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China.
- Yong Liu
- Hubei Engineering and Technology Research Center for Construction Quality Inspection Equipment, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- Qinghe Zhang
- Hubei Engineering and Technology Research Center for Construction Quality Inspection Equipment, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
- College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, People's Republic of China
17. Makram AW, Salem NM, El-Wakad MT, Al-Atabany W. Robust detection and refinement of saliency identification. Sci Rep 2024; 14:11076. [PMID: 38744990 PMCID: PMC11636872 DOI: 10.1038/s41598-024-61105-3]
Abstract
Salient object detection is an increasingly popular topic in the computer vision field, particularly for images with complex backgrounds and diverse object parts. Background information is an essential factor in detecting salient objects. This paper suggests a robust and effective methodology for salient object detection. This method involves two main stages. The first stage is to produce a saliency detection map based on the dense and sparse reconstruction of image regions using a refined background dictionary. The refined background dictionary uses a boundary conductivity measurement to exclude salient object regions near the image's boundary from a background dictionary. In the second stage, the CascadePSP network is integrated to refine and correct the local boundaries of the saliency mask to highlight saliency objects more uniformly. Using six evaluation indexes, experimental outcomes conducted on three datasets show that the proposed approach performs effectively compared to the state-of-the-art methods in salient object detection, particularly in identifying the challenging salient objects located near the image's boundary. These results demonstrate the potential of the proposed framework for various computer vision applications.
Affiliation(s)
- Abram W Makram
- Biomedical Engineering Department, Faculty of Engineering, Helwan University, Helwan, Egypt.
- Nancy M Salem
- Biomedical Engineering Department, Faculty of Engineering, Helwan University, Helwan, Egypt
- Walid Al-Atabany
- Biomedical Engineering Department, Faculty of Engineering, Helwan University, Helwan, Egypt
- Information Technology and Computer Science School, Nile University, Giza, Egypt
18. Liu X, Wang L. MSRMNet: Multi-scale skip residual and multi-mixed features network for salient object detection. Neural Netw 2024; 173:106144. [PMID: 38335792 DOI: 10.1016/j.neunet.2024.106144]
Abstract
Current models for salient object detection (SOD) have made remarkable progress through multi-scale feature fusion strategies. However, existing models show large deviations when detecting objects at different scales, and the object boundaries in the predicted maps remain blurred. In this paper, we propose a new model addressing these issues using a transformer backbone to capture multiple feature layers. The model uses multi-scale skip residual connections during encoding to improve the accuracy of the predicted object positions and edge pixel information. Furthermore, to extract richer multi-scale semantic information, we perform multiple mixed feature operations in the decoding stage. In addition, we add a weighted structural similarity index measure (SSIM) term to the loss function to enhance the prediction accuracy of the boundaries. Experiments demonstrate that our algorithm achieves state-of-the-art results on five public datasets and improves upon the performance metrics of existing SOD methods. Code and results are available at: https://github.com/xxwudi508/MSRMNet.
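A simplified sketch of adding a weighted SSIM term to a pixel-wise loss is given below. It uses uniform local windows instead of the usual Gaussian windowing for brevity, and the weighting coefficient is a placeholder rather than the value used in MSRMNet.

```python
import torch
import torch.nn.functional as F

def ssim_loss(pred, target, window: int = 11, c1: float = 0.01 ** 2, c2: float = 0.03 ** 2):
    """1 - SSIM computed with uniform local windows on maps scaled to [0, 1]."""
    pad = window // 2
    mu_x = F.avg_pool2d(pred, window, 1, pad)
    mu_y = F.avg_pool2d(target, window, 1, pad)
    var_x = F.avg_pool2d(pred * pred, window, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(target * target, window, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(pred * target, window, 1, pad) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1 - ssim.mean()

def total_loss(logits, target, ssim_weight: float = 0.5):
    # Pixel-wise BCE plus a structure-aware SSIM term (weight is a placeholder).
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return bce + ssim_weight * ssim_loss(torch.sigmoid(logits), target)

logits = torch.randn(2, 1, 128, 128)
target = (torch.rand(2, 1, 128, 128) > 0.5).float()
print(total_loss(logits, target))
```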
Affiliation(s)
- Xinlong Liu
- Sun Yat-Sen University, Guangzhou 510275, China.
- Luping Wang
- Sun Yat-Sen University, Guangzhou 510275, China.
19. Wei T, Wang Y, Zhang Y, Wang Y, Zhao L. Boundary-Sensitive Segmentation of Small Liver Lesions. IEEE J Biomed Health Inform 2024; 28:2991-3002. [PMID: 38466585 DOI: 10.1109/jbhi.2024.3375609]
Abstract
Early diagnosis plays a pivotal role in handling the global health challenge posed by liver diseases. However, early-stage lesions are typically quite small, presenting significant difficulties due to insufficient regions for developing effective features, indistinguishable boundaries of small lesions, and a lack of tiny liver lesion masks. To address these issues, our solution is twofold: an efficient model and a high-quality dataset. The model is built upon the advantages of the path signature and camouflaged object detection. The path signature narrows down the ambiguous boundaries between lesions and other tissues, while camouflaged object detection achieves high accuracy in detecting inconspicuous lesions. The two are seamlessly integrated to ensure high accuracy and fidelity. For the dataset, we collect more than ten thousand liver images with over four thousand lesions, approximately half of which are small. Experiments on both an established dataset and our newly constructed one show that the proposed model outperforms state-of-the-art semantic segmentation and camouflaged object detection models, particularly in detecting small lesions. Moreover, the decisive and faithful salience maps generated by the model at the boundary regions demonstrate its strong robustness.
20. Wang W, Sun G, Van Gool L. Looking Beyond Single Images for Weakly Supervised Semantic Segmentation Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:1635-1649. [PMID: 35439127 DOI: 10.1109/tpami.2022.3168530]
Abstract
This article studies the problem of learning weakly supervised semantic segmentation (WSSS) from image-level supervision only. Current popular solutions leverage object localization maps from classifiers as supervision for semantic segmentation learning, and struggle to make the localization maps capture more complete object content. Rather than previous efforts that primarily focus on intra-image information, we address the value of cross-image semantic relations for comprehensive object pattern mining. To achieve this, two neural co-attentions are incorporated into the classifier to complementarily capture cross-image semantic similarities and differences. In particular, given a pair of training images, one co-attention enforces the classifier to recognize the common semantics from co-attentive objects, while the other one, called contrastive co-attention, drives the classifier to identify the unique semantics from the rest, unshared objects. This helps the classifier discover more object patterns and better ground semantics in image regions. In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference, hence eventually benefiting semantic segmentation learning. More importantly, our algorithm provides a unified framework that handles well different WSSS settings, i.e., learning WSSS with 1) precise image-level supervision only, 2) extra simple single-label data, and 3) extra noisy web data. Without bells and whistles, it sets new state-of-the-arts on all these settings. Moreover, our approach ranked 1st place in the Weakly-Supervised Semantic Segmentation Track of CVPR2020 Learning from Imperfect Data Challenge. The extensive experimental results demonstrate well the efficacy and high utility of our method.
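Cross-image co-attention of this kind can be sketched at the feature level as an affinity matrix between the spatial positions of two images, through which each image aggregates the semantics it shares with the other. The module below is a generic co-attention sketch with assumed shapes, not the paper's classifier.

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    """Exchange semantics between two images' feature maps via a shared affinity."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, feat_a, feat_b):                 # each: (B, C, H, W)
        b, c, h, w = feat_a.shape
        qa = self.proj(feat_a).flatten(2)              # (B, C, HW)
        qb = self.proj(feat_b).flatten(2)              # (B, C, HW)
        affinity = torch.bmm(qa.transpose(1, 2), qb)   # (B, HW_a, HW_b)
        # Each position in A aggregates the B features it is most similar to, and vice versa.
        a_from_b = torch.bmm(feat_b.flatten(2), affinity.softmax(dim=2).transpose(1, 2))
        b_from_a = torch.bmm(feat_a.flatten(2), affinity.softmax(dim=1))
        return a_from_b.view(b, c, h, w), b_from_a.view(b, c, h, w)

fa, fb = torch.randn(2, 256, 28, 28), torch.randn(2, 256, 28, 28)
ca, cb = CoAttention(256)(fa, fb)
print(ca.shape, cb.shape)  # torch.Size([2, 256, 28, 28]) each
```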
21. Zhang D, Wu C, Zhou J, Zhang W, Lin Z, Polat K, Alenezi F. Robust underwater image enhancement with cascaded multi-level sub-networks and triple attention mechanism. Neural Netw 2024; 169:685-697. [PMID: 37972512 DOI: 10.1016/j.neunet.2023.11.008]
Abstract
With the growing exploration of marine resources, underwater image enhancement has gained significant attention. Recent advances in convolutional neural networks (CNN) have greatly impacted underwater image enhancement techniques. However, conventional CNN-based methods typically employ a single network structure, which may compromise robustness in challenging conditions. Additionally, commonly used UNet networks generally force fusion from low to high resolution for each layer, leading to inaccurate contextual information encoding. To address these issues, we propose a novel network called Cascaded Network with Multi-level Sub-networks (CNMS), which encompasses the following key components: (a) a cascade mechanism based on local modules and global networks for extracting feature representations with richer semantics and enhanced spatial precision, (b) information exchange between different resolution streams, and (c) a triple attention module for extracting attention-based features. CNMS selectively cascades multiple sub-networks through triple attention modules to extract distinct features from underwater images, bolstering the network's robustness and improving generalization capabilities. Within the sub-network, we introduce a Multi-level Sub-network (MSN) that spans multiple resolution streams, combining contextual information from various scales while preserving the original underwater images' high-resolution spatial details. Comprehensive experiments on multiple underwater datasets demonstrate that CNMS outperforms state-of-the-art methods in image enhancement tasks.
Affiliation(s)
- Dehuan Zhang
- Dalian Maritime University, College of Information Science and Technology, Dalian, 116026, China.
- Chenyu Wu
- Dalian Maritime University, College of Information Science and Technology, Dalian, 116026, China.
- Jingchun Zhou
- Dalian Maritime University, College of Information Science and Technology, Dalian, 116026, China.
- Weishi Zhang
- Dalian Maritime University, College of Information Science and Technology, Dalian, 116026, China.
- Zifan Lin
- Department of Electrical and Electronic Engineering, University of Western Australia, Perth, WA6009, Australia.
- Kemal Polat
- Faculty of Engineering, Department of Electrical and Electronics Engineering, Bolu Abant Izzet Baysal University, Bolu, Turkey.
- Fayadh Alenezi
- Department of Electrical Engineering, Faculty of Engineering, Jouf University, Sakakah, 72388, Saudi Arabia.
22. Li J, Qiao S, Zhao Z, Xie C, Chen X, Xia C. Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff. IEEE TRANSACTIONS ON IMAGE PROCESSING 2023; 32:5664-5677. [PMID: 37773905 DOI: 10.1109/tip.2023.3318959]
Abstract
Existing salient object detection methods often adopt deeper and wider networks for better performance, resulting in heavy computational burden and slow inference speed. This inspires us to rethink saliency detection to achieve a favorable balance between efficiency and accuracy. To this end, we design a lightweight framework while maintaining satisfying competitive accuracy. Specifically, we propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches, which are devised to confront the dilution of semantic context, loss of spatial structure and absence of boundary detail, respectively. Along with the fusion of three branches, the coarse segmentation results are gradually refined in structure details and boundary quality. Without adding additional learnable parameters, we further propose Scale-Adaptive Pooling Module to obtain multi-scale receptive field. In particular, on the premise of inheriting this framework, we rethink the relationship among accuracy, parameters and speed via network depth-width tradeoff. With these insightful considerations, we comprehensively design shallower and narrower models to explore the maximum potential of lightweight SOD. Our models are proposed for different application environments: 1) a tiny version CTD-S (1.7M, 125FPS) for resource constrained devices, 2) a fast version CTD-M (12.6M, 158FPS) for speed-demanding scenarios, 3) a standard version CTD-L (26.5M, 84FPS) for high-performance platforms. Extensive experiments validate the superiority of our method, which achieves better efficiency-accuracy balance across five benchmarks.
23. Gunaseelan J, Sundaram S, Mariyappan B. A Design and Implementation Using an Innovative Deep-Learning Algorithm for Garbage Segregation. SENSORS (BASEL, SWITZERLAND) 2023; 23:7963. [PMID: 37766020 PMCID: PMC10534615 DOI: 10.3390/s23187963]
Abstract
A startling shift in waste composition has been brought on by a dramatic change in lifestyle, the rapid expansion of consumerism driven by fierce competition among producers of consumer goods, and revolutionary advances in the packaging sector. Overflowing garbage bins poison the soil, and whether the waste generated in an area or city is completely disposed of is unknown. It is challenging to accurately pinpoint the specific type of garbage waste; predictive image classification lags behind, and the existing approach takes longer to identify the specific garbage. To overcome this problem, image classification is carried out using a modified ResNeXt model. By adding a new block known as the "horizontal and vertical block," the proposed ResNeXt architecture expands on the ResNet architecture. Each parallel branch of the block has its own unique collection of convolutional layers. Before moving on to the next layer, these branches are concatenated together. The block's main goal is to expand the network's capacity without considerably raising the number of parameters. ResNeXt is able to capture a wider variety of features in the input image by using parallel branches with various filter sizes, which improves image classification performance. Some extra dense and dropout layers have been added to the standard ResNeXt model to improve performance. To increase the effectiveness of the network connections and reduce the total size of the model, the model is pruned. The overall architecture is trained and tested using garbage images. A convolutional neural network connects a modified ResNeXt, trained on images of metal, trash, and biodegradable waste, with a ResNet-50, trained on images of non-biodegradable, glass, and hazardous waste, operating in parallel. An input image is fed to the architecture, and classification is performed simultaneously to identify the exact garbage type within a short time at an accuracy of 98%. The results of the suggested method are demonstrated to be superior to those of a variety of existing deep learning models. The proposed model is implemented in hardware by designing a three-component smart bin system. It has three separate bins and collects biodegradable, non-biodegradable, and hazardous waste separately. The smart bin has an ultrasonic sensor to detect the fill level of the bin, a poisonous gas sensor, a stepper motor to open the lid of the bin, a solar panel for battery storage, a Raspberry Pi camera, and a Raspberry Pi board. The bin levels are maintained in a centralized system for future analysis. The architecture used in the proposed smart bin properly disposes of mixed garbage waste in an eco-friendly manner and recovers as much value as possible. It also reduces manpower, saves time, ensures proper collection of garbage from the bins, and helps attain a clean environment. The model boosts performance in predicting waste generation and classifying it with an improved accuracy of 98.9%, exceeding that of the existing system.
Affiliation(s)
- Jenilasree Gunaseelan
- Department of Computer Applications, University College of Engineering, Anna University (BIT Campus), Trichy 620 024, Tamilnadu, India;
- Sujatha Sundaram
- Department of Computer Applications, University College of Engineering, Anna University (BIT Campus), Trichy 620 024, Tamilnadu, India;
- Bhuvaneswari Mariyappan
- Department of ECE, University College of Engineering, Anna University (BIT Campus), Trichy 620 024, Tamilnadu, India
24. Jiao S, Goel V, Navasardyan S, Yang Z, Khachatryan L, Yang Y, Wei Y, Zhao Y, Shi H. Collaborative Content-Dependent Modeling: A Return to the Roots of Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING 2023; 32:4237-4246. [PMID: 37440395 DOI: 10.1109/tip.2023.3293759]
Abstract
Salient object detection (SOD) aims to identify the most visually distinctive object(s) from each given image. Most recent progress focuses on either adding elaborative connections among different convolution blocks or introducing boundary-aware supervision to help achieve better segmentation, which is actually moving away from the essence of SOD, i.e., distinctiveness/salience. This paper goes back to the roots of SOD and investigates the principles of how to identify distinctive object(s) in a more effective and efficient way. Intuitively, the salience of one object should largely depend on its global context within the input image. Based on this, we devise a clean yet effective architecture for SOD, named Collaborative Content-Dependent Networks (CCD-Net). In detail, we propose a collaborative content-dependent head whose parameters are conditioned on the input image's global context information. Within the content-dependent head, a hand-crafted multi-scale (HMS) module and a self-induced (SI) module are carefully designed to collaboratively generate content-aware convolution kernels for prediction. Benefiting from the content-dependent head, CCD-Net is capable of leveraging global context to detect distinctive object(s) while keeping a simple encoder-decoder design. Extensive experimental results demonstrate that our CCD-Net achieves state-of-the-art results on various benchmarks. Our architecture is simple and intuitive compared to previous solutions, resulting in competitive characteristics with respect to model complexity, operating efficiency, and segmentation accuracy.
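The content-dependent idea can be sketched as predicting a per-image convolution kernel from globally pooled context and applying it via grouped convolution; this is a minimal illustration under assumed shapes, and the hypothetical ContentDependentHead below is not the actual CCD-Net head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentDependentHead(nn.Module):
    """Sketch: predict a per-image 3x3 kernel from global context and apply it
    with grouped convolution so each image uses its own kernels."""
    def __init__(self, channels=32):
        super().__init__()
        self.channels = channels
        self.kernel_fc = nn.Linear(channels, channels * 3 * 3)  # one 3x3 kernel per channel
        self.out = nn.Conv2d(channels, 1, 1)

    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        ctx = feat.mean(dim=(2, 3))               # global average pooling -> (B, C)
        k = self.kernel_fc(ctx).view(b * c, 1, 3, 3)
        # fold the batch into groups so each sample is convolved with its own kernels
        y = F.conv2d(feat.reshape(1, b * c, h, w), k, padding=1, groups=b * c)
        y = y.view(b, c, h, w)
        return torch.sigmoid(self.out(y))         # saliency map in [0, 1]

pred = ContentDependentHead()(torch.randn(2, 32, 64, 64))
print(pred.shape)  # torch.Size([2, 1, 64, 64])
```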
25
Zhang X, Yu Y, Wang Y, Chen X, Wang C. Alignment Integration Network for Salient Object Detection and Its Application for Optical Remote Sensing Images. SENSORS (BASEL, SWITZERLAND) 2023; 23:6562. [PMID: 37514856 PMCID: PMC10386270 DOI: 10.3390/s23146562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Revised: 07/18/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
Salient object detection has made substantial progress due to the exploitation of multi-level convolutional features. The key point is how to combine these convolutional features effectively and efficiently. Due to the step by step down-sampling operations in almost all CNNs, multi-level features usually have different scales. Methods based on fully convolutional networks directly apply bilinear up-sampling to low-resolution deep features and then combine them with high-resolution shallow features by addition or concatenation, which neglects the compatibility of features, resulting in misalignment problems. In this paper, to solve the problem, we propose an alignment integration network (ALNet), which aligns adjacent level features progressively to generate powerful combinations. To capture long-range dependencies for high-level integrated features as well as maintain high computational efficiency, a strip attention module (SAM) is introduced into the alignment integration procedures. Benefiting from SAM, multi-level semantics can be selectively propagated to predict precise salient objects. Furthermore, although integrating multi-level convolutional features can alleviate the blur boundary problem to a certain extent, it is still unsatisfactory for the restoration of a real object boundary. Therefore, we design a simple but effective boundary enhancement module (BEM) to guide the network focus on boundaries and other error-prone parts. Based on BEM, an attention weighted loss is proposed to boost the network to generate sharper object boundaries. Experimental results on five benchmark datasets demonstrate that the proposed method can achieve state-of-the-art performance on salient object detection. Moreover, we extend the experiments on the remote sensing datasets, and the results further prove the universality and scalability of ALNet.
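A strip-style attention over pooled rows and columns can be sketched as follows; the exact SAM design in ALNet may differ, and all layer sizes here are assumptions.

```python
import torch
import torch.nn as nn

class StripAttention(nn.Module):
    """Sketch: capture long-range context along rows and columns via strip
    pooling, then reweight the feature map (assumed design, not the paper's SAM)."""
    def __init__(self, channels=64):
        super().__init__()
        self.h_conv = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.v_conv = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                                   # x: (B, C, H, W)
        h_ctx = self.h_conv(x.mean(dim=2, keepdim=True))    # pool rows    -> (B, C, 1, W)
        v_ctx = self.v_conv(x.mean(dim=3, keepdim=True))    # pool columns -> (B, C, H, 1)
        attn = torch.sigmoid(self.fuse(h_ctx + v_ctx))      # broadcasts to (B, C, H, W)
        return x * attn

print(StripAttention()(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```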
Affiliation(s)
- Xiaoning Zhang
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yi Yu
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yuqing Wang
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Xiaolin Chen
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Chenglong Wang
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
26
Ndayikengurukiye D, Mignotte M. CoSOV1Net: A Cone- and Spatial-Opponent Primary Visual Cortex-Inspired Neural Network for Lightweight Salient Object Detection. SENSORS (BASEL, SWITZERLAND) 2023; 23:6450. [PMID: 37514744 PMCID: PMC10386563 DOI: 10.3390/s23146450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/12/2023] [Revised: 07/12/2023] [Accepted: 07/14/2023] [Indexed: 07/30/2023]
Abstract
Salient object-detection models attempt to mimic the human visual system's ability to select relevant objects in images. To this end, the development of deep neural networks on high-end computers has recently achieved high performance. However, developing deep neural network models with the same performance for resource-limited vision sensors or mobile devices remains a challenge. In this work, we propose CoSOV1net, a novel lightweight salient object-detection neural network model, inspired by the cone- and spatial-opponent processes of the primary visual cortex (V1), which inextricably link color and shape in human color perception. Our proposed model is trained from scratch, without using backbones from image classification or other tasks. Experiments on the most widely used and challenging datasets for salient object detection show that CoSOV1Net achieves competitive performance (i.e., Fβ=0.931 on the ECSSD dataset) with state-of-the-art salient object-detection models while having a low number of parameters (1.14 M), low FLOPS (1.4 G) and high FPS (211.2) on GPU (Nvidia GeForce RTX 3090 Ti) compared to the state of the art in lightweight or nonlightweight salient object-detection tasks. Thus, CoSOV1net has turned out to be a lightweight salient object-detection model that can be adapted to mobile environments and resource-constrained devices.
Affiliation(s)
- Didier Ndayikengurukiye
- Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montreal, QC H3C 3J7, Canada
- Max Mignotte
- Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montreal, QC H3C 3J7, Canada
27
Lin SL. Research on tire crack detection using image deep learning method. Sci Rep 2023; 13:8027. [PMID: 37198216 DOI: 10.1038/s41598-023-35227-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2023] [Accepted: 05/15/2023] [Indexed: 05/19/2023] Open
Abstract
Drivers understand the importance of tire tread depth and air pressure, but most people are unaware of the safety risks of tire oxidation. Drivers must maintain vehicle tire quality to ensure performance, efficiency, and safety. In this study, a deep learning tire defect detection method was designed. This paper improves the traditional ShuffleNet and proposes an improved ShuffleNet method for tire image detection. The results are compared across five methods (GoogLeNet, traditional ShuffleNet, VGGNet, ResNet, and the improved ShuffleNet) through verification on a tire database. The experiments found that the detection rate of tire debris defects was 94.7%. Tire defects can be effectively detected, which demonstrates the robustness and effectiveness of the improved ShuffleNet, enabling drivers and tire manufacturers to save labor costs and greatly reduce tire defect detection time.
Affiliation(s)
- Shih-Lin Lin
- Graduate Institute of Vehicle Engineering, National Changhua University of Education, No.1, Jin-De Road, Changhua City, 50007, Taiwan.
28
Wu S, Zhang G. SRFFNet: Self-refine, Fusion and Feedback for Salient Object Detection. Cognit Comput 2023. [DOI: 10.1007/s12559-023-10130-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/09/2023]
29
Zhou X, Tong T, Zhong Z, Fan H, Li Z. Saliency-CCE: Exploiting colour contextual extractor and saliency-based biomedical image segmentation. Comput Biol Med 2023; 154:106551. [PMID: 36716685 DOI: 10.1016/j.compbiomed.2023.106551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2022] [Revised: 01/03/2023] [Accepted: 01/11/2023] [Indexed: 01/21/2023]
Abstract
Biomedical image segmentation is one critical component in computer-aided system diagnosis. However, various non-automatic segmentation methods are usually designed to segment target objects in a single-task-driven manner, ignoring the potential contribution of multiple tasks, such as the salient object detection (SOD) task and the image segmentation task. In this paper, we propose a novel dual-task framework for white blood cell (WBC) and skin lesion (SL) saliency detection and segmentation in biomedical images, called Saliency-CCE. Saliency-CCE consists of a hair-removal preprocessing step for skin lesion images, a novel colour contextual extractor (CCE) module for the SOD task, and an improved adaptive threshold (AT) paradigm for the image segmentation task. In the SOD task, we perform the CCE module to extract hand-crafted features through a novel colour channel volume (CCV) block and a novel colour activation mapping (CAM) block. We first exploit the CCV block to generate a target object's region of interest (ROI). After that, we employ the CAM block to yield a refined salient map as the final salient map from the extracted ROI. In the segmentation task, we propose a novel adaptive threshold (AT) strategy to automatically segment the WBC and SL from the final salient map. We evaluate our proposed Saliency-CCE on the ISIC-2016, ISIC-2017, and SCISC datasets, where it outperforms representative state-of-the-art SOD and biomedical image segmentation approaches. Our code is available at https://github.com/zxg3017/Saliency-CCE.
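The adaptive-threshold step can be illustrated with a simple mean-plus-offset rule applied to the final salient map; this is only a generic sketch, and the paper's AT strategy may be defined differently (alpha is an assumed hyper-parameter).

```python
import numpy as np

def adaptive_threshold(saliency, alpha=1.0):
    """Binarize a saliency map with a data-driven threshold
    (mean + alpha * std); alpha is an assumed hyper-parameter."""
    s = saliency.astype(np.float64)
    t = s.mean() + alpha * s.std()
    t = min(t, 0.95 * s.max())          # keep the threshold attainable
    return (s >= t).astype(np.uint8)

sal = np.random.rand(128, 128)          # stand-in for a predicted salient map
mask = adaptive_threshold(sal)
print(mask.shape, mask.dtype, mask.max())
```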
Affiliation(s)
- Xiaogen Zhou
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, P.R. China; College of Physics and Information Engineering, Fuzhou University, Fuzhou, P.R. China
- Tong Tong
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, P.R. China
- Zhixiong Zhong
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, P.R. China
- Haoyi Fan
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, P.R. China
- Zuoyong Li
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, P.R. China.
30
Zhuge M, Fan DP, Liu N, Zhang D, Xu D, Shao L. Salient Object Detection via Integrity Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:3738-3752. [PMID: 35666793 DOI: 10.1109/tpami.2022.3179526] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Although current salient object detection (SOD) works have achieved significant progress, they are limited when it comes to the integrity of the predicted salient regions. We define the concept of integrity at both a micro and macro level. Specifically, at the micro level, the model should highlight all parts that belong to a certain salient object. Meanwhile, at the macro level, the model needs to discover all salient objects in a given image. To facilitate integrity learning for SOD, we design a novel Integrity Cognition Network (ICON), which explores three important components for learning strong integrity features. 1) Unlike existing models, which focus more on feature discriminability, we introduce a diverse feature aggregation (DFA) component to aggregate features with various receptive fields (i.e., kernel shape and context) and increase feature diversity. Such diversity is the foundation for mining the integral salient objects. 2) Based on the DFA features, we introduce an integrity channel enhancement (ICE) component with the goal of enhancing feature channels that highlight the integral salient objects, while suppressing the other distracting ones. 3) After extracting the enhanced features, the part-whole verification (PWV) method is employed to determine whether the part and whole object features have strong agreement. Such part-whole agreements can further improve the micro-level integrity for each salient object. To demonstrate the effectiveness of our ICON, comprehensive experiments are conducted on seven challenging benchmarks. Our ICON outperforms the baseline methods in terms of a wide range of metrics. Notably, our ICON achieves ∼ 10% relative improvement over the previous best model in terms of average false negative ratio (FNR), on six datasets. Codes and results are available at: https://github.com/mczhuge/ICON.
31
Diaz-Guerra F, Jimenez-Molina A. Continuous Prediction of Web User Visual Attention on Short Span Windows Based on Gaze Data Analytics. SENSORS (BASEL, SWITZERLAND) 2023; 23:2294. [PMID: 36850892 PMCID: PMC9960063 DOI: 10.3390/s23042294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/17/2022] [Revised: 01/27/2023] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
Understanding users' visual attention on websites is paramount to enhance the browsing experience, such as providing emergent information or dynamically adapting Web interfaces. Existing approaches to accomplish these challenges are generally based on the computation of salience maps of static Web interfaces, while websites increasingly become more dynamic and interactive. This paper proposes a method and provides a proof-of-concept to predict user's visual attention on specific regions of a website with dynamic components. This method predicts the regions of a user's visual attention without requiring a constant recording of the current layout of the website, but rather by knowing the structure it presented in a past period. To address this challenge, the concept of visit intention is introduced in this paper, defined as the probability that a user, while browsing, will fixate their gaze on a specific region of the website in the next period. Our approach uses the gaze patterns of a population that browsed a specific website, captured via an eye-tracker device, to aid personalized prediction models built with individual visual kinetics features. We show experimentally that it is possible to conduct such a prediction through multilabel classification models using a small number of users, obtaining an average area under curve of 84.3%, and an average accuracy of 79%. Furthermore, the user's visual kinetics features are consistently selected in every set of a cross-validation evaluation.
Affiliation(s)
- Angel Jimenez-Molina
- Department of Industrial Engineering, University of Chile, Santiago 8370456, Chile
- Engineering Complex Systems Institute, Santiago 8370398, Chile
32
Fan DP, Zhang J, Xu G, Cheng MM, Shao L. Salient Objects in Clutter. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:2344-2366. [PMID: 35404809 DOI: 10.1109/tpami.2022.3166451] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
In this paper, we identify and address a serious design bias of existing salient object detection (SOD) datasets, which unrealistically assume that each image should contain at least one clear and uncluttered salient object. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. However, these models are still far from satisfactory when applied to real-world scenes. Based on our analyses, we propose a new high-quality dataset and update the previous saliency benchmark. Specifically, our dataset, called Salient Objects in Clutter (SOC), includes images with both salient and non-salient objects from several common object categories. In addition to object category annotations, each salient image is accompanied by attributes that reflect common challenges in common scenes, which can help provide deeper insight into the SOD problem. Further, with a given saliency encoder, e.g., the backbone network, existing saliency models are designed to achieve mapping from the training image set to the training ground-truth set. We therefore argue that improving the dataset can yield higher performance gains than focusing only on the decoder design. With this in mind, we investigate several dataset-enhancement strategies, including label smoothing to implicitly emphasize salient boundaries, random image augmentation to adapt saliency models to various scenarios, and self-supervised learning as a regularization strategy to learn from small datasets. Our extensive results demonstrate the effectiveness of these tricks. We also provide a comprehensive benchmark for SOD, which can be found in our repository: https://github.com/DengPingFan/SODBenchmark.
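One of the listed tricks, label smoothing of the binary saliency ground truth, can be sketched as follows; eps is an assumed smoothing factor, and the paper's exact formulation may differ.

```python
import numpy as np

def smooth_labels(gt_mask, eps=0.1):
    """Sketch of label smoothing for a binary saliency ground truth:
    1 -> 1 - eps and 0 -> eps (eps is an assumed smoothing factor)."""
    gt = gt_mask.astype(np.float32)
    return gt * (1.0 - eps) + (1.0 - gt) * eps

gt = (np.random.rand(64, 64) > 0.5).astype(np.uint8)   # stand-in ground-truth mask
smoothed = smooth_labels(gt)
print(smoothed.min(), smoothed.max())                   # 0.1 0.9
```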
33
Pang Y, Zhao X, Zhang L, Lu H. CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:892-904. [PMID: 37018701 DOI: 10.1109/tip.2023.3234702] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Most of the existing bi-modal (RGB-D and RGB-T) salient object detection methods utilize the convolution operation and construct complex interweave fusion structures to achieve cross-modal information integration. The inherent local connectivity of the convolution operation constrains the performance of the convolution-based methods to a ceiling. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path. CAVER treats the multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism. Besides, considering the quadratic complexity w.r.t. the number of input tokens, we design a parameter-free patch-wise token re-embedding strategy to simplify operations. Extensive experimental results on RGB-D and RGB-T SOD datasets demonstrate that such a simple two-stream encoder-decoder framework can surpass recent state-of-the-art methods when it is equipped with the proposed components.
34
Wu Z, Allibert G, Meriaudeau F, Ma C, Demonceaux C. HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:2160-2173. [PMID: 37027289 DOI: 10.1109/tip.2023.3263111] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
RGB-D saliency detection aims to fuse multi-modal cues to accurately localize salient regions. Existing works often adopt attention modules for feature modeling, with few methods explicitly leveraging fine-grained details to merge with semantic cues. Thus, despite the auxiliary depth information, it is still challenging for existing models to distinguish objects with similar appearances but at distinct camera distances. In this paper, from a new perspective, we propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection. Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies. To realize multi-modal and multi-level fusion, we first use a granularity-based attention scheme to strengthen the discriminatory power of RGB and depth features separately. Then we introduce a unified cross dual-attention module for multi-modal and multi-level fusion in a coarse-to-fine manner. The encoded multi-modal features are gradually aggregated into a shared decoder. Further, we exploit a multi-scale loss to take full advantage of the hierarchical information. Extensive experiments on challenging benchmark datasets demonstrate that our HiDAnet performs favorably over the state-of-the-art methods by large margins. The source code can be found in https://github.com/Zongwei97/HIDANet/.
35
Liu JJ, Hou Q, Liu ZA, Cheng MM. PoolNet+: Exploring the Potential of Pooling for Salient Object Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:887-904. [PMID: 34982676 DOI: 10.1109/tpami.2021.3140168] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
We explore the potential of pooling techniques on the task of salient object detection by expanding its role in convolutional neural networks. In general, two pooling-based modules are proposed. A global guidance module (GGM) is first built based on the bottom-up pathway of the U-shape architecture, which aims to guide the location information of the potential salient objects into layers at different feature levels. A feature aggregation module (FAM) is further designed to seamlessly fuse the coarse-level semantic information with the fine-level features in the top-down pathway. We can progressively refine the high-level semantic features with these two modules and obtain detail enriched saliency maps. Experimental results show that our proposed approach can locate the salient objects more accurately with sharpened details and substantially improve the performance compared with the existing state-of-the-art methods. Besides, our approach is fast and can run at a speed of 53 FPS when processing a 300 ×400 image. To make our approach better applied to mobile applications, we take MobileNetV2 as our backbone and re-tailor the structure of our pooling-based modules. Our mobile version model achieves a running speed of 66 FPS yet still performs better than most existing state-of-the-art methods. To verify the generalization ability of the proposed method, we apply it to the edge detection, RGB-D salient object detection, and camouflaged object detection tasks, and our method achieves better results than the corresponding state-of-the-art methods of these three tasks. Code can be found at http://mmcheng.net/poolnet/.
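The pooling-based guidance idea can be sketched by pyramid-pooling the deepest feature and injecting it into a shallower decoder level; this is an assumed simplification, not the paper's exact GGM/FAM design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalGuidance(nn.Module):
    """Sketch: pool the deepest feature at several scales, fuse, and add the
    upsampled result to a shallower feature to inject location guidance."""
    def __init__(self, channels=64, pool_sizes=(1, 3, 5)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(p) for p in pool_sizes])
        self.fuse = nn.Conv2d(channels * len(pool_sizes), channels, 1)

    def forward(self, deep_feat, shallow_feat):
        h, w = shallow_feat.shape[2:]
        pooled = [F.interpolate(p(deep_feat), size=(h, w), mode='bilinear',
                                align_corners=False) for p in self.pools]
        guidance = self.fuse(torch.cat(pooled, dim=1))
        return shallow_feat + guidance

deep = torch.randn(2, 64, 10, 10)       # deepest encoder feature (assumed sizes)
shallow = torch.randn(2, 64, 80, 80)    # shallower decoder feature
print(GlobalGuidance()(deep, shallow).shape)  # torch.Size([2, 64, 80, 80])
```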
36
Wu YH, Liu Y, Xu J, Bian JW, Gu YC, Cheng MM. MobileSal: Extremely Efficient RGB-D Salient Object Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:10261-10269. [PMID: 34898430 DOI: 10.1109/tpami.2021.3134684] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
The high computational cost of neural networks has prevented recent successes in RGB-D salient object detection (SOD) from benefiting real-world applications. Hence, this article introduces a novel network, MobileSal, which focuses on efficient RGB-D SOD using mobile networks for deep feature extraction. However, mobile networks are less powerful in feature representation than cumbersome networks. To this end, we observe that the depth information of color images can strengthen the feature representation related to SOD if leveraged properly. Therefore, we propose an implicit depth restoration (IDR) technique to strengthen the mobile networks' feature representation capability for RGB-D SOD. IDR is only adopted in the training phase and is omitted during testing, so it is computationally free. Besides, we propose compact pyramid refinement (CPR) for efficient multi-level feature aggregation to derive salient objects with clear boundaries. With IDR and CPR incorporated, MobileSal performs favorably against state-of-the-art methods on six challenging RGB-D SOD datasets with much faster speed (450fps for the input size of 320×320) and fewer parameters (6.5M). The code is released at https://mmcheng.net/mobilesal.
37
Biswal MR, Delwar TS, Siddique A, Behera P, Choi Y, Ryu JY. Pattern Classification Using Quantized Neural Networks for FPGA-Based Low-Power IoT Devices. SENSORS (BASEL, SWITZERLAND) 2022; 22:8694. [PMID: 36433289 PMCID: PMC9699191 DOI: 10.3390/s22228694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/29/2022] [Revised: 11/06/2022] [Accepted: 11/08/2022] [Indexed: 06/16/2023]
Abstract
With the recent growth of the Internet of Things (IoT) and the demand for faster computation, quantized neural networks (QNNs) or QNN-enabled IoT can offer better performance than conventional convolutional neural networks (CNNs). With the aim of reducing memory access costs and increasing computation efficiency, QNN-enabled devices are expected to transform numerous industrial applications with lower processing latency and power consumption. Another form of QNN is the binarized neural network (BNN), which uses only two quantization levels. In this paper, CNN-, QNN-, and BNN-based pattern recognition techniques are implemented and analyzed on an FPGA. The FPGA hardware acts as an IoT device due to its connectivity with the cloud, and QNN and BNN are considered to offer better performance in terms of low power and low resource use on hardware platforms. The CNN and QNN implementations are compared on the basis of accuracy, weight bit error, ROC curve, and execution speed. The paper also discusses various approaches that can be deployed for optimizing CNN and QNN models with additionally available tools. The work is performed on the Xilinx Zynq 7020 series Pynq Z2 board, which serves as our FPGA-based low-power IoT device. The MNIST and CIFAR-10 databases are considered for simulation and experimentation. The work shows that, at full precision (32-bit), the accuracy is 95.5% and 79.22% and the execution time is 5.8 ms and 18 ms for the MNIST and CIFAR-10 databases, respectively.
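A generic uniform quantization routine illustrates the difference between multi-bit QNN weights and the two-level (binary) case; the bit widths and scaling rule here are assumptions, not the paper's FPGA implementation.

```python
import numpy as np

def quantize_uniform(w, bits=2):
    """Sketch of symmetric uniform weight quantization to `bits` bits; bits=1
    degenerates to a sign (binary) code with a single scale, as in a BNN."""
    w = np.asarray(w, dtype=np.float64)
    if bits == 1:
        return np.sign(w) * np.mean(np.abs(w))
    qmax = 2 ** (bits - 1) - 1                 # e.g. 1 for 2-bit, 7 for 4-bit
    scale = np.max(np.abs(w)) / qmax if np.any(w) else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

w = np.random.randn(8)
print(np.unique(quantize_uniform(w, bits=2)))   # at most 3 distinct values
print(np.unique(quantize_uniform(w, bits=1)))   # at most 2 distinct values
```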
Affiliation(s)
- Manas Ranjan Biswal
- Department of Intelligent Robot Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Tahesin Samira Delwar
- Department of Intelligent Robot Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Abrar Siddique
- Department of Intelligent Robot Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Prangyadarsini Behera
- Department of Intelligent Robot Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Yeji Choi
- Department of Intelligent Robot Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Jee-Youl Ryu
- Department of Intelligent Robot Engineering, Pukyong National University, Busan 48513, Republic of Korea
38
Liu N, Li L, Zhao W, Han J, Shao L. Instance-Level Relative Saliency Ranking With Graph Reasoning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:8321-8337. [PMID: 34437057 DOI: 10.1109/tpami.2021.3107872] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Conventional salient object detection models cannot differentiate the importance of different salient objects. Recently, two works have been proposed to detect saliency ranking by assigning different degrees of saliency to different objects. However, one of these models cannot differentiate object instances and the other focuses more on sequential attention shift order inference. In this paper, we investigate a practical problem setting that requires simultaneously segmenting salient instances and inferring their relative saliency rank order. We present a novel unified model as the first end-to-end solution, where an improved Mask R-CNN is first used to segment salient instances and a saliency ranking branch is then added to infer the relative saliency. For relative saliency ranking, we build a new graph reasoning module by combining four graphs to incorporate the instance interaction relation, local contrast, global contrast, and a high-level semantic prior, respectively. A novel loss function is also proposed to effectively train the saliency ranking branch. Besides, a new dataset and an evaluation metric are proposed for this task, aiming at pushing forward this field of research. Finally, experimental results demonstrate that our proposed model is more effective than previous methods. We also show an example of its practical usage on adaptive image retargeting.
39
Jia XZ, DongYe CL, Peng YJ, Zhao WX, Liu TD. MRBENet: A Multiresolution Boundary Enhancement Network for Salient Object Detection. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:7780756. [PMID: 36262601 PMCID: PMC9576351 DOI: 10.1155/2022/7780756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Revised: 09/08/2022] [Accepted: 09/24/2022] [Indexed: 11/17/2022]
Abstract
Salient Object Detection (SOD) simulates the human visual perception in locating the most attractive objects in the images. Existing methods based on convolutional neural networks have proven to be highly effective for SOD. However, in some cases, these methods cannot satisfy the need of both accurately detecting intact objects and maintaining their boundary details. In this paper, we present a Multiresolution Boundary Enhancement Network (MRBENet) that exploits edge features to optimize the location and boundary fineness of salient objects. We incorporate a deeper convolutional layer into the backbone network to extract high-level semantic features and indicate the location of salient objects. Edge features of different resolutions are extracted by a U-shaped network. We designed a Feature Fusion Module (FFM) to fuse edge features and salient features. Feature Aggregation Module (FAM) based on spatial attention performs multiscale convolutions to enhance salient features. The FFM and FAM allow the model to accurately locate salient objects and enhance boundary fineness. Extensive experiments on six benchmark datasets demonstrate that the proposed method is highly effective and improves the accuracy of salient object detection compared with state-of-the-art methods.
Affiliation(s)
- Xing-Zhao Jia
- College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
- Chang-Lei DongYe
- College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
- Yan-Jun Peng
- College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
- Wen-Xiu Zhao
- College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
- Tian-De Liu
- College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
40
Fan DP, Ji GP, Cheng MM, Shao L. Concealed Object Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:6024-6042. [PMID: 34061739 DOI: 10.1109/tpami.2021.3085766] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
We present the first systematic study on concealed object detection (COD), which aims to identify objects that are visually embedded in their background. The high intrinsic similarities between the concealed objects and their background make COD far more challenging than traditional object detection/segmentation. To better understand this task, we collect a large-scale dataset, called COD10K, which consists of 10,000 images covering concealed objects in diverse real-world scenarios from 78 object categories. Further, we provide rich annotations including object categories, object boundaries, challenging attributes, object-level labels, and instance-level annotations. Our COD10K is the largest COD dataset to date, with the richest annotations, which enables comprehensive concealed object understanding and can even be used to help progress several other vision tasks, such as detection, segmentation, classification etc. Motivated by how animals hunt in the wild, we also design a simple but strong baseline for COD, termed the Search Identification Network (SINet). Without any bells and whistles, SINet outperforms twelve cutting-edge baselines on all datasets tested, making them robust, general architectures that could serve as catalysts for future research in COD. Finally, we provide some interesting findings, and highlight several potential applications and future directions. To spark research in this new field, our code, dataset, and online demo are available at our project page: http://mmcheng.net/cod.
41
Zhang Y, Dong L, Yang H, Qing L, He X, Chen H. Weakly-supervised contrastive learning-based implicit degradation modeling for blind image super-resolution. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
42
Fan DP, Li T, Lin Z, Ji GP, Zhang D, Cheng MM, Fu H, Shen J. Re-Thinking Co-Salient Object Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:4339-4354. [PMID: 33600309 DOI: 10.1109/tpami.2021.3060412] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
In this article, we conduct a comprehensive study on the co-salient object detection (CoSOD) problem for images. CoSOD is an emerging and rapidly growing extension of salient object detection (SOD), which aims to detect the co-occurring salient objects in a group of images. However, existing CoSOD datasets often have a serious data bias, assuming that each group of images contains salient objects of similar visual appearances. This bias can lead to the ideal settings and effectiveness of models trained on existing datasets, being impaired in real-life situations, where similarities are usually semantic or conceptual. To tackle this issue, we first introduce a new benchmark, called CoSOD3k in the wild, which requires a large amount of semantic context, making it more challenging than existing CoSOD datasets. Our CoSOD3k consists of 3,316 high-quality, elaborately selected images divided into 160 groups with hierarchical annotations. The images span a wide range of categories, shapes, object sizes, and backgrounds. Second, we integrate the existing SOD techniques to build a unified, trainable CoSOD framework, which is long overdue in this field. Specifically, we propose a novel CoEG-Net that augments our prior model EGNet with a co-attention projection strategy to enable fast common information learning. CoEG-Net fully leverages previous large-scale SOD datasets and significantly improves the model scalability and stability. Third, we comprehensively summarize 40 cutting-edge algorithms, benchmarking 18 of them over three challenging CoSOD datasets (iCoSeg, CoSal2015, and our CoSOD3k), and reporting more detailed (i.e., group-level) performance analysis. Finally, we discuss the challenges and future works of CoSOD. We hope that our study will give a strong boost to growth in the CoSOD community. The benchmark toolbox and results are available on our project page at https://dpfan.net/CoSOD3K.
43
Zhang N, Han J, Liu N. Learning Implicit Class Knowledge for RGB-D Co-Salient Object Detection With Transformers. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:4556-4570. [PMID: 35763477 DOI: 10.1109/tip.2022.3185550] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
RGB-D co-salient object detection aims to segment co-occurring salient objects when given a group of relevant images and depth maps. Previous methods often adopt separate pipeline and use hand-crafted features, being hard to capture the patterns of co-occurring salient objects and leading to unsatisfactory results. Using end-to-end CNN models is a straightforward idea, but they are less effective in exploiting global cues due to the intrinsic limitation. Thus, in this paper, we alternatively propose an end-to-end transformer-based model which uses class tokens to explicitly capture implicit class knowledge to perform RGB-D co-salient object detection, denoted as CTNet. Specifically, we first design adaptive class tokens for individual images to explore intra-saliency cues and then develop common class tokens for the whole group to explore inter-saliency cues. Besides, we also leverage the complementary cues between RGB images and depth maps to promote the learning of the above two types of class tokens. In addition, to promote model evaluation, we construct a challenging and large-scale benchmark dataset, named RGBD CoSal1k, which collects 106 groups containing 1000 pairs of RGB-D images with complex scenarios and diverse appearances. Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method.
44
Tran VN, Liu SH, Li YH, Wang JC. Heuristic Attention Representation Learning for Self-Supervised Pretraining. SENSORS (BASEL, SWITZERLAND) 2022; 22:s22145169. [PMID: 35890847 PMCID: PMC9320898 DOI: 10.3390/s22145169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/04/2022] [Revised: 07/03/2022] [Accepted: 07/07/2022] [Indexed: 05/27/2023]
Abstract
Recently, self-supervised learning methods have been shown to be very powerful and efficient for yielding robust representation learning by maximizing the similarity across different augmented views in embedding vector space. However, the main challenge lies in generating different views with random cropping: the semantic content may differ across views, so naively maximizing the similarity objective can be inappropriate. We tackle this problem by introducing Heuristic Attention Representation Learning (HARL). This self-supervised framework relies on a joint embedding architecture in which two neural networks are trained to produce similar embeddings for different augmented views of the same image. The HARL framework adopts prior visual object-level attention by generating a heuristic mask proposal for each training image and maximizes the object-level embedding in vector space rather than the whole-image representation used in previous works. As a result, HARL extracts a high-quality semantic representation from each training sample and outperforms existing self-supervised baselines on several downstream tasks. In addition, we provide efficient techniques based on conventional computer vision and deep learning methods for generating heuristic mask proposals on natural image datasets. Our HARL achieves a +1.3% advancement in the ImageNet semi-supervised learning benchmark and a +0.9% improvement in AP50 on the COCO object detection task over the previous state-of-the-art method, BYOL. Our code implementation is available for both TensorFlow and PyTorch frameworks.
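The object-level similarity objective can be sketched by masked average pooling of two views' features with a heuristic mask, followed by a negative cosine similarity; shapes and the pooling choice are assumptions, not the exact HARL pipeline.

```python
import torch
import torch.nn.functional as F

def masked_embedding(feat, mask):
    """Average-pool a feature map (B, C, H, W) over an object mask (B, 1, h, w)."""
    mask = F.interpolate(mask, size=feat.shape[2:], mode='nearest')
    return (feat * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1.0)

def similarity_loss(z1, z2):
    """Negative cosine similarity between object-level embeddings of two views."""
    return -F.cosine_similarity(z1, z2, dim=1).mean()

feat_a = torch.randn(2, 128, 14, 14)               # features of augmented view A
feat_b = torch.randn(2, 128, 14, 14)               # features of augmented view B
mask = (torch.rand(2, 1, 56, 56) > 0.5).float()    # stand-in for a heuristic mask proposal
loss = similarity_loss(masked_embedding(feat_a, mask), masked_embedding(feat_b, mask))
print(loss.item())
```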
Affiliation(s)
- Van Nhiem Tran
- Department of Computer Science and Information Engineering, National Central University, Taoyuan 3200, Taiwan; (V.N.T.); (S.-H.L.); (J.-C.W.)
- AI Research Center, Hon Hai Research Institute, Taipei 114699, Taiwan
- Shen-Hsuan Liu
- Department of Computer Science and Information Engineering, National Central University, Taoyuan 3200, Taiwan; (V.N.T.); (S.-H.L.); (J.-C.W.)
- AI Research Center, Hon Hai Research Institute, Taipei 114699, Taiwan
- Yung-Hui Li
- AI Research Center, Hon Hai Research Institute, Taipei 114699, Taiwan
- Jia-Ching Wang
- Department of Computer Science and Information Engineering, National Central University, Taoyuan 3200, Taiwan; (V.N.T.); (S.-H.L.); (J.-C.W.)
45
Giang TTH, Khai TQ, Im DY, Ryoo YJ. Fast Detection of Tomato Sucker Using Semantic Segmentation Neural Networks Based on RGB-D Images. SENSORS (BASEL, SWITZERLAND) 2022; 22:s22145140. [PMID: 35890823 PMCID: PMC9320735 DOI: 10.3390/s22145140] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/16/2022] [Revised: 07/01/2022] [Accepted: 07/06/2022] [Indexed: 05/14/2023]
Abstract
Tomato suckers, or axillary shoots, should be removed to increase yield and reduce disease on tomato plants. This is an essential step in the tomato plant care process and is usually performed manually by farmers, so an automated approach can save a great deal of time and labor. In the literature, semantic segmentation is a process of recognizing or classifying each pixel in an image, and it can help machines recognize and localize tomato suckers. This paper proposes a semantic segmentation neural network that can detect tomato suckers quickly from tomato plant images. We choose RGB-D images, which capture not only the visual appearance of objects but also their distance from the camera. We build a tomato RGB-D image dataset for training and evaluating the proposed neural network. The proposed semantic segmentation neural network can run in real time at 138.2 frames per second. It has 680,760 parameters, far fewer than other semantic segmentation neural networks. It correctly detects suckers at a rate of 80.2%. It requires low system resources and is suitable for the tomato dataset. We compare it with other popular non-real-time and real-time networks in terms of accuracy, execution time, and sucker detection to demonstrate its better performance.
Affiliation(s)
- Truong Thi Huong Giang
- Department of Electrical Engineering, Mokpo National University, Muan 58554, Korea; (T.T.H.G.); (T.Q.K.)
- Tran Quoc Khai
- Department of Electrical Engineering, Mokpo National University, Muan 58554, Korea; (T.T.H.G.); (T.Q.K.)
- Dae-Young Im
- Components & Materials R&D Group, Korea Institute of Industrial Technology, Gwangju 61012, Korea;
- Young-Jae Ryoo
- Department of Electrical and Control Engineering, Mokpo National University, Muan 58554, Korea
46
Pei J, Zhou T, Tang H, Liu C, Chen C. FGO-Net: Feature and Gaussian Optimization Network for visual saliency prediction. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03647-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
47
A2TPNet: Alternate Steered Attention and Trapezoidal Pyramid Fusion Network for RGB-D Salient Object Detection. ELECTRONICS 2022. [DOI: 10.3390/electronics11131968] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
RGB-D salient object detection (SOD) aims at locating the most eye-catching object in visual input by fusing complementary information of RGB modality and depth modality. Most of the existing RGB-D SOD methods integrate multi-modal features to generate the saliency map indiscriminately, ignoring the ambiguity between different modalities. To better use multi-modal complementary information and alleviate the negative impact of ambiguity among different modalities, this paper proposes a novel Alternate Steered Attention and Trapezoidal Pyramid Fusion Network (A2TPNet) for RGB-D SOD composed of Cross-modal Alternate Fusion Module (CAFM) and Trapezoidal Pyramid Fusion Module (TPFM). CAFM is focused on fusing cross-modal features, taking full consideration of the ambiguity between cross-modal data by an Alternate Steered Attention (ASA), and it reduces the interference of redundant information and non-salient features in the interactive process through a collaboration mechanism containing channel attention and spatial attention. TPFM endows the RGB-D SOD model with more powerful feature expression capabilities by combining multi-scale features to enhance the expressive ability of contextual semantics of the model. Extensive experimental results on five publicly available datasets demonstrate that the proposed model consistently outperforms 17 state-of-the-art methods.
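A collaboration of channel attention and spatial attention can be sketched in a CBAM-like form; this is an assumed simplification rather than the paper's ASA module.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sketch of sequential channel attention followed by spatial attention
    (CBAM-like; not the paper's exact Alternate Steered Attention)."""
    def __init__(self, channels=64, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                                          # (B, C, H, W)
        ca = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3))))   # channel weights (B, C)
        x = x * ca[:, :, None, None]
        sa_in = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)    # (B, 2, H, W)
        sa = torch.sigmoid(self.spatial_conv(sa_in))                # spatial weights (B, 1, H, W)
        return x * sa

print(ChannelSpatialAttention()(torch.randn(2, 64, 32, 32)).shape)
```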
48
Zhou L, Zhou T, Khan S, Sun H, Shen J, Shao L. Weakly Supervised Visual Saliency Prediction. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:3111-3124. [PMID: 35380961 DOI: 10.1109/tip.2022.3158064] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
The success of current deep saliency models heavily depends on large amounts of annotated human fixation data to fit the highly non-linear mapping between the stimuli and visual saliency. Such fully supervised data-driven approaches are annotation-intensive and often fail to consider the underlying mechanisms of visual attention. In contrast, in this paper, we introduce a model based on various cognitive theories of visual saliency, which learns visual attention patterns in a weakly supervised manner. Our approach incorporates insights from cognitive science as differentiable submodules, resulting in a unified, end-to-end trainable framework. Specifically, our model encapsulates the following important components motivated from biological vision. (a) As scene semantics are closely related to visually attentive regions, our model encodes discriminative spatial information for scene understanding through spatial visual semantics embedding. (b) To model the objectness factors in visual attention deployment, we incorporate object-level semantics embedding and object relation information. (c) Considering the "winner-take-all" mechanism in visual stimuli processing, we model the competition mechanism among objects with softmax based neural attention. (d) Lastly, a conditional center prior is learned to mimic the spatial distribution bias of visual attention. Furthermore, we propose novel loss functions to utilize supervision cues from image-level semantics, saliency prior knowledge, and self-information compression. Experiments show that our method achieves promising results, and even outperforms many of its fully supervised counterparts. Overall, our weakly supervised saliency method makes an essential step towards reducing the annotation budget of current approaches, as well as providing a more comprehensive understanding of the visual attention mechanism. Our code is available at: https://github.com/ashleylqx/WeakFixation.git.
49
Yu L, Mei H, Dong W, Wei Z, Zhu L, Wang Y, Yang X. Progressive Glass Segmentation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:2920-2933. [PMID: 35363615 DOI: 10.1109/tip.2022.3162709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Glass is very common in the real world. Influenced by the uncertainty about the glass region and the varying complex scenes behind the glass, the existence of glass poses severe challenges to many computer vision tasks, making glass segmentation an important computer vision task. Glass does not have its own visual appearance but only transmits/reflects the appearance of its surroundings, making it fundamentally different from other common objects. To address such a challenging task, existing methods typically explore and combine useful cues from different levels of features in the deep network. As there exists a characteristic gap between level-different features, i.e., deep layer features embed more high-level semantics and are better at locating the target objects while shallow layer features have larger spatial sizes and keep richer and more detailed low-level information, fusing these features naively would lead to a sub-optimal solution. In this paper, we approach effective feature fusion towards accurate glass segmentation in two steps. First, we attempt to bridge the characteristic gap between different levels of features by developing a Discriminability Enhancement (DE) module which enables level-specific features to be a more discriminative representation, alleviating the feature incompatibility for fusion. Second, we design a Focus-and-Exploration Based Fusion (FEBF) module to richly excavate useful information in the fusion process by highlighting the commonalities and exploring the differences between level-different features. Combining these two steps, we construct a Progressive Glass Segmentation Network (PGSNet) which uses multiple DE and FEBF modules to progressively aggregate features from high-level to low-level, implementing coarse-to-fine glass segmentation. In addition, we build the first home-scene-oriented glass segmentation dataset for advancing household robot applications and in-depth research on this topic. Extensive experiments demonstrate that our method outperforms 26 cutting-edge models on three challenging datasets under four standard metrics. The code and dataset will be made publicly available.
50
A Saliency Prediction Model Based on Re-Parameterization and Channel Attention Mechanism. ELECTRONICS 2022. [DOI: 10.3390/electronics11081180] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
Deep saliency models can effectively imitate the attention mechanism of human vision, and they perform considerably better than classical models that rely on handcrafted features. However, deep models also require higher-level information, such as context or emotional content, to further approach human performance. Therefore, this study proposes a multilevel saliency prediction network that aims to use a combination of spatial and channel information to find possible high-level features, further improving the performance of a saliency model. Firstly, we use a VGG style network with an identity block as the primary network architecture. With the help of re-parameterization, we can obtain rich features similar to multiscale networks and effectively reduce computational cost. Secondly, a subnetwork with a channel attention mechanism is designed to find potential saliency regions and possible high-level semantic information in an image. Finally, image spatial features and a channel enhancement vector are combined after quantization to improve the overall performance of the model. Compared with classical models and other deep models, our model exhibits superior overall performance.
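Structural re-parameterization of the kind mentioned in the abstract can be illustrated by folding parallel 3x3, 1x1, and identity branches into a single 3x3 convolution at inference time; batch-norm fusion is omitted here, so this is a simplified sketch rather than the paper's exact block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def reparameterize(conv3, conv1, channels):
    """Sketch of structural re-parameterization: fold parallel 3x3, 1x1 and
    identity branches into one 3x3 convolution (batch-norm fusion omitted)."""
    k = conv3.weight.data.clone()                      # (C, C, 3, 3)
    k += F.pad(conv1.weight.data, [1, 1, 1, 1])        # place the 1x1 kernel at the centre
    identity = torch.zeros_like(k)
    for c in range(channels):
        identity[c, c, 1, 1] = 1.0                     # identity branch as a 3x3 kernel
    k += identity
    bias = conv3.bias.data + conv1.bias.data
    merged = nn.Conv2d(channels, channels, 3, padding=1)
    merged.weight.data, merged.bias.data = k, bias
    return merged

c = 8
conv3 = nn.Conv2d(c, c, 3, padding=1)
conv1 = nn.Conv2d(c, c, 1)
x = torch.randn(1, c, 16, 16)
y_train = conv3(x) + conv1(x) + x                      # multi-branch (training-time) form
y_infer = reparameterize(conv3, conv1, c)(x)           # single-conv (inference-time) form
print(torch.allclose(y_train, y_infer, atol=1e-5))     # True
```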