1. Liu W, Kang X, Duan P, Xie Z, Wei X, Li S. SOSNet: Real-Time Small Object Segmentation via Hierarchical Decoding and Example Mining. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:3071-3083. PMID: 38090866. DOI: 10.1109/tnnls.2023.3338732.
Abstract
Real-time semantic segmentation plays an important role in autonomous vehicles. However, most real-time semantic segmentation methods fail to obtain satisfactory performance on small objects, such as cars and sign symbols, since large objects usually contribute more to the segmentation result. To solve this issue, we propose an efficient and effective architecture, termed small objects segmentation network (SOSNet), to improve the segmentation performance of small objects. SOSNet works from two perspectives: methodology and data. For the former, we propose a dual-branch hierarchical decoder (DBHD), which acts as a small-object-sensitive segmentation head. The DBHD consists of a top segmentation head that predicts whether a pixel belongs to a small-object class and a bottom one that estimates the pixel class. In this way, the latent correlation among small objects can be fully explored. For the latter, we propose a small object example mining (SOEM) algorithm that automatically balances examples between small and large objects. The core idea of SOEM is that most of the hard examples of small-object classes are reserved for training while most of the easy examples of large-object classes are discarded. Experiments on three commonly used datasets show that the proposed SOSNet architecture greatly improves accuracy over existing real-time semantic segmentation methods while keeping efficiency. The code will be available at https://github.com/StuLiu/SOSNet.
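The abstract describes SOEM only at a high level; a minimal sketch of the stated idea (keep small-object pixels, drop easy large-object pixels) might look like the following. The function name, the quantile-based hardness threshold, and all parameters are illustrative assumptions, not the paper's algorithm.

```python
def mine_examples(pixel_losses, pixel_classes, small_classes,
                  hard_quantile=0.75):
    """Illustrative example mining: always keep pixels of small-object
    classes; keep large-object pixels only if their loss is above a
    quantile threshold (i.e., they are 'hard')."""
    ranked = sorted(pixel_losses)
    # loss value separating "easy" from "hard" examples
    thresh = ranked[int(hard_quantile * (len(ranked) - 1))]
    kept = []
    for i, (loss, cls) in enumerate(zip(pixel_losses, pixel_classes)):
        if cls in small_classes:
            kept.append(i)       # reserve small-object examples
        elif loss >= thresh:
            kept.append(i)       # keep only hard large-object examples
    return kept
```

For example, with losses [0.9, 0.1, 0.8, 0.05] and only the first pixel belonging to a small-object class, the sketch keeps indices 0 and 2 and discards the two easy large-object pixels.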
2. Bao Y, Kang G, Yang L, Duan X, Zhao B, Zhang B. Normalizing Batch Normalization for Long-Tailed Recognition. IEEE Transactions on Image Processing 2024; PP:209-220. PMID: 40030792. DOI: 10.1109/tip.2024.3518099.
Abstract
In real-world scenarios, the number of training samples across classes usually follows a long-tailed distribution. A conventionally trained network may therefore perform unexpectedly poorly on rare classes compared to frequent classes. Most previous works attempt to rectify the network bias at the data level or the classifier level. Differently, in this paper, we identify that the bias towards frequent classes may be encoded into features, i.e., the rare-specific features, which play a key role in discriminating rare classes, are much weaker than the frequent-specific features. Based on this observation, we introduce a simple yet effective approach: normalizing the parameters of the Batch Normalization (BN) layer to explicitly rectify the feature bias. To this end, we represent the Weight/Bias parameters of a BN layer as a vector, normalize it to unit length, and multiply the unit vector by a scalar learnable parameter. By decoupling the direction and magnitude of the BN parameters during learning, the Weight/Bias exhibits a more balanced distribution and the strength of features becomes more even. Extensive experiments on various long-tailed recognition benchmarks (i.e., CIFAR-10/100-LT, ImageNet-LT and iNaturalist 2018) show that our method remarkably outperforms previous state-of-the-art methods.
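The parameter transformation itself is simple enough to sketch. The following stands in for the described normalize-then-rescale step on a BN layer's Weight/Bias vectors; the function name and the separate per-vector scalars are assumptions for illustration.

```python
import math

def normalized_bn_params(weight, bias, scale_w, scale_b):
    """Decouple direction and magnitude of BN affine parameters:
    normalize each parameter vector to unit length, then multiply it
    by a single learnable scalar (scale_w / scale_b stand in for
    those scalars)."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]
    return ([scale_w * x for x in unit(weight)],
            [scale_b * x for x in unit(bias)])
```

After this reparameterization, every channel's affine parameters live on a sphere of learnable radius, so no single (frequent-class-specific) channel can grow disproportionately large.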
3. Sharma R, Saqib M, Lin CT, Blumenstein M. Enhanced Atrous Spatial Pyramid Pooling Feature Fusion for Small Ship Instance Segmentation. J Imaging 2024; 10:299. PMID: 39728196. DOI: 10.3390/jimaging10120299.
Abstract
In the maritime environment, instance segmentation of small ships is crucial. Small ships are characterized by limited appearance detail, small size, and distant locations in marine scenes. However, existing instance segmentation algorithms often fail to detect and segment them, resulting in inaccurate ship segmentation. To address this, we propose a novel solution called enhanced Atrous Spatial Pyramid Pooling (ASPP) feature fusion for small ship instance segmentation. The enhanced ASPP feature fusion module focuses on small objects by refining them and fusing important features. The framework consistently outperforms state-of-the-art models, including Mask R-CNN, Cascade Mask R-CNN, YOLACT, SOLO, and SOLOv2, on three diverse datasets, achieving mask average precision (AP) scores of 75.8% on ShipSG, 69.5% on ShipInsSeg, and 54.5% on MariBoats.
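The mechanism underlying every ASPP branch is atrous (dilated) convolution: kernel taps are spaced apart so the receptive field grows without adding parameters. A minimal 1-D sketch of that operation (not the paper's module, and the function name is invented for illustration):

```python
def dilated_conv1d(signal, kernel, dilation):
    """1-D atrous convolution: kernel taps are spaced `dilation`
    samples apart, enlarging the receptive field without extra
    parameters. dilation=1 reduces to ordinary convolution."""
    span = (len(kernel) - 1) * dilation   # extent covered by the kernel
    out = []
    for i in range(len(signal) - span):
        out.append(sum(kernel[k] * signal[i + k * dilation]
                       for k in range(len(kernel))))
    return out
```

An ASPP head applies several such branches with different dilation rates to the same feature map and concatenates the results, capturing both fine (small-ship) and coarse context.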
Affiliation(s)
- Rabi Sharma: School of Computer Science, University of Technology Sydney, Broadway, Sydney 2007, Australia
- Muhammad Saqib: School of Computer Science, University of Technology Sydney, Broadway, Sydney 2007, Australia; National Collections & Marine Infrastructure, CSIRO, Sydney 2007, Australia
- C T Lin: School of Computer Science, University of Technology Sydney, Broadway, Sydney 2007, Australia
- Michael Blumenstein: School of Computer Science, University of Technology Sydney, Broadway, Sydney 2007, Australia
4. Rong P. DDNet: Depth Dominant Network for Semantic Segmentation of RGB-D Images. Sensors (Basel) 2024; 24:6914. PMID: 39517812. PMCID: PMC11548045. DOI: 10.3390/s24216914.
Abstract
Convolutional neural networks (CNNs) have been widely applied to parse indoor scenes and segment objects in color images. Nonetheless, the lack of geometric and context information is a problem for most RGB-based methods, which use depth features only as an auxiliary module in RGB-D semantic segmentation. In this study, a novel depth dominant network (DDNet) is proposed to fully utilize the rich context information in the depth map. The critical insight is that the explicit geometric information in the depth image is more conducive to segmentation than RGB data. Unlike other methods, DDNet is a depth-based network with two CNN branches that extract color and depth features. As the core of the encoder network, the depth branch is given a larger fusion weight to extract geometric information, while the color branch provides semantic information and complementary geometric information for the depth feature maps. The effectiveness of the proposed depth-based architecture is demonstrated by comprehensive experimental evaluations and ablation studies on challenging RGB-D semantic segmentation benchmarks, including NYUv2 and a subset of ScanNetv2.
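The "larger fusion weight for the depth branch" can be pictured as a weighted sum of the two branches' feature maps. The sketch below is an assumption about the simplest form such a fusion could take; the 0.75 weight and the function name are illustrative, not values from the paper.

```python
def depth_dominant_fusion(depth_feat, rgb_feat, depth_weight=0.75):
    """Fuse per-channel features with a larger weight on the depth
    branch, reflecting DDNet's depth-dominant design (the weight
    value is illustrative)."""
    assert 0.5 < depth_weight <= 1.0, "depth branch must dominate"
    return [depth_weight * d + (1.0 - depth_weight) * c
            for d, c in zip(depth_feat, rgb_feat)]
```

In a real network the fusion weight would typically be learned rather than fixed, but the asymmetry (depth weight above 0.5) is the point of the depth-dominant design.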
Affiliation(s)
- Peizhi Rong: Division of Science, Engineering and Health Studies, School of Professional Education and Executive Development, The Hong Kong Polytechnic University, Hong Kong 999077, China
5. Li K, Geng Q, Wan M, Cao X, Zhou Z. Context and Spatial Feature Calibration for Real-Time Semantic Segmentation. IEEE Transactions on Image Processing 2023; 32:5465-5477. PMID: 37773909. DOI: 10.1109/tip.2023.3318967.
Abstract
Context modeling and multi-level feature fusion methods have proved effective in improving semantic segmentation performance. However, they are not specialized to deal with pixel-context mismatch and spatial feature misalignment, and their high computational complexity hinders widespread application in real-time scenarios. In this work, we propose a lightweight Context and Spatial Feature Calibration Network (CSFCN) to address these issues with pooling-based and sampling-based attention mechanisms. CSFCN contains two core modules: the Context Feature Calibration (CFC) module and the Spatial Feature Calibration (SFC) module. CFC adopts a cascaded pyramid pooling module to efficiently capture nested contexts, and then aggregates a private context for each pixel based on pixel-context similarity to realize context feature calibration. SFC splits features into multiple groups of sub-features along the channel dimension and propagates the sub-features by learnable sampling to achieve spatial feature calibration. Extensive experiments on the Cityscapes and CamVid datasets illustrate that our method achieves a state-of-the-art trade-off between speed and accuracy. Concretely, our method achieves 78.7% mIoU at 70.0 FPS and 77.8% mIoU at 179.2 FPS on the Cityscapes and CamVid test sets, respectively. The code is available at https://nave.vr3i.com/ and https://github.com/kaigelee/CSFCN.
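"Aggregating a private context for each pixel based on pixel-context similarity" amounts to a softmax-weighted sum of pooled context vectors. A minimal sketch of that aggregation for a single pixel (the function name and dot-product similarity are assumptions; the paper's module is more elaborate):

```python
import math

def calibrate_pixel(pixel_feat, contexts):
    """Weight each pooled context vector by softmax(similarity to the
    pixel feature) and return their weighted sum, i.e. a context
    'calibrated' to this pixel."""
    sims = [sum(p * c for p, c in zip(pixel_feat, ctx)) for ctx in contexts]
    m = max(sims)                               # stabilize the softmax
    exp = [math.exp(s - m) for s in sims]
    z = sum(exp)
    weights = [e / z for e in exp]
    return [sum(w * ctx[d] for w, ctx in zip(weights, contexts))
            for d in range(len(pixel_feat))]
```

A pixel strongly aligned with one pooled context receives essentially that context, which is how per-pixel aggregation avoids the pixel-context mismatch of a single global context.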
6. Zhang C, Xu F, Wu C, Li J. Lightweight semantic segmentation network with configurable context and small object attention. Front Comput Neurosci 2023; 17:1280640. PMID: 37937062. PMCID: PMC10626006. DOI: 10.3389/fncom.2023.1280640.
Abstract
Current semantic segmentation algorithms suffer from encoding feature distortion and small-object feature loss. Context information exchange can effectively address the feature distortion problem but operates over a fixed spatial range. Maintaining the input feature resolution can reduce the loss of small-object information but slows down the network. To tackle these problems, we propose a lightweight semantic segmentation network with configurable context and small object attention (CCSONet). CCSONet includes a long-short distance configurable context feature enhancement module (LSCFEM) and a small object attention decoding module (SOADM). Unlike a regular context exchange module, the LSCFEM configures long- and short-range relevant features for the current feature, providing a broader and more flexible spatial range. The SOADM enhances the features of small objects by establishing correlations among objects of the same category, avoiding the redundancy introduced by high-resolution features. On the Cityscapes and CamVid datasets, our network achieves 76.9 mIoU and 73.1 mIoU, respectively, while maintaining speeds of 87 FPS and 138 FPS. It outperforms other lightweight semantic segmentation algorithms in accuracy.
Affiliation(s)
- Chunyu Zhang: Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China
- Fang Xu: Shenyang Siasun Robot & Automation Company Ltd., Shenyang, China
- Chengdong Wu: Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China
- Jinzhao Li: Changchun Institute of Optics, Fine Mechanics and Physics, University of Chinese Academy of Sciences, Beijing, China
7. Qi W, Wu HC, Chan SC. MDF-Net: A Multi-Scale Dynamic Fusion Network for Breast Tumor Segmentation of Ultrasound Images. IEEE Transactions on Image Processing 2023; 32:4842-4855. PMID: 37639409. DOI: 10.1109/tip.2023.3304518.
Abstract
Breast tumor segmentation of ultrasound images provides valuable information about tumors for early detection and diagnosis. Accurate segmentation is challenging due to low image contrast between areas of interest, speckle noise, and large inter-subject variations in tumor shape and size. This paper proposes a novel Multi-scale Dynamic Fusion Network (MDF-Net) for breast ultrasound tumor segmentation. It employs a two-stage end-to-end architecture with a trunk sub-network for multiscale feature selection and a structurally optimized refinement sub-network for mitigating impairments such as noise and inter-subject variation via better feature exploration and fusion. The trunk network extends UNet++ with a simplified skip pathway structure to connect the features between adjacent scales. Moreover, deep supervision at all scales, instead of only at the finest scale as in UNet++, is proposed to extract more discriminative features and mitigate errors from speckle noise via a hybrid loss function. Unlike previous works, the first stage is linked to a loss function of the second stage so that both the preliminary segmentation and the refinement sub-network are refined together during training. The refinement sub-network utilizes a structurally optimized MDF mechanism to integrate preliminary segmentation information (capturing general tumor shape and size) at coarse scales and explores inter-subject variation information at finer scales. Experimental results on two public datasets show that the proposed method achieves better Dice and other scores than state-of-the-art methods. Qualitative analysis also indicates that our network is more robust to tumor size/shape, speckle noise and heavy posterior shadows along tumor boundaries. An optional post-processing step is also proposed to help users mitigate segmentation artifacts. The efficiency of the proposed network is further illustrated on the Electron Microscopy neural structures segmentation dataset, where it outperforms a state-of-the-art algorithm based on UNet-2022 with simpler settings. This indicates the advantages of MDF-Net in other challenging image segmentation tasks with small to medium data sizes.
8. Yang Z, Zhang C, Li R, Xu Y, Lin G. Efficient Few-Shot Object Detection via Knowledge Inheritance. IEEE Transactions on Image Processing 2022; 32:321-334. PMID: 37015553. DOI: 10.1109/tip.2022.3228162.
Abstract
Few-shot object detection (FSOD), which aims at learning a generic detector that can adapt to unseen tasks with scarce training samples, has witnessed consistent improvement recently. However, most existing methods ignore efficiency issues, e.g., high computational complexity and slow adaptation speed. Notably, efficiency has become an increasingly important evaluation metric for few-shot techniques due to an emerging trend toward embedded AI. To this end, we present an efficient pretrain-transfer framework (PTF) baseline with no computational increment, which achieves results comparable with previous state-of-the-art (SOTA) methods. Upon this baseline, we devise an initializer named knowledge inheritance (KI) to reliably initialize the novel weights for the box classifier, which effectively facilitates the knowledge transfer process and boosts the adaptation speed. Within the KI initializer, we propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights. Finally, our approach not only achieves SOTA results across three public benchmarks, i.e., PASCAL VOC, COCO and LVIS, but also exhibits high efficiency, with 1.8-100x faster adaptation speed than other methods on the COCO/LVIS benchmarks during few-shot transfer. To the best of our knowledge, this is the first work to consider the efficiency problem in FSOD. We hope to motivate a trend toward powerful yet efficient few-shot technique development. The code is publicly available at https://github.com/Ze-Yang/Efficient-FSOD.
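The ALR idea, rescaling predicted novel-class weight vectors so their lengths are consistent with the pretrained base-class weights, can be sketched in a few lines. Matching the mean base norm is an assumption about the simplest such strategy; the function name is invented for illustration.

```python
import math

def rescale_novel_weights(novel_w, base_ws):
    """Adaptive length re-scaling (sketch): scale a predicted novel
    class weight vector so its L2 norm matches the mean norm of the
    pretrained base class weight vectors."""
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    target = sum(norm(w) for w in base_ws) / len(base_ws)
    scale = target / (norm(novel_w) or 1.0)
    return [scale * x for x in novel_w]
```

Without such rescaling, a novel weight vector that is much shorter than the base vectors would produce systematically smaller classification logits for novel classes.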
9. Zhang C, Xu F, Wu C, Xu C. A lightweight multi-dimension dynamic convolutional network for real-time semantic segmentation. Front Neurorobot 2022; 16:1075520. PMID: 36590086. PMCID: PMC9797588. DOI: 10.3389/fnbot.2022.1075520.
Abstract
Semantic segmentation addresses the perception needs of autonomous driving and micro-robots and is one of the challenging tasks in computer vision. From an application point of view, the difficulty faced by semantic segmentation is satisfying inference speed, network size, and segmentation accuracy requirements at the same time. This paper proposes a lightweight multi-dimensional dynamic convolutional network (LMDCNet) for real-time semantic segmentation to address this problem. At the core of our architecture is Multi-dimensional Dynamic Convolution (MDy-Conv), which uses an attention mechanism and factorized convolution to remain efficient while maintaining remarkable accuracy. Specifically, LMDCNet is an asymmetric network architecture, for which we design an encoder module containing MDy-Conv: MS-DAB. The success of this module is attributed to MDy-Conv, which increases the utilization of local and contextual feature information. Furthermore, we design a decoder module containing a feature pyramid and attention: SC-FP, which performs multi-scale fusion of features accompanied by feature selection. On the Cityscapes and CamVid datasets, LMDCNet achieves 73.8 mIoU and 69.6 mIoU at 71.2 FPS and 92.4 FPS, respectively, without pre-training or post-processing. LMDCNet is trained and run on a single 1080Ti GPU. Our experiments show that LMDCNet achieves a good balance between segmentation accuracy and network size, with only 1.05 M parameters.
Affiliation(s)
- Chunyu Zhang: Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China
- Fang Xu: Shenyang Siasun Robot & Automation Company Ltd., Shenyang, China
- Chengdong Wu: Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China
- Chenglong Xu: College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
10. Wu L, Zhuang J, Chen W, Tang Y, Hou C, Li C, Zhong Z, Luo S. Data augmentation based on multiple oversampling fusion for medical image segmentation. PLoS One 2022; 17:e0274522. PMID: 36256637. PMCID: PMC9578635. DOI: 10.1371/journal.pone.0274522.
Abstract
A high-performance deep learning model for medical image segmentation depends on the availability of large amounts of annotated training data. However, obtaining sufficient annotated medical images is not trivial. Moreover, the small size of most tissue lesions, e.g., pulmonary nodules and liver tumours, worsens the class imbalance problem in medical image segmentation. In this study, we propose a multidimensional data augmentation method combining affine transformation and random oversampling. The training data are first expanded by affine transformation combined with random oversampling to improve the prior data distribution of small objects and the diversity of samples. Second, class weight balancing is used to avoid biased networks, since the number of background pixels is much higher than that of lesion pixels; the class imbalance problem is addressed by a weighted cross-entropy loss function during CNN training. The LUNA16 and LiTS17 datasets were used to evaluate our work, where four deep neural network models, Mask R-CNN, U-Net, SegNet and DeepLabv3+, were adopted for small tissue lesion segmentation in CT images. The small tissue segmentation performance of all four architectures on both datasets was greatly improved by incorporating the data augmentation strategy. The best pixelwise segmentation performance for both pulmonary nodules and liver tumours was obtained by the Mask R-CNN model, with DSC values of 0.829 and 0.879, respectively, similar to those of state-of-the-art methods.
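The class-weighted cross-entropy step is standard enough to sketch. Both function names below are illustrative; inverse-frequency weighting is one common choice for class weight balancing, not necessarily the exact scheme used in the paper.

```python
import math

def inverse_frequency_weights(labels, n_classes):
    """One common weighting scheme: each class weight is inversely
    proportional to the class's pixel frequency."""
    counts = [max(labels.count(c), 1) for c in range(n_classes)]
    return [len(labels) / (n_classes * c) for c in counts]

def weighted_cross_entropy(probs, labels, class_weights):
    """Pixelwise weighted cross-entropy: rare (lesion) classes get a
    larger weight so background pixels do not dominate the loss.
    probs[i] is the predicted probability of pixel i's true class."""
    total = sum(class_weights[y] * -math.log(p)
                for p, y in zip(probs, labels))
    return total / len(labels)
```

With labels [0, 0, 0, 1] and two classes, the rare class receives weight 2.0 versus 0.667 for the frequent one, so each lesion pixel contributes three times as much to the loss.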
Affiliation(s)
- Liangsheng Wu: Academy of Interdisciplinary Studies, Guangdong Polytechnic Normal University, Guangzhou, China; Academy of Contemporary Agriculture Engineering Innovations, Zhongkai University of Agriculture and Engineering, Guangzhou, China; Institute of Intelligent Manufacturing, Guangdong Academy of Sciences, Guangzhou, China
- Jiajun Zhuang: Academy of Contemporary Agriculture Engineering Innovations, Zhongkai University of Agriculture and Engineering, Guangzhou, China
- Weizhao Chen: Academy of Interdisciplinary Studies, Guangdong Polytechnic Normal University, Guangzhou, China
- Yu Tang: Academy of Interdisciplinary Studies, Guangdong Polytechnic Normal University, Guangzhou, China
- Chaojun Hou: Academy of Contemporary Agriculture Engineering Innovations, Zhongkai University of Agriculture and Engineering, Guangzhou, China
- Chentong Li: Institute of Intelligent Manufacturing, Guangdong Academy of Sciences, Guangzhou, China
- Zhenyu Zhong: Institute of Intelligent Manufacturing, Guangdong Academy of Sciences, Guangzhou, China
- Shaoming Luo: Academy of Interdisciplinary Studies, Guangdong Polytechnic Normal University, Guangzhou, China
11. Liu E, Gold KM, Combs D, Cadle-Davidson L, Jiang Y. Deep semantic segmentation for the quantification of grape foliar diseases in the vineyard. Frontiers in Plant Science 2022; 13:978761. PMID: 36161031. PMCID: PMC9501698. DOI: 10.3389/fpls.2022.978761.
Abstract
Plant disease evaluation is crucial to pathogen management and plant breeding. Human field scouting has been widely used to monitor disease progress and provide qualitative and quantitative evaluation, but it is costly, laborious, subjective, and often imprecise. To improve disease evaluation accuracy, throughput, and objectiveness, an image-based approach with a deep learning-based analysis pipeline was developed to calculate the infection severity of grape foliar diseases. The image-based approach used a ground imaging system for field data acquisition, consisting of a custom stereo camera with strobe light for consistent illumination and real-time kinematic (RTK) GPS for accurate localization. The deep learning-based pipeline used the hierarchical multiscale attention semantic segmentation (HMASS) model for disease infection segmentation, color filtering for grapevine canopy segmentation, and depth and location information for effective region masking. The resultant infection, canopy, and effective region masks were used to calculate the severity rate of disease infections in an image sequence collected in a given unit (e.g., a grapevine panel). Fungicide trials for grape downy mildew (DM) and powdery mildew (PM) were used as case studies to evaluate the developed approach and pipeline. Experimental results showed that the HMASS model achieved acceptable to good segmentation accuracy for DM (mIoU > 0.84) and PM (mIoU > 0.74) infections in testing images, demonstrating the model's capability for symptomatic disease segmentation. With the consistent image quality and multimodal metadata provided by the imaging system, the color filter and overlapping region removal could accurately and reliably segment grapevine canopies and identify repeatedly imaged regions between consecutive image frames, providing the critical information for infection severity calculation. Image-derived severity rates were highly correlated (r > 0.95) with human-assessed values and had comparable statistical power in differentiating fungicide treatment efficacy in both case studies. Therefore, the developed approach and pipeline can be used as an effective and efficient tool to quantify the severity of foliar disease infections, enabling objective, high-throughput disease evaluation for fungicide trial evaluation, genetic mapping, and breeding programs.
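The severity computation from the three masks reduces to a masked ratio: infected canopy pixels over all canopy pixels, restricted to the effective region. A minimal sketch under that reading of the pipeline (the function name and flat-list mask representation are illustrative assumptions):

```python
def infection_severity(infection_mask, canopy_mask, effective_mask):
    """Severity rate for one image: infected canopy pixels divided by
    all canopy pixels, both restricted to the effective (in-range,
    non-overlapping) region. Masks are flat lists of 0/1."""
    canopy = infected = 0
    for inf, can, eff in zip(infection_mask, canopy_mask, effective_mask):
        if eff and can:
            canopy += 1
            infected += inf
    return infected / canopy if canopy else 0.0
```

Restricting to the effective region is what prevents regions imaged twice in consecutive frames from being double-counted in a panel's severity rate.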
Affiliation(s)
- Ertai Liu: Department of Biological and Environmental Engineering, Cornell University, Ithaca, NY, United States
- Kaitlin M. Gold: Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell AgriTech, Cornell University, Geneva, NY, United States
- David Combs: Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell AgriTech, Cornell University, Geneva, NY, United States
- Lance Cadle-Davidson: Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell AgriTech, Cornell University, Geneva, NY, United States; Grape Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Geneva, NY, United States
- Yu Jiang: Horticulture Section, School of Integrative Plant Science, Cornell AgriTech, Cornell University, Geneva, NY, United States
12. Yang Z, Yu H, He Y, Sun W, Mao ZH, Mian A. Fully Convolutional Network-Based Self-Supervised Learning for Semantic Segmentation. IEEE Transactions on Neural Networks and Learning Systems 2022; PP:132-142. PMID: 35544492. DOI: 10.1109/tnnls.2022.3172423.
Abstract
Although deep learning has achieved great success in many computer vision tasks, its performance relies on the availability of large datasets with densely annotated samples. Such datasets are difficult and expensive to obtain. In this article, we focus on the problem of learning representations from unlabeled data for semantic segmentation. Inspired by two patch-based methods, we develop a novel self-supervised learning framework by formulating the jigsaw puzzle problem as a patch-wise classification problem and solving it with a fully convolutional network. By learning to solve a jigsaw puzzle comprising 25 patches and transferring the learned features to the semantic segmentation task, we achieve a 5.8-percentage-point improvement on the Cityscapes dataset over a baseline model initialized from random values. Note that we use only about 1/6 of the Cityscapes training images in our experiment, which is designed to imitate real cases where fully annotated images are limited to a small number. We also show that our self-supervised learning method can be applied to different datasets and models. In particular, we achieved performance competitive with state-of-the-art methods on the PASCAL VOC2012 dataset with significantly less pretraining time.
13. Jiang TY, Ju FL, Dai YX, Li J, Li YF, Bai YJ, Cui ZQ, Xu ZH, Zhang ZQ. Real-Time Tracking of Object Melting Based on Enhanced DeepLab v3+ Network. Computational Intelligence and Neuroscience 2022; 2022:2309317. PMID: 35401724. PMCID: PMC8986418. DOI: 10.1155/2022/2309317.
Abstract
To reveal the dissolution behavior of iron tailings in blast furnace slag, SiO2, the main component of iron tailings, was studied. To address the information loss and inaccurate extraction encountered when tracking molten SiO2 particles at high temperature, a method based on an improved DeepLab v3+ network is proposed to track, segment, and extract small object particles in real time. First, the decoding layer of the DeepLab v3+ network is improved: dense ASPP (atrous spatial pyramid pooling) modules with different dilation rates are constructed to optimize feature extraction, shallow convolutions of the backbone network are added, and they are merged into the upper convolution decoding part to capture more detail. Second, the lightweight MobileNet v3 network is integrated to reduce network parameters, further speed up image detection, and reduce memory usage, achieving real-time image segmentation and suiting low-end hardware. Finally, the loss function of the binary small-object classification model is improved by combining the advantages of Dice Loss for binary segmentation and Focal Loss for balancing positive and negative samples, solving the dataset imbalance caused by the small proportion of positive samples. Experimental results show that the MIoU (mean intersection over union) of the proposed model for small object segmentation is 6% higher than that of the original model, the overall MIoU is increased by 3%, and the execution time and memory consumption are only half those of the original model, so the method is well suited to real-time tracking and segmentation of small particles.
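The combined Dice + Focal loss the abstract describes can be sketched for the binary case. The equal weighting (alpha=0.5) and function names are illustrative assumptions; the paper does not specify its exact combination here.

```python
import math

def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss for binary segmentation (probs, targets in [0, 1]):
    1 - 2*|intersection| / (|probs| + |targets|)."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def focal_loss(probs, targets, gamma=2.0):
    """Focal loss: the (1 - p_t)^gamma factor down-weights easy,
    well-classified examples so abundant negatives do not dominate."""
    total = 0.0
    for p, t in zip(probs, targets):
        pt = p if t == 1 else 1.0 - p
        total += -((1.0 - pt) ** gamma) * math.log(max(pt, 1e-12))
    return total / len(probs)

def combined_loss(probs, targets, alpha=0.5):
    """Weighted sum of the Dice and Focal terms (alpha is illustrative)."""
    return (alpha * dice_loss(probs, targets)
            + (1 - alpha) * focal_loss(probs, targets))
```

The Dice term directly optimizes region overlap, which is insensitive to the foreground/background pixel ratio, while the Focal term keeps the per-pixel gradient signal focused on hard examples; combining them targets exactly the positive-sample scarcity the paper describes.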
Affiliation(s)
- Tian-yu Jiang
- Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Key Laboratory of Engineering Computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, Hebei 063210, China
| | - Feng-lan Ju
- College of Metallurgy and Energy, North China University of Science and Technology, Tangshan, Hebei 063210, China
| | - Ya-xun Dai
- College of Metallurgy and Energy, North China University of Science and Technology, Tangshan, Hebei 063210, China
| | - Jie Li
- Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Key Laboratory of Engineering Computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, Hebei 063210, China
| | - Yi-fan Li
- Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Key Laboratory of Engineering Computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Yun-jie Bai
- Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Key Laboratory of Engineering Computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Ze-qian Cui
- Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Key Laboratory of Engineering Computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Zheng-han Xu
- Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Key Laboratory of Engineering Computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Zun-Qian Zhang
- Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Key Laboratory of Engineering Computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, Hebei 063210, China
|
14
|
Zhang X, Du B, Wu Z, Wan T. LAANet: lightweight attention-guided asymmetric network for real-time semantic segmentation. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-06932-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
15
|
Hu X, Jing L, Sehar U. Joint pyramid attention network for real-time semantic segmentation of urban scenes. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02446-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
16
|
He JY, Liang SH, Wu X, Zhao B, Zhang L. MGSeg: Multiple Granularity-Based Real-Time Semantic Segmentation Network. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:7200-7214. [PMID: 34375283 DOI: 10.1109/tip.2021.3102509] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Recent works on semantic segmentation have achieved significant performance improvements by utilizing global contextual information. In this paper, an efficient multi-granularity-based semantic segmentation network (MGSeg) is proposed for real-time semantic segmentation, modeling the latent relevance between multi-scale geometric details and high-level semantics for fine-granularity segmentation. In particular, the lightweight backbone ResNet-18 is first adopted to produce hierarchical features. Hybrid Attention Feature Aggregation (HAFA) is designed to filter out noisy spatial details, acquire a scale-invariant representation, and alleviate the vanishing-gradient problem of early-stage feature learning. After aggregating the learned features, a Fine Granularity Refinement (FGR) module explicitly models the relationship between multi-level features and categories, generating proper weights for fusion. More importantly, to meet real-time requirements, a series of lightweight strategies and simplified structures is applied to improve efficiency, including the lightweight backbone, channel compression, a narrow neck structure, and so on. Extensive experiments conducted on the benchmark datasets Cityscapes and CamVid demonstrate that the proposed method achieves state-of-the-art performance, 77.8%@50fps and 72.7%@127fps on Cityscapes and CamVid, respectively, making it suitable for real-time applications.
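The abstract's FGR idea of "generating proper weights for fusion" of multi-level features can be illustrated in miniature: normalize per-level scores with a softmax and take a weighted sum of same-shape feature maps. This is a schematic NumPy sketch of that general pattern, not MGSeg's actual module; the function names and the use of scalar per-level scores are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def fuse_levels(features, scores):
    """Fuse same-shape multi-level feature maps with normalized weights.
    features: list of (C, H, W) arrays (assumed already resized to one scale);
    scores: raw per-level logits, e.g. produced by a learned scoring branch."""
    w = softmax(np.asarray(scores, dtype=float))
    return sum(wi * f for wi, f in zip(w, features))
```

With equal scores the fusion reduces to a plain average; a learned scorer can instead emphasize the level most relevant to each category, which is the intuition behind weighting multi-level features rather than simply concatenating them.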
|
17
|
Yang Z, Yu H, Cao S, Xu Q, Yuan D, Zhang H, Jia W, Mao ZH, Sun M. Human-Mimetic Estimation of Food Volume from a Single-View RGB Image Using an AI System. ELECTRONICS 2021; 10:1556. [PMID: 34552763 PMCID: PMC8455030 DOI: 10.3390/electronics10131556] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
It is well known that many chronic diseases are associated with an unhealthy diet. Although improving diet is critical, adopting a healthy diet is difficult despite its benefits being well understood. Technology is needed to assess dietary intake accurately and easily in real-world settings so that effective interventions to manage overweight, obesity, and related chronic diseases can be developed. In recent years, new wearable imaging and computational technologies have emerged. These technologies are capable of performing objective and passive dietary assessments with a much simpler procedure than traditional questionnaires. However, a critical task remains: estimating the portion size (in this case, the food volume) from a digital image. Currently, this task is very challenging because the volumetric information in two-dimensional images is incomplete, and the estimation involves a great deal of imagination, beyond the capacity of traditional image processing algorithms. In this work, we present a novel Artificial Intelligence (AI) system that mimics the thinking of dietitians, who use a set of common objects as gauges (e.g., a teaspoon, a golf ball, a cup, and so on) to estimate portion size. Specifically, our human-mimetic system "mentally" gauges the volume of food using a set of internal reference volumes learned previously. At the output, our system produces a vector of probabilities of the food with respect to the internal reference volumes. The estimation is then completed by an "intelligent guess", implemented as an inner product between the probability vector and the reference volume vector. Our experiments using both virtual and real food datasets have shown accurate volume estimation results.
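The "intelligent guess" step in this abstract is just an expected value: the inner product of the predicted probability vector with the vector of reference volumes. A minimal sketch, in which the specific reference objects and their volumes in millilitres are illustrative assumptions rather than the paper's learned values:

```python
import numpy as np

# Hypothetical reference volumes (mL): teaspoon, golf ball, tennis ball, cup.
REF_VOLUMES = np.array([5.0, 40.7, 157.0, 240.0])

def estimate_volume(probs, refs=REF_VOLUMES):
    """'Intelligent guess': expected volume under the predicted distribution
    over internal reference volumes, computed as an inner product."""
    probs = np.asarray(probs, dtype=float)
    assert probs.shape == refs.shape, "one probability per reference volume"
    assert abs(probs.sum() - 1.0) < 1e-6, "probabilities must sum to 1"
    return float(probs @ refs)
```

A prediction concentrated on one reference returns that reference's volume exactly, while a split prediction interpolates between references, which is how a soft classification yields a continuous volume estimate.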
Affiliation(s)
- Zhengeng Yang
- College of Electrical and Information Engineering, Hunan University, Changsha 410082, China
- Department of Neurosurgery, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Hongshan Yu
- College of Electrical and Information Engineering, Hunan University, Changsha 410082, China
- Shunxin Cao
- Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Qi Xu
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
- Ding Yuan
- Image Processing Center, Beihang University, Beijing 100191, China
- Hong Zhang
- Image Processing Center, Beihang University, Beijing 100191, China
- Wenyan Jia
- Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Zhi-Hong Mao
- Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Mingui Sun
- Department of Neurosurgery, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15260, USA
|