1. Zhao L, Wang T, Chen Y, Zhang X, Tang H, Lin F, Li C, Li Q, Tan T, Kang D, Tong T. A novel framework for segmentation of small targets in medical images. Sci Rep 2025;15:9924. [PMID: 40121297] [PMCID: PMC11929788] [DOI: 10.1038/s41598-025-94437-9]
Abstract
Medical image segmentation represents a pivotal and intricate procedure in the domain of medical image processing and analysis. With the progression of artificial intelligence in recent years, the utilization of deep learning techniques for medical image segmentation has witnessed escalating popularity. Nevertheless, the intricate nature of medical images poses challenges, and the segmentation of diminutive targets remains in its early stages. Current networks encounter difficulties in addressing the segmentation of exceedingly small targets, especially when the number of training samples is limited. To overcome this constraint, we have implemented an effective strategy to enhance lesion images containing small targets under constrained sample sizes. We introduce a segmentation framework termed STS-Net, specifically designed for small target segmentation. This framework leverages the established capacity of convolutional neural networks to acquire effective image representations. The proposed STS-Net adopts a ResNeXt50-32x4d architecture as the encoder, integrating attention mechanisms during the encoding phase to amplify the feature representation capabilities of the network. We evaluated the proposed network on four publicly available datasets. Experimental results underscore the superiority of our approach in the domain of medical image segmentation, particularly for small target segmentation. The code is available at https://github.com/zlxokok/STSNet.
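The abstract specifies a ResNeXt50-32x4d encoder with attention added during encoding, but not the attention design or the decoder; the released code is at the GitHub link above. As a rough illustration of that general pattern only, the PyTorch sketch below attaches SE-style channel attention to each encoder stage and finishes with a minimal upsampling head; the attention module, decoder, and hyper-parameters here are assumptions, not the authors' STS-Net.

```python
# Illustrative sketch, not the authors' STS-Net: ResNeXt50-32x4d encoder stages,
# SE-style channel attention after each stage (assumed design), minimal decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnext50_32x4d

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel re-weighting."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x).view(x.size(0), -1, 1, 1)
        return x * w

class SmallTargetSegNet(nn.Module):
    def __init__(self, num_classes=1):
        super().__init__()
        backbone = resnext50_32x4d(weights=None)  # ImageNet weights could be loaded here
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])
        self.attn = nn.ModuleList([ChannelAttention(c)
                                   for c in (256, 512, 1024, 2048)])
        self.head = nn.Sequential(nn.Conv2d(2048, 256, 3, padding=1),
                                  nn.BatchNorm2d(256), nn.ReLU(inplace=True),
                                  nn.Conv2d(256, num_classes, 1))

    def forward(self, x):
        h, w = x.shape[-2:]
        f = self.stem(x)
        for stage, attn in zip(self.stages, self.attn):
            f = attn(stage(f))                      # attention after every encoder stage
        return F.interpolate(self.head(f), size=(h, w),
                             mode="bilinear", align_corners=False)

# Example: probs = torch.sigmoid(SmallTargetSegNet()(torch.randn(1, 3, 256, 256)))
```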
Affiliation(s)
- Longxuan Zhao
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China.
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China.
- Tao Wang
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
- Yuanbin Chen
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
- Xinlin Zhang
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
- Imperial Vision Technology, Fuzhou, 350100, China
- Hui Tang
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
- Fuxin Lin
- Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Department of Neurosurgery, Fujian Institute of Brain Disorders and Brain Science, Fujian Clinical Research Center for Neurological Diseases, The First Affiliated Hospital and Neurosurgery Research Institute, Fujian Medical University, Fuzhou, 350100, China
- Fujian Provincial Clinical Research Center for Neurological Diseases, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Clinical Research and Translation Center, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Chunwang Li
- Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Qixuan Li
- Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Tao Tan
- Macao Polytechnic University, Macao, 999078, China
- Dezhi Kang
- Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China.
- Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China.
- Department of Neurosurgery, Fujian Institute of Brain Disorders and Brain Science, Fujian Clinical Research Center for Neurological Diseases, The First Affiliated Hospital and Neurosurgery Research Institute, Fujian Medical University, Fuzhou, 350100, China.
- Fujian Provincial Clinical Research Center for Neurological Diseases, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China.
- Clinical Research and Translation Center, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China.
- Tong Tong
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China.
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China.
- Imperial Vision Technology, Fuzhou, 350100, China.
2. Liu W, Kang X, Duan P, Xie Z, Wei X, Li S. SOSNet: Real-Time Small Object Segmentation via Hierarchical Decoding and Example Mining. IEEE Trans Neural Netw Learn Syst 2025;36:3071-3083. [PMID: 38090866] [DOI: 10.1109/tnnls.2023.3338732]
Abstract
Real-time semantic segmentation plays an important role in autonomous vehicles. However, most real-time methods fail to obtain satisfactory performance on small objects, such as cars and sign symbols, since large objects usually contribute more to the segmentation result. To solve this issue, we propose an efficient and effective architecture, termed small objects segmentation network (SOSNet), to improve the segmentation performance of small objects. The SOSNet works from two perspectives: methodology and data. Specifically, with the former, we propose a dual-branch hierarchical decoder (DBHD), which is viewed as a small-object-sensitive segmentation head. The DBHD consists of a top segmentation head that predicts whether the pixels belong to a small-object class and a bottom one that estimates the pixel class. In this way, the latent correlation among small objects can be fully explored. With the latter, we propose a small object example mining (SOEM) algorithm for automatically balancing examples between small objects and large objects. The core idea of the proposed SOEM is that most of the hard examples on small-object classes are reserved for training while most of the easy examples on large-object classes are banned. Experiments on three commonly used datasets show that the proposed SOSNet architecture greatly improves the accuracy compared to existing real-time semantic segmentation methods while maintaining efficiency. The code will be available at https://github.com/StuLiu/SOSNet.
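The data-side idea here (reserve hard examples from small-object classes, ban easy examples from large-object classes) amounts to masking the per-pixel loss. The sketch below is a loose reading of that idea, not the paper's SOEM algorithm: the easy-pixel threshold and the set of small-object class ids are placeholders.

```python
# Loose sketch of small-object example mining as per-pixel loss masking; the
# threshold and small-class set are placeholders, not the paper's SOEM settings.
import torch
import torch.nn.functional as F

def soem_loss(logits, target, small_classes, easy_thresh=0.3, ignore_index=255):
    """logits: (B, C, H, W) raw scores; target: (B, H, W) integer labels."""
    pixel_loss = F.cross_entropy(logits, target, reduction="none",
                                 ignore_index=ignore_index)      # (B, H, W)
    valid = target != ignore_index
    small = torch.zeros_like(valid)
    for c in small_classes:
        small |= target == c                                     # pixels of small-object classes

    # keep every small-class pixel, but only the hard pixels of large-object classes
    keep = valid & (small | (pixel_loss > easy_thresh))
    if keep.sum() == 0:
        keep = valid                                             # fallback: use all valid pixels
    return pixel_loss[keep].mean()

# Example: loss = soem_loss(model(images), labels, small_classes={5, 6, 7})
```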
3. Qiu D, Ju J, Ren S, Zhang T, Tu H, Tan X, Xie F. A deep learning-based cascade algorithm for pancreatic tumor segmentation. Front Oncol 2024;14:1328146. [PMID: 39169945] [PMCID: PMC11335681] [DOI: 10.3389/fonc.2024.1328146]
Abstract
Pancreatic tumors are small in size, diverse in shape, and have low contrast and high texture similarity with surrounding tissue. As a result, the segmentation model is easily confused by complex and changeable background information, leading to inaccurate positioning of small targets as well as false positives and false negatives. Therefore, we design a cascaded pancreatic tumor segmentation algorithm. In the first stage, we use a general multi-scale U-Net to segment the pancreas, and in the second stage we exploit a multi-scale segmentation network based on non-local localization and focusing modules to segment pancreatic tumors. The non-local localization module learns channel and spatial position information, searches for the approximate area where the pancreatic tumor is located from a global perspective, and obtains the initial segmentation results. The focusing module conducts context exploration based on foreground features (or background features), detects and removes false positive (or false negative) interference, and obtains more accurate segmentation results based on the initial segmentation. In addition, we design a new loss function to alleviate the insensitivity to small targets. Experimental results show that the proposed algorithm can more accurately locate pancreatic tumors of different sizes, and its Dice coefficient outperforms that of existing state-of-the-art segmentation models. The code will be available at https://github.com/HeyJGJu/Pancreatic-Tumor-SEG.
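The cascade described here (a first network to find the pancreas, then a second network applied to that region for the tumor) can be chained at inference time roughly as below; the crop margin and the assumption that both networks return per-pixel logits are illustrative choices, not the authors' released pipeline.

```python
# Rough sketch of two-stage cascade inference: stage 1 localizes the pancreas,
# stage 2 segments the tumor inside the cropped region. Margin and model I/O
# conventions are assumptions for illustration.
import torch

@torch.no_grad()
def cascade_segment(pancreas_net, tumor_net, image, margin=16):
    """image: (1, C, H, W); both nets are assumed to return per-pixel logits."""
    pancreas = pancreas_net(image).argmax(1)[0]                # (H, W) coarse mask
    ys, xs = torch.nonzero(pancreas, as_tuple=True)
    if ys.numel() == 0:
        return torch.zeros_like(pancreas)                      # no pancreas found

    h, w = pancreas.shape
    y0, y1 = max(int(ys.min()) - margin, 0), min(int(ys.max()) + margin, h)
    x0, x1 = max(int(xs.min()) - margin, 0), min(int(xs.max()) + margin, w)

    roi = image[..., y0:y1, x0:x1]                             # focus the second stage
    tumor_roi = tumor_net(roi).argmax(1)[0]

    tumor = torch.zeros_like(pancreas)                         # paste the ROI result back
    tumor[y0:y1, x0:x1] = tumor_roi
    return tumor
```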
Affiliation(s)
- Dandan Qiu
- School of Information Science and Technology, Northwest University, Xi’an, Shaanxi, China
- Jianguo Ju
- School of Information Science and Technology, Northwest University, Xi’an, Shaanxi, China
- Shumin Ren
- School of Information Science and Technology, Northwest University, Xi’an, Shaanxi, China
- Tongtong Zhang
- School of Information Science and Technology, Northwest University, Xi’an, Shaanxi, China
- Huijuan Tu
- Department of Radiology, Kunshan Hospital of Chinese Medicine, Kunshan, Jiangsu, China
- Xin Tan
- School of Information Science and Technology, Northwest University, Xi’an, Shaanxi, China
- Fei Xie
- College of Computer Science and Technology, Xidian University, Xi’an, Shaanxi, China
4. Zhu Y, Li C, Liu Y, Wang X, Tang J, Luo B, Huang Z. Tiny Object Tracking: A Large-Scale Dataset and a Baseline. IEEE Trans Neural Netw Learn Syst 2024;35:10273-10287. [PMID: 37022390] [DOI: 10.1109/tnnls.2023.3239529]
Abstract
Tiny objects, which frequently appear in practical applications, have weak appearance and features and are receiving increasing interest in many vision tasks, such as object detection and segmentation. To promote the research and development of tiny object tracking, we create a large-scale video dataset, which contains 434 sequences with a total of more than 217K frames. Each frame is carefully annotated with a high-quality bounding box. In data creation, we take 12 challenge attributes into account to cover a broad range of viewpoints and scene complexities, and annotate these attributes to facilitate attribute-based performance analysis. To provide a strong baseline in tiny object tracking, we propose a novel multilevel knowledge distillation network (MKDNet), which pursues three-level knowledge distillation in a unified framework to effectively enhance the feature representation, discrimination, and localization abilities in tracking tiny objects. Extensive experiments are performed on the proposed dataset, and the results prove the superiority and effectiveness of MKDNet compared with state-of-the-art methods. The dataset, the algorithm code, and the evaluation code are available at https://github.com/mmic-lcl/Datasets-and-benchmark-code.
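The three distillation levels named in the abstract (feature representation, discrimination, localization) can be approximated as a weighted sum of a feature loss, a response loss, and a box-regression loss between a teacher and a student tracker; the loss forms and weights below are placeholders, not MKDNet's actual objective.

```python
# Illustrative three-level distillation loss; loss forms and weights are
# placeholders, not MKDNet's exact design.
import torch
import torch.nn.functional as F

def multilevel_distill_loss(student, teacher, w_feat=1.0, w_cls=1.0, w_loc=1.0):
    """student/teacher: dicts with 'feat' (B,C,H,W), 'cls' (B,K), 'box' (B,4)."""
    feat_loss = F.mse_loss(student["feat"], teacher["feat"].detach())
    cls_loss = F.kl_div(F.log_softmax(student["cls"], dim=1),
                        F.softmax(teacher["cls"].detach(), dim=1),
                        reduction="batchmean")
    loc_loss = F.smooth_l1_loss(student["box"], teacher["box"].detach())
    return w_feat * feat_loss + w_cls * cls_loss + w_loc * loc_loss
```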
5. Zhang C, Xu F, Wu C, Li J. Lightweight semantic segmentation network with configurable context and small object attention. Front Comput Neurosci 2023;17:1280640. [PMID: 37937062] [PMCID: PMC10626006] [DOI: 10.3389/fncom.2023.1280640]
Abstract
Current semantic segmentation algorithms suffer from encoding feature distortion and small object feature loss. Context information exchange can effectively address the feature distortion problem, but it operates over a fixed spatial range. Maintaining the input feature resolution can reduce the loss of small object information but slows down the network. To tackle these problems, we propose a lightweight semantic segmentation network with configurable context and small object attention (CCSONet). CCSONet includes a long-short distance configurable context feature enhancement module (LSCFEM) and a small object attention decoding module (SOADM). The LSCFEM differs from the regular context exchange module by configuring long- and short-range relevant features for the current feature, providing a broader and more flexible spatial range. The SOADM enhances the features of small objects by establishing correlations among objects of the same category, avoiding the redundancy introduced by high-resolution features. On the Cityscapes and CamVid datasets, our network achieves accuracies of 76.9% and 73.1% mIoU, respectively, while maintaining speeds of 87 FPS and 138 FPS. It outperforms other lightweight semantic segmentation algorithms in terms of accuracy.
Affiliation(s)
- Chunyu Zhang
- Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China
- Fang Xu
- Shenyang Siasun Robot & Automation Company Ltd., Shenyang, China
- Chengdong Wu
- Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China
- Jinzhao Li
- Changchun Institute of Optics, Fine Mechanics and Physics, University of Chinese Academy of Sciences, Beijing, China
6. Sang S, Zhou Y, Islam MT, Xing L. Small-Object Sensitive Segmentation Using Across Feature Map Attention. IEEE Trans Pattern Anal Mach Intell 2023;45:6289-6306. [PMID: 36178991] [PMCID: PMC10823909] [DOI: 10.1109/tpami.2022.3211171]
Abstract
Semantic segmentation is an important step in understanding the scene for many practical applications such as autonomous driving. Although Deep Convolutional Neural Networks-based methods have significantly improved segmentation accuracy, small/thin objects remain challenging to segment due to convolutional and pooling operations that result in information loss, especially for small objects. This article presents a novel attention-based method called Across Feature Map Attention (AFMA) to address this challenge. It quantifies the inner-relationship between small and large objects belonging to the same category by utilizing the different feature levels of the original image. The AFMA could compensate for the loss of high-level feature information of small objects and improve the small/thin object segmentation. Our method can be used as an efficient plug-in for a wide range of existing architectures and produces much more interpretable feature representation than former studies. Extensive experiments on eight widely used segmentation methods and other existing small-object segmentation models on CamVid and Cityscapes demonstrate that our method substantially and consistently improves the segmentation of small/thin objects.
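The core mechanism is attention computed across feature levels so that small-object regions can borrow evidence from same-category large objects. The sketch below is only loosely inspired by that description: a plain dot-product attention from early high-resolution features to deep features, used to redistribute coarse class logits; projection sizes, scaling, and the way the output is used are all assumptions, not the published AFMA module.

```python
# Loosely inspired sketch of cross-level attention: early (high-res) features
# attend to deep (low-res) positions and redistribute coarse class logits.
# Not the published AFMA module; dimensions and usage are assumptions.
import torch
import torch.nn as nn

class CrossLevelAttention(nn.Module):
    def __init__(self, low_ch, high_ch, dim=64):
        super().__init__()
        self.q = nn.Conv2d(low_ch, dim, 1)    # queries from early, high-resolution features
        self.k = nn.Conv2d(high_ch, dim, 1)   # keys from deep, low-resolution features

    def forward(self, low_feat, high_feat, high_logits):
        b = low_feat.size(0)
        q = self.q(low_feat).flatten(2).transpose(1, 2)         # (B, N_low, d)
        k = self.k(high_feat).flatten(2)                        # (B, d, N_high)
        attn = torch.softmax(q @ k / k.size(1) ** 0.5, dim=-1)  # (B, N_low, N_high)
        v = high_logits.flatten(2).transpose(1, 2)              # (B, N_high, K)
        refined = (attn @ v).transpose(1, 2)                    # (B, K, N_low)
        h, w = low_feat.shape[-2:]
        return refined.reshape(b, -1, h, w)                     # logits at the early-feature resolution
```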
7. Agarwal M, Gupta SK, Biswas KK. Development of a compressed FCN architecture for semantic segmentation using Particle Swarm Optimization. Neural Comput Appl 2023;35:11833-11846. [PMID: 36778195] [PMCID: PMC9897161] [DOI: 10.1007/s00521-023-08324-3]
Abstract
Researchers have adapted conventional deep learning classification networks to generate Fully Convolutional Networks (FCNs) for carrying out accurate semantic segmentation. However, such models are expensive both in terms of storage and inference time and are not readily employable on edge devices. In this paper, a compressed version of a VGG16-based Fully Convolutional Network (FCN) has been developed using Particle Swarm Optimization. It has been shown that the developed model offers tremendous savings in storage space as well as faster inference time, and can be implemented on edge devices. The efficacy of the proposed approach has been tested using potato late blight leaf images from the publicly available PlantVillage dataset, a street scene image dataset, and a lungs X-ray dataset, and it has been shown that it approaches the accuracies offered by the standard FCN even after 851× compression.
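Searching for a compressed configuration with Particle Swarm Optimization reduces, in the simplest reading, to a standard PSO loop over per-layer channel-keep ratios scored by a fitness function. The sketch below shows that generic loop; the fitness (here, an accuracy proxy minus a size penalty) and the bounds are placeholders, not the paper's encoding, and evaluate_pruned_fcn is a hypothetical helper.

```python
# Generic PSO over per-layer channel-keep ratios; the fitness function is a
# placeholder to be replaced by evaluation of the pruned FCN.
import numpy as np

def pso_search(fitness, n_layers, n_particles=20, iters=50,
               w=0.7, c1=1.5, c2=1.5, lo=0.1, hi=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, size=(n_particles, n_layers))      # keep-ratio per layer
    v = np.zeros_like(x)
    pbest, pbest_fit = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[pbest_fit.argmax()].copy()

    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        fit = np.array([fitness(p) for p in x])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = x[better], fit[better]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest                                                # best configuration found

# Example placeholder fitness: accuracy of the pruned model minus a size penalty.
# best = pso_search(lambda r: evaluate_pruned_fcn(r) - 0.1 * r.mean(), n_layers=13)
```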
8. Cai Y, Dai L, Wang H, Li Z. Multi-Target Pan-Class Intrinsic Relevance Driven Model for Improving Semantic Segmentation in Autonomous Driving. IEEE Trans Image Process 2021;30:9069-9084. [PMID: 34710044] [DOI: 10.1109/tip.2021.3122293]
Abstract
At present, most semantic segmentation models rely on the excellent feature extraction capabilities of a deep learning network structure. Although these models can achieve excellent performance on multiple datasets, ways of refining the segmentation of target main bodies and overcoming the performance limitations of deep learning networks are still a research focus. We discovered a pan-class intrinsic relevance phenomenon among targets that can link targets across classes. This cross-class strategy differs from the latest context-based semantic segmentation models, where targets are divided into intra-class and inter-class relations. This paper proposes a model for refining the target main body segmentation using multi-target pan-class intrinsic relevance. The main contributions of the proposed model can be summarized as follows: a) The multi-target pan-class intrinsic relevance prior knowledge establishment (RPK-Est) module builds the prior knowledge of the intrinsic relevance to lay the foundation for the subsequent extraction of the pan-class intrinsic relevance feature. b) The multi-target pan-class intrinsic relevance feature extraction (RF-Ext) module is designed to extract the pan-class intrinsic relevance feature based on the proposed multi-target node graph and a graph convolution network. c) The multi-target pan-class intrinsic relevance feature integration (RF-Int) module is proposed to integrate the intrinsic relevance features and semantic features through a generative adversarial learning strategy at the gradient level, which allows the intrinsic relevance features to play a role in semantic segmentation. The proposed model achieved outstanding performance in semantic segmentation testing on four authoritative datasets compared to other state-of-the-art models.
9. He JY, Liang SH, Wu X, Zhao B, Zhang L. MGSeg: Multiple Granularity-Based Real-Time Semantic Segmentation Network. IEEE Trans Image Process 2021;30:7200-7214. [PMID: 34375283] [DOI: 10.1109/tip.2021.3102509]
Abstract
Recent works on semantic segmentation witness significant performance improvement by utilizing global contextual information. In this paper, an efficient multi-granularity based semantic segmentation network (MGSeg) is proposed for real-time semantic segmentation, by modeling the latent relevance between multi-scale geometric details and high-level semantics for fine granularity segmentation. In particular, a light-weight backbone ResNet-18 is first adopted to produce the hierarchical features. Hybrid Attention Feature Aggregation (HAFA) is designed to filter the noisy spatial details of features, acquire the scale-invariance representation, and alleviate the gradient vanishing problem of the early-stage feature learning. After aggregating the learned features, Fine Granularity Refinement (FGR) module is employed to explicitly model the relationship between the multi-level features and categories, generating proper weights for fusion. More importantly, to meet the real-time processing, a series of light-weight strategies and simplified structures are applied to accelerate the efficiency, including light-weight backbone, channel compression, narrow neck structure, and so on. Extensive experiments conducted on benchmark datasets Cityscapes and CamVid demonstrate that the proposed method achieves the state-of-the-art performance, 77.8%@50fps and 72.7%@127fps on Cityscapes and CamVid datasets, respectively, having the capability for real-time applications.
10. Yang Z, Yu H, Feng M, Sun W, Lin X, Sun M, Mao ZH, Mian A. Small Object Augmentation of Urban Scenes for Real-Time Semantic Segmentation. IEEE Trans Image Process 2020;29:5175-5190. [PMID: 32191886] [DOI: 10.1109/tip.2020.2976856]
Abstract
Semantic segmentation is a key step in scene understanding for autonomous driving. Although deep learning has significantly improved the segmentation accuracy, current high-quality models such as PSPNet and DeepLabV3 are inefficient given their complex architectures and reliance on multi-scale inputs. Thus, it is difficult to apply them to real-time or practical applications. On the other hand, existing real-time methods cannot yet produce satisfactory results on small objects such as traffic lights, which are imperative to safe autonomous driving. In this paper, we improve the performance of real-time semantic segmentation from two perspectives, methodology and data. Specifically, we propose a real-time segmentation model coined Narrow Deep Network (NDNet) and build a synthetic dataset by inserting additional small objects into the training images. The proposed method achieves 65.7% mean intersection over union (mIoU) on the Cityscapes test set with only 8.4G floating-point operations (FLOPs) on 1024×2048 inputs. Furthermore, by re-training the existing PSPNet and DeepLabV3 models on our synthetic dataset, we obtained an average 2% mIoU improvement on small objects.
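The data-side contribution (a synthetic set built by inserting extra small-object instances into the training images) is essentially mask-guided copy-paste. The sketch below shows a deliberately naive version with random placement and hard pasting; the paper's insertion policy and blending are not reproduced here.

```python
# Naive mask-guided copy-paste: paste a cropped small-object instance (image
# patch + class id) at a random location. Placement and blending are simplified.
import numpy as np

def paste_small_object(image, label, obj_rgb, obj_mask, obj_class, rng=None):
    """image: (H, W, 3) uint8; label: (H, W) int; obj_rgb/obj_mask: small crops."""
    rng = rng or np.random.default_rng()
    H, W = label.shape
    h, w = obj_mask.shape
    y = int(rng.integers(0, H - h + 1))
    x = int(rng.integers(0, W - w + 1))

    img_out, lab_out = image.copy(), label.copy()
    region = obj_mask.astype(bool)
    img_out[y:y + h, x:x + w][region] = obj_rgb[region]   # overwrite pixels under the mask
    lab_out[y:y + h, x:x + w][region] = obj_class         # and the corresponding labels
    return img_out, lab_out
```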
11. Jiang B, Tu W, Yang C, Yuan J. Context-Integrated and Feature-Refined Network for Lightweight Object Parsing. IEEE Trans Image Process 2020;29:5079-5093. [PMID: 32167897] [DOI: 10.1109/tip.2020.2978583]
Abstract
Semantic segmentation for lightweight object parsing is a very challenging task, because both accuracy and efficiency (e.g., execution speed, memory footprint, or computational complexity) should be taken into account. However, most previous works pay attention to a single perspective, either accuracy or speed, and ignore the other, which poses a great limitation to the actual demands of intelligent devices. To tackle this dilemma, we propose a novel lightweight architecture named Context-Integrated and Feature-Refined Network (CIFReNet). The core components of CIFReNet are the Long-skip Refinement Module (LRM) and the Multi-scale Context Integration Module (MCIM). The LRM is designed to ease the propagation of spatial information between low-level and high-level stages. Furthermore, a channel attention mechanism is introduced into the process of long-skip learning to boost the quality of low-level feature refinement. Meanwhile, the MCIM consists of three cascaded Dense Semantic Pyramid (DSP) blocks with image-level features, which is presented to encode multiple context information and enlarge the field of view. Specifically, the proposed DSP block exploits a dense feature sampling strategy to enhance the information representation without significantly increasing the computation cost. Comprehensive experiments are conducted on three benchmark datasets for object parsing, including Cityscapes, CamVid, and Helen. As indicated, the proposed method reaches a better trade-off between accuracy and efficiency compared with other state-of-the-art methods.
12. Ding H, Jiang X, Shuai B, Liu AQ, Wang G. Semantic Segmentation with Context Encoding and Multi-Path Decoding. IEEE Trans Image Process 2020;29:3520-3533. [PMID: 31940532] [DOI: 10.1109/tip.2019.2962685]
Abstract
Semantic image segmentation aims to classify every pixel of a scene image into one of many classes. It implicitly involves object recognition, localization, and boundary delineation. In this paper, we propose a segmentation network called CGBNet to enhance the parsing results by context encoding and multi-path decoding. We first propose a context encoding module that generates a context-contrasted local feature to make use of the informative context and the discriminative local information. This context encoding module greatly improves the segmentation performance, especially for inconspicuous objects. Furthermore, we propose a scale-selection scheme to selectively fuse the parsing results from different scales of features at every spatial position. It adaptively selects appropriate score maps from rich scales of features. To improve the parsing results at boundaries, we further propose a boundary delineation module that encourages the location-specific very-low-level features near the boundaries to take part in the final prediction and suppresses them far from the boundaries. Without bells and whistles, the proposed segmentation network achieves very competitive performance in terms of all three different evaluation metrics consistently on four popular scene segmentation datasets: PASCAL Context, SUN-RGBD, SIFT Flow, and COCO Stuff.
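A context-contrasted local feature is commonly formed as the local response minus a context-aggregated response of the same map, so that locally distinctive evidence (often from inconspicuous objects) stands out. The block below uses a dilated convolution as the context branch; the dilation rate and layer sizes are illustrative choices, not the CGBNet specification.

```python
# Sketch of a context-contrasted local block: local conv features minus context
# features gathered with a dilated conv. Sizes and dilation are assumptions.
import torch
import torch.nn as nn

class ContextContrastedLocal(nn.Module):
    def __init__(self, channels, dilation=4):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        self.context = nn.Conv2d(channels, channels, 3,
                                 padding=dilation, dilation=dilation)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # emphasize what is locally distinctive relative to its surroundings
        return self.relu(self.local(x) - self.context(x))
```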
13. Song S, Yu H, Miao Z, Guo D, Ke W, Ma C, Wang S. An easy-to-hard learning strategy for within-image co-saliency detection. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.05.009]
14. Guo D, Pei Y, Zheng K, Yu H, Lu Y, Wang S. Degraded Image Semantic Segmentation with Dense-Gram Networks. IEEE Trans Image Process 2019;29:782-795. [PMID: 31449020] [DOI: 10.1109/tip.2019.2936111]
Abstract
Degraded image semantic segmentation is of great importance in autonomous driving, highway navigation systems, and many other safety-related applications, yet it has not been systematically studied before. In general, image degradations increase the difficulty of semantic segmentation, usually leading to decreased segmentation accuracy. Therefore, performance on the underlying clean images can be treated as an upper bound of degraded image semantic segmentation. While the use of supervised deep learning has substantially improved the state of the art of semantic image segmentation, the gap between the feature distribution learned using the clean images and the feature distribution learned using the degraded images poses a major obstacle to improving degraded image semantic segmentation performance. The conventional strategies for reducing the gap include: 1) adding image-restoration-based pre-processing modules; 2) using both clean and degraded images for training; 3) fine-tuning the network pre-trained on the clean images. In this paper, we propose a novel Dense-Gram Network to reduce the gap more effectively than the conventional strategies and segment degraded images. Extensive experiments demonstrate that the proposed Dense-Gram Network yields state-of-the-art semantic segmentation performance on degraded images synthesized using the PASCAL VOC 2012, SUN RGB-D, CamVid, and Cityscapes datasets.
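The Gram matrix of a feature map is the central quantity in this line of work: matching Gram statistics between features computed on degraded inputs and features from the clean domain narrows the distribution gap. The snippet below shows only the generic Gram computation and an MSE penalty between two feature maps, as a stand-in for the paper's multi-layer dense-Gram losses.

```python
# Generic Gram-matrix penalty between two feature maps (degraded vs. clean);
# a stand-in for the paper's multi-layer dense-Gram losses.
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    """feat: (B, C, H, W) -> (B, C, C) channel co-activation statistics."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def gram_loss(feat_degraded, feat_clean):
    return F.mse_loss(gram_matrix(feat_degraded), gram_matrix(feat_clean).detach())
```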
15. Li X, Ma H, Luo X. Weaklier Supervised Semantic Segmentation With Only One Image Level Annotation per Category. IEEE Trans Image Process 2019;29:128-141. [PMID: 31380759] [DOI: 10.1109/tip.2019.2930874]
Abstract
Image semantic segmentation tasks and methods based on weakly supervised conditions have been proposed and have achieved increasingly strong performance in recent years. However, the purpose of these tasks is mainly to simplify the labeling work. In this paper, we establish a new and more challenging task condition: weaker supervision with only one image-level annotation per category, which provides only the prior knowledge that humans need to recognize new objects, and aims to achieve pixel-level object semantic understanding. To solve this problem, a three-stage semantic segmentation framework is put forward, which realizes image-level, pixel-level, and object common feature learning from coarse to fine, and finally obtains semantic segmentation results with accurate and complete object regions. Experiments on the PASCAL VOC 2012 dataset demonstrate the effectiveness of the proposed method, which shows an obvious improvement over the baselines. Based on less supervision information, the method also provides satisfactory performance compared to weakly supervised learning-based methods with complete image-level annotations.