1
Zhang X, Xiao Z, Wu X, Chen Y, Zhao J, Hu Y, Liu J. Pyramid Pixel Context Adaption Network for Medical Image Classification With Supervised Contrastive Learning. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:6802-6815. [PMID: 38829749 DOI: 10.1109/tnnls.2024.3399164]
Abstract
The spatial attention (SA) mechanism has been widely incorporated into deep neural networks (DNNs), significantly lifting performance in computer vision tasks via long-range dependency modeling. However, it may perform poorly in medical image analysis, and existing efforts are often unaware that long-range dependency modeling has limitations in highlighting subtle lesion regions. To overcome this limitation, we propose a practical yet lightweight architectural unit, the pyramid pixel context adaption (PPCA) module, which exploits multiscale pixel context information to dynamically recalibrate each pixel position in a pixel-independent manner. PPCA first applies a well-designed cross-channel pyramid pooling (CCPP) to aggregate multiscale pixel context information, then eliminates the inconsistency among the pooled contexts via pixel normalization (PN), and finally estimates per-pixel attention weights via pixel context integration. By embedding PPCA into a DNN with negligible overhead, the PPCA network (PPCANet) is developed for medical image classification. In addition, we introduce supervised contrastive learning to enhance feature representation by exploiting the potential of label information via a supervised contrastive loss (CL). Extensive experiments on six medical image datasets show that PPCANet outperforms state-of-the-art (SOTA) attention-based networks and recent DNNs. We also provide a visual analysis and an ablation study to explain the behavior of PPCANet in the decision-making process.
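The pipeline this abstract describes (cross-channel pyramid pooling, per-pixel normalization, then a per-pixel attention weight) can be sketched as follows. This is a minimal NumPy illustration of the idea, not the authors' implementation: the pooling granularities and the integration weights (random here, learned in PPCA) are assumptions.

```python
import numpy as np

def ppca_attention(x, levels=(1, 2, 4), rng=None):
    """Illustrative sketch of pyramid pixel context adaption on a
    (C, H, W) feature map. The real module learns its integration
    weights; here they are random placeholders."""
    rng = np.random.default_rng(0) if rng is None else rng

    # 1) Cross-channel pyramid pooling: average channel groups at several
    #    granularities to collect multiscale per-pixel context.
    ctx = []
    for g in levels:
        for group in np.array_split(x, g, axis=0):
            ctx.append(group.mean(axis=0))          # one (H, W) map per group
    ctx = np.stack(ctx)                             # (L, H, W), L = sum(levels)

    # 2) Pixel normalization: standardize the L context values at each
    #    pixel independently, removing cross-scale inconsistency.
    mu = ctx.mean(axis=0, keepdims=True)
    sd = ctx.std(axis=0, keepdims=True) + 1e-5
    ctx = (ctx - mu) / sd

    # 3) Pixel context integration: combine normalized contexts into one
    #    per-pixel attention weight in (0, 1), then recalibrate features.
    w = rng.standard_normal(len(ctx))
    att = 1.0 / (1.0 + np.exp(-np.tensordot(w, ctx, axes=1)))  # (H, W)
    return x * att

feat = np.random.default_rng(1).standard_normal((8, 4, 4))
out = ppca_attention(feat)
```

Because the attention weights stay in (0, 1), the module can only dampen activations pixel by pixel, which matches the "recalibration" framing in the abstract.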
2
Dang J, Zheng H, Xu X, Wang L, Hu Q, Guo Y. Adaptive Sparse Memory Networks for Efficient and Robust Video Object Segmentation. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:3820-3833. [PMID: 38315589 DOI: 10.1109/tnnls.2024.3357118]
Abstract
Recently, memory-based networks have achieved promising performance for video object segmentation (VOS). However, existing methods still suffer from unsatisfactory segmentation accuracy and inferior efficiency. The reasons are mainly twofold: 1) during memory construction, the inflexible memory storage mechanism results in a weak discriminative ability for similar appearances in complex scenarios, leading to video-level temporal redundancy, and 2) during memory reading, matching robustness and memory retrieval accuracy decrease as the number of video frames increases. To address these challenges, we propose an adaptive sparse memory network (ASM) that efficiently and effectively performs VOS by sparsely leveraging previous guidance while attending to key information. Specifically, we design an adaptive sparse memory constructor (ASMC) to adaptively memorize informative past frames according to dynamic temporal changes in video frames. Furthermore, we introduce an attentive local memory reader (ALMR) to quickly retrieve relevant information using a subset of memory, thereby reducing frame-level redundant computation and noise in a simpler and more convenient manner. To prevent key features from being discarded by the subset of memory, we further propose a novel attentive local feature aggregation (ALFA) module, which preserves useful cues by selectively aggregating discriminative spatial dependence from adjacent frames, thereby effectively increasing the receptive field of each memory frame. Extensive experiments demonstrate that our model achieves state-of-the-art performance with real-time speed on six popular VOS benchmarks. Furthermore, our ASM can be applied to existing memory-based methods as generic plugins to achieve significant performance improvements. More importantly, our method exhibits robustness in handling sparse videos with low frame rates.
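The adaptive sparse memory constructor described above memorizes a past frame only when the video has changed enough to make it informative. A toy sketch of that selection rule (the drift metric and threshold are illustrative assumptions, not the paper's exact criterion):

```python
import numpy as np

def build_sparse_memory(frames, tau=0.5):
    """Keep a frame in memory only when its feature representation has
    drifted far enough from the last memorized frame; redundant frames
    in static stretches of video are skipped."""
    memory = [0]                                   # always keep the first frame
    for t in range(1, len(frames)):
        drift = np.linalg.norm(frames[t] - frames[memory[-1]])
        if drift > tau:                            # large dynamic temporal change
            memory.append(t)
    return memory

# Four frame "features": frames 0-1 nearly static, a jump at frame 2.
frames = [np.full(4, 0.0), np.full(4, 0.1), np.full(4, 1.0), np.full(4, 1.05)]
idx = build_sparse_memory(frames, tau=0.5)
```

With this input only frames 0 and 2 are memorized, which is the kind of video-level redundancy reduction the abstract attributes to ASMC.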
3
Yang S, Zhang L, Liu S, Lu H, Chen H. Real-Time Semantic Segmentation via a Densely Aggregated Bilateral Network. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:381-392. [PMID: 37910414 DOI: 10.1109/tnnls.2023.3326665]
Abstract
With the growing demands of applications on online devices, the speed-accuracy trade-off is critical in semantic segmentation systems. Recently, the bilateral segmentation network has shown a promising capacity to balance favorable accuracy with fast speed and has become the mainstream backbone in real-time semantic segmentation. Segmentation of target objects relies on high-level semantics, whereas accurate localization requires detailed low-level features to model specific local patterns. However, the lightweight backbone of the bilateral architecture limits the extraction of semantic context and spatial details, and the late fusion of the bilateral streams leads to insufficient aggregation of semantic context and spatial details. In this article, we propose a densely aggregated bilateral network (DAB-Net) for real-time semantic segmentation. In the context path, a patchwise context enhancement (PCE) module is proposed to efficiently capture local semantic contextual information along the spatial and channel dimensions, respectively. Meanwhile, a context-guided spatial path (CGSP) is designed to exploit more spatial information by encoding finer details from the raw image and the transition from the context path. Finally, with multiple interactions between the bilateral branches, the intertwined outputs of the bilateral streams are combined in a unified decoder for a final interaction that further enhances the feature representation and generates the final segmentation prediction. Experimental results on three public benchmarks demonstrate that our proposed method achieves higher accuracy with a limited decay in speed, performs favorably against state-of-the-art real-time approaches, and runs at 31.1 frames/s (FPS) at high resolution. The source code is released at https://github.com/isyangshu/DABNet.
4
Sun Z, Chen Y, Xiong S. SSAT++: A Semantic-Aware and Versatile Makeup Transfer Network With Local Color Consistency Constraint. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:1287-1301. [PMID: 37999963 DOI: 10.1109/tnnls.2023.3332065]
Abstract
The purpose of makeup transfer (MT) is to transfer makeup from a reference image to a target face while preserving the target's content. Existing methods have made remarkable progress in generating realistic results but do not perform well in terms of semantic correspondence and color fidelity. In addition, the straightforward extension of processing videos frame by frame tends to produce flickering results in most methods. These limitations restrict the applicability of previous methods in real-world scenarios. To address these issues, we propose a symmetric semantic-aware transfer network (SSAT++) to improve makeup similarity and video temporal consistency. For MT, the feature fusion (FF) module first integrates the content and semantic features of the input images, producing multiscale fusion features. Then, the semantic correspondence from the reference to the target is obtained by measuring the correlation of fusion features at each position. According to this semantic correspondence, the symmetric mask semantic transfer (SMST) module aligns the reference makeup features with the target content features to generate MT results. Meanwhile, the semantic correspondence from the target to the reference is obtained by transposing the correlation matrix and is applied to the makeup removal task. To enhance color fidelity, we propose a novel local color loss that forces the transferred results to have the same color histogram distribution as the reference. Furthermore, a morphing simulation is designed to ensure temporal consistency for video MT without requiring additional video frame input or optical flow estimation. To evaluate the effectiveness of SSAT++, extensive experiments have been conducted on the MT dataset, which covers a variety of makeup styles, and on the MT-Wild dataset, which contains images with diverse poses and expressions. The experiments show that SSAT++ outperforms existing MT methods in qualitative and quantitative evaluation and provides more flexible makeup control. Code and trained models will be available at https://gitee.com/sunzhaoyang0304/ssat-msp and https://github.com/Snowfallingplum/SSAT.
5
Chen J, Jiao L, Liu X, Liu F, Li L, Yang S. Multiresolution Interpretable Contourlet Graph Network for Image Classification. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:17716-17729. [PMID: 37747859 DOI: 10.1109/tnnls.2023.3307721]
Abstract
Modeling contextual relationships in images as graph inference is an interesting and promising research topic. However, existing approaches only perform graph modeling of entities, ignoring the intrinsic geometric features of images. To overcome this problem, a novel multiresolution interpretable contourlet graph network (MICGNet) is proposed in this article. MICGNet delicately balances graph representation learning with the multiscale and multidirectional features of images: the contourlet transform is used to capture the hyperplanar directional singularities of images, and multilevel sparse contourlet coefficients are encoded into a graph for further graph representation learning. This process provides interpretable theoretical support for optimizing the model structure. Specifically, a superpixel-based region graph is first constructed. Then, the region graph is used to encode the nonsubsampled contourlet transform (NSCT) coefficients of the image, which are taken as node features. Considering the statistical properties of the NSCT coefficients, we compute the node similarity, i.e., the adjacency matrix, using the Mahalanobis distance. Next, graph convolutional networks (GCNs) are employed to learn more abstract multilevel NSCT-enhanced graph representations. Finally, a learnable graph assignment matrix is designed to obtain geometric association representations, which accomplish the assignment of graph representations to grid feature maps. We conduct comparative experiments on six publicly available datasets, and the experimental analysis shows that MICGNet is significantly more effective and efficient than other recent algorithms.
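The Mahalanobis-distance adjacency step above is self-contained enough to sketch. A minimal NumPy version, assuming node features as rows of a matrix; converting distances to similarities with a Gaussian kernel is an illustrative choice, not necessarily the paper's:

```python
import numpy as np

def mahalanobis_adjacency(node_feats, sigma=1.0):
    """Build an adjacency matrix from pairwise Mahalanobis distances
    between node feature vectors, so that correlated feature dimensions
    (like NSCT coefficient statistics) are properly whitened."""
    X = np.asarray(node_feats, dtype=float)        # (N, D)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    inv = np.linalg.inv(cov)                       # inverse feature covariance
    diff = X[:, None, :] - X[None, :, :]           # (N, N, D) pairwise diffs
    d2 = np.einsum('ijk,kl,ijl->ij', diff, inv, diff)  # squared distances
    return np.exp(-d2 / (2 * sigma ** 2))          # similarity in (0, 1]

A = mahalanobis_adjacency([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
```

The resulting matrix is symmetric with ones on the diagonal, the usual shape expected by a GCN's normalized adjacency.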
6
Cai W, Sun H, Liu R, Cui Y, Wang J, Xia Y, Yao D, Guo D. A Spatial-Channel-Temporal-Fused Attention for Spiking Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:14315-14329. [PMID: 37256807 DOI: 10.1109/tnnls.2023.3278265]
Abstract
Spiking neural networks (SNNs) mimic brain computational strategies and exhibit substantial capabilities in spatiotemporal information processing. As an essential factor in human perception, visual attention refers to the dynamic process of selecting salient regions in biological vision systems. Although visual attention mechanisms have achieved great success in computer vision applications, they are rarely introduced into SNNs. Inspired by experimental observations on predictive attentional remapping, we propose a new spatial-channel-temporal-fused attention (SCTFA) module that guides SNNs to efficiently capture underlying target regions by utilizing accumulated historical spatial-channel information. Through a systematic evaluation on three event stream datasets (DVS Gesture, SL-Animals-DVS, and MNIST-DVS), we demonstrate that the SNN with the SCTFA module (SCTFA-SNN) not only significantly outperforms the baseline SNN (BL-SNN) and two other SNN models with degenerated attention modules, but also achieves accuracy competitive with existing state-of-the-art (SOTA) methods. Additionally, our detailed analysis shows that the proposed SCTFA-SNN model is strongly robust to noise and remarkably stable when faced with incomplete data, while maintaining acceptable complexity and efficiency. Overall, these findings indicate that incorporating appropriate cognitive mechanisms of the brain may provide a promising approach to elevating the capabilities of SNNs.
7
Wang Z, Tian G. Task-Oriented Robot Cognitive Manipulation Planning Using Affordance Segmentation and Logic Reasoning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12172-12185. [PMID: 37028380 DOI: 10.1109/tnnls.2023.3252578]
Abstract
The purpose of task-oriented robot cognitive manipulation planning is to enable robots to select appropriate actions to manipulate appropriate parts of an object according to different tasks, so as to achieve human-like task execution. This ability is crucial for robots to understand how to manipulate and grasp objects under given tasks. This article proposes a task-oriented robot cognitive manipulation planning method using affordance segmentation and logic reasoning, which provides robots with semantic reasoning skills about the most appropriate, task-oriented parts of an object to manipulate. Object affordances are obtained by constructing a convolutional neural network based on the attention mechanism. In view of the diversity of service tasks and objects in service environments, object/task ontologies are constructed to manage objects and tasks, and object-task affordances are established through causal probability logic. On this basis, the Dempster-Shafer theory is used to design a robot cognitive manipulation planning framework, which can reason about the manipulation-region configuration for the intended task. The experimental results demonstrate that our proposed method effectively improves the cognitive manipulation ability of robots and makes robots perform various tasks more intelligently.
8
Luo H, Lin G, Shen F, Huang X, Yao Y, Shen H. Robust-EQA: Robust Learning for Embodied Question Answering With Noisy Labels. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12083-12094. [PMID: 37028297 DOI: 10.1109/tnnls.2023.3251984]
Abstract
Embodied question answering (EQA) is a recently emerged research field in which an agent is asked to answer the user's questions by exploring the environment and collecting visual information. Many researchers have turned their attention to the EQA field due to its broad potential applications, such as in-home robots, self-driving vehicles, and personal assistants. High-level visual tasks such as EQA are susceptible to noisy inputs because they involve complex reasoning processes. Before the benefits of EQA can be brought to practical applications, models need to be equipped with good robustness against label noise. To tackle this problem, we propose a novel label-noise-robust learning algorithm for the EQA task. First, a joint-training co-regularization noise-robust learning method is proposed for noise filtering in the visual question answering (VQA) module, which trains two parallel network branches with one loss function. Then, a two-stage hierarchical robust learning algorithm is proposed to filter out noisy navigation labels at both the trajectory level and the action level. Finally, taking the purified labels as inputs, a joint robust learning mechanism is given to coordinate the work of the whole EQA system. Empirical results demonstrate that, under extremely noisy environments (45% noisy labels) and low-level noisy environments (20% noisy labels), the robustness of deep learning models trained by our algorithm is superior to that of existing EQA models.
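The "two parallel branches trained by one loss" idea can be made concrete with a small sketch. This is an illustrative co-regularization loss, not the paper's exact formulation: the cross-entropy/agreement split and the symmetric-KL agreement term are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def co_regularized_loss(logits_a, logits_b, label, alpha=0.5):
    """One loss coupling two parallel branches: each branch's
    cross-entropy on the (possibly noisy) label, plus a symmetric-KL
    agreement term that penalizes the branches disagreeing. alpha is an
    illustrative mixing weight."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    ce = -np.log(pa[label] + 1e-12) - np.log(pb[label] + 1e-12)
    kl = np.sum(pa * np.log((pa + 1e-12) / (pb + 1e-12))) \
       + np.sum(pb * np.log((pb + 1e-12) / (pa + 1e-12)))
    return (1 - alpha) * ce + alpha * kl

loss_same = co_regularized_loss(np.array([2.0, 0.0]), np.array([2.0, 0.0]), 0)
loss_diff = co_regularized_loss(np.array([2.0, 0.0]), np.array([0.0, 2.0]), 0)
```

Because disagreement inflates the loss, samples whose labels the two branches cannot jointly fit stand out, which is the basis for filtering noisy labels.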
9
Gao L, Wang W, Meng X, Zhang S, Xu J, Ju S, Wang YC. TPA: Two-stage progressive attention segmentation framework for hepatocellular carcinoma on multi-modality MRI. Med Phys 2024; 51:4936-4947. [PMID: 38306473 DOI: 10.1002/mp.16968]
Abstract
BACKGROUND Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) plays a crucial role in the diagnosis and measurement of hepatocellular carcinoma (HCC). The multi-modality information contained in the multi-phase images of DCE-MRI is important for improving segmentation. However, this remains a challenging task due to the heterogeneity of HCC, which may cause one HCC lesion to have varied imaging appearances in each phase of DCE-MRI. In particular, phases exhibiting inconsistent sizes and boundaries result in a lack of correlation between modalities and may produce inaccurate segmentation results. PURPOSE We aim to design a multi-modality segmentation model that can learn meaningful inter-phase correlations for HCC segmentation. METHODS In this study, we propose a two-stage progressive attention segmentation framework (TPA) for HCC based on the transformer and the decision-making process of radiologists. Specifically, the first stage fuses features from multi-phase images to identify HCC and provide a localization region. In the second stage, a multi-modality attention transformer module (MAT) is designed to focus on the features that represent the actual size. RESULTS We conduct training, validation, and testing on a single-center dataset (386 cases), followed by external testing on a batch of multi-center datasets (83 cases). Furthermore, we analyze a subgroup of data with weak inter-phase correlation in the test set. The proposed model achieves Dice coefficients of 0.822 and 0.772 in the internal and external test sets, respectively, and 0.829 and 0.791 in the subgroup. The experimental results demonstrate that our model outperforms state-of-the-art models, particularly within the subgroup. CONCLUSIONS The proposed TPA provides the best segmentation results, and utilizing clinical prior knowledge for network design is practical and feasible.
Affiliation(s)
- Lei Gao: Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
- Weilang Wang: Department of Radiology, Zhongda Hospital, Jiangsu Key Laboratory of Molecular and Functional Imaging, School of Medicine, Southeast University, Nanjing, China
- Xiangpan Meng: Department of Radiology, Zhongda Hospital, Jiangsu Key Laboratory of Molecular and Functional Imaging, School of Medicine, Southeast University, Nanjing, China
- Shuhang Zhang: Department of Radiology, Zhongda Hospital, Jiangsu Key Laboratory of Molecular and Functional Imaging, School of Medicine, Southeast University, Nanjing, China
- Jun Xu: Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
- Shenghong Ju: Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China; Department of Radiology, Zhongda Hospital, Jiangsu Key Laboratory of Molecular and Functional Imaging, School of Medicine, Southeast University, Nanjing, China
- Yuan-Cheng Wang: Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China; Department of Radiology, Zhongda Hospital, Jiangsu Key Laboratory of Molecular and Functional Imaging, School of Medicine, Southeast University, Nanjing, China
10
Toledo YP, Pereira ALDS, Quesada AP, De Moraes RF, Garcia SH, Fernandes LAF. Scalable Segmentation of Diabetic Foot Ulcers. 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS) 2024:514-520. [DOI: 10.1109/cbms61543.2024.00091]
11
Ma R, Zhang Y, Zhang B, Fang L, Huang D, Qi L. Learning Attention in the Frequency Domain for Flexible Real Photograph Denoising. IEEE Transactions on Image Processing 2024; 33:3707-3721. [PMID: 38809730 DOI: 10.1109/tip.2024.3404253]
Abstract
Recent advancements in deep learning techniques have pushed forward the frontiers of real photograph denoising. However, due to the inherent pooling operations in the spatial domain, current CNN-based denoisers are biased toward low-frequency representations while discarding high-frequency components. This leads to suboptimal visual quality, since image denoising aims to completely eliminate complex noise while recovering all fine-scale and salient information. In this work, we tackle this challenge from the frequency perspective and present a new solution pipeline, coined the frequency attention denoising network (FADNet). Our key idea is to build a learning-based frequency attention framework in which the feature correlations over a broad frequency spectrum can be fully characterized, thus enhancing the representational power of the network across multiple frequency channels. Based on this, we design a cascade of adaptive instance residual modules (AIRMs). In each AIRM, we first transform the spatial-domain features into the frequency space. Then, a learning-based frequency attention framework is devised to explore the feature interdependencies in the frequency domain. In addition, we introduce an adaptive layer that leverages the guidance of the estimated noise map and intermediate features to address the challenge of model generalization under noise discrepancies. The effectiveness of our method is demonstrated on several real camera benchmark datasets, with superior denoising performance, generalization capability, and efficiency versus the state of the art.
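The transform-reweight-invert loop at the heart of frequency attention can be sketched in a few lines. This is an illustrative toy, not FADNet: the magnitude-proportional reweighting stands in for the learned attention.

```python
import numpy as np

def frequency_attention(x):
    """Sketch of attention applied in the frequency domain: take a 2-D
    FFT of each channel, reweight frequency bins (here proportionally to
    their magnitude, with mean weight 1), and transform back. A real
    module would learn this reweighting instead."""
    F = np.fft.fft2(x, axes=(-2, -1))              # (C, H, W) complex spectrum
    mag = np.abs(F)
    w = mag / (mag.sum(axis=(-2, -1), keepdims=True) + 1e-12)
    w = w * (w.shape[-2] * w.shape[-1])            # normalize: mean weight = 1
    return np.real(np.fft.ifft2(F * w, axes=(-2, -1)))

x = np.random.default_rng(0).standard_normal((2, 8, 8))
y = frequency_attention(x)
```

Operating on the full spectrum means high-frequency bins can be emphasized just as easily as low-frequency ones, the property the abstract says spatial pooling lacks.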
12
Kong X, Deng Y, Tang F, Dong W, Ma C, Chen Y, He Z, Xu C. Exploring the Temporal Consistency of Arbitrary Style Transfer: A Channelwise Perspective. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:8482-8496. [PMID: 37018565 DOI: 10.1109/tnnls.2022.3230084]
Abstract
Arbitrary image stylization by neural networks has become a popular topic, and video stylization is attracting more attention as an extension of image stylization. However, when image stylization methods are applied to videos, the results suffer from severe flickering effects. In this article, we conduct a detailed and comprehensive analysis of the cause of such flickering effects. Systematic comparisons among typical neural style transfer approaches show that the feature migration modules of state-of-the-art (SOTA) learning systems are ill-conditioned and can lead to channelwise misalignment between the input content representations and the generated frames. Unlike traditional methods that relieve this misalignment via additional optical flow constraints or regularization modules, we focus on keeping temporal consistency by aligning each output frame with the input frame. To this end, we propose a simple yet efficient multichannel correlation network (MCCNet) to ensure that output frames are directly aligned with inputs in the hidden feature space while maintaining the desired style patterns. An inner channel similarity loss is adopted to eliminate side effects caused by the absence of nonlinear operations such as softmax for strict alignment. Furthermore, to improve the performance of MCCNet under complex lighting conditions, we introduce an illumination loss during training. Qualitative and quantitative evaluations demonstrate that MCCNet performs well in arbitrary video and image style transfer tasks. Code is available at https://github.com/kongxiuxiu/MCCNetV2.
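The channelwise view taken above can be illustrated with a simple diagnostic: per-channel cosine similarity between input content features and generated features. Low values flag the channel misalignment that shows up as flicker. This is an assumed diagnostic for illustration, not the paper's loss or network.

```python
import numpy as np

def channelwise_alignment(content, output):
    """Cosine similarity between corresponding channels of a content
    feature map and a generated feature map, both shaped (C, H, W).
    A perfectly aligned channel scores 1.0."""
    c = content.reshape(content.shape[0], -1)
    o = output.reshape(output.shape[0], -1)
    num = (c * o).sum(axis=1)
    den = np.linalg.norm(c, axis=1) * np.linalg.norm(o, axis=1) + 1e-12
    return num / den                               # one score per channel

feat = np.random.default_rng(0).standard_normal((3, 4, 4))
aligned = channelwise_alignment(feat, 2.0 * feat)       # scaled copy: aligned
shuffled = channelwise_alignment(feat, feat[::-1].copy())  # channels permuted
```

A channel-permuted output keeps the same pixel statistics yet scores differently channel by channel, which is exactly the kind of misalignment a spatialwise metric would miss.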
13
Song Q, Li J, Guo H, Huang R. Denoised Non-Local Neural Network for Semantic Segmentation. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7162-7174. [PMID: 37021852 DOI: 10.1109/tnnls.2022.3214216]
Abstract
The non-local (NL) network has become a widely used technique for semantic segmentation; it computes an attention map that measures the relationship of each pixel pair. However, most current NL models ignore the fact that the calculated attention map is very noisy, containing interclass and intraclass inconsistencies that lower the accuracy and reliability of NL methods. In this article, we figuratively denote these inconsistencies as attention noises and explore solutions to denoise them. Specifically, we propose a denoised NL network consisting of two primary modules, i.e., the global rectifying (GR) block and the local retention (LR) block, to eliminate interclass and intraclass noises, respectively. First, GR adopts the class-level predictions to capture a binary map that distinguishes whether two selected pixels belong to the same category. Second, LR captures the ignored local dependencies and further uses them to rectify the unwanted hollows in the attention map. The experimental results on two challenging semantic segmentation datasets demonstrate the superior performance of our model. Without any external training data, our denoised NL achieves state-of-the-art mean class-wise intersection over union (mIoU) of 83.5% and 46.69% on Cityscapes and ADE20K, respectively.
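The global rectifying idea, using class predictions to build a binary same-class map that suppresses interclass attention, can be sketched directly. This is a deliberately simplified illustration of that one step, not the full GR block:

```python
import numpy as np

def denoised_attention(att, class_pred):
    """Zero out attention between pixel pairs whose predicted classes
    differ (removing interclass attention noise), then renormalize each
    row so it still sums to one."""
    same = (class_pred[:, None] == class_pred[None, :]).astype(float)
    att = att * same                               # keep only same-class pairs
    return att / (att.sum(axis=1, keepdims=True) + 1e-12)

att = np.full((4, 4), 0.25)                        # uniform (maximally noisy) map
pred = np.array([0, 0, 1, 1])                      # per-pixel class predictions
clean = denoised_attention(att, pred)
```

After rectification, each pixel attends only to pixels predicted to share its class, so a uniform attention row over [0, 0, 1, 1] collapses onto the two same-class pixels.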
14
Ding Y, Yi Z, Xiao J, Hu M, Guo Y, Liao Z, Wang Y. CTH-Net: A CNN and Transformer hybrid network for skin lesion segmentation. iScience 2024; 27:109442. [PMID: 38523786 PMCID: PMC10957498 DOI: 10.1016/j.isci.2024.109442]
Abstract
Automatically and accurately segmenting skin lesions can be challenging, due to factors such as low contrast and fuzzy boundaries. This paper proposes a hybrid encoder-decoder model (CTH-Net) based on convolutional neural network (CNN) and Transformer, capitalizing on the advantages of these approaches. We propose three modules for skin lesion segmentation and seamlessly connect them with carefully designed model architecture. Better segmentation performance is achieved by introducing SoftPool in the CNN branch and sandglass block in the bottleneck layer. Extensive experiments were conducted on four publicly accessible skin lesion datasets, ISIC 2016, ISIC 2017, ISIC 2018, and PH2 to confirm the efficacy and benefits of the proposed strategy. Experimental results show that the proposed CTH-Net provides better skin lesion segmentation performance in both quantitative and qualitative testing when compared with state-of-the-art approaches. We believe the CTH-Net design is inspiring and can be extended to other applications/frameworks.
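SoftPool, credited above with improving the CNN branch, replaces max pooling's hard selection with an exponentially weighted average. A minimal 1-D sketch of that rule (the published operator works on 2-D windows; the 1-D form here is for illustration):

```python
import numpy as np

def softpool1d(x, k=2):
    """Reduce each non-overlapping window of k values to a softmax-
    weighted average: exp(x_i) weights mean that strong activations
    dominate but weaker ones still contribute, unlike max pooling."""
    x = np.asarray(x, dtype=float)
    out = []
    for i in range(0, len(x) - k + 1, k):
        w = np.exp(x[i:i + k])                     # exponential weights
        out.append(np.sum(w * x[i:i + k]) / np.sum(w))
    return np.array(out)

pooled = softpool1d([0.0, 2.0, 1.0, 1.0], k=2)
```

For the window [0, 2] the result lands between the mean (1.0) and the max (2.0), preserving a trace of the weaker activation that max pooling would discard.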
Affiliation(s)
- Yuhan Ding: School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Zhenglin Yi: Departments of Urology, Xiangya Hospital, Central South University, Changsha 410008, China
- Jiatong Xiao: Departments of Urology, Xiangya Hospital, Central South University, Changsha 410008, China
- Minghui Hu: Departments of Urology, Xiangya Hospital, Central South University, Changsha 410008, China
- Yu Guo: Department of Burns and Plastic Surgery, Xiangya Hospital, Central South University, Changsha 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, China
- Zhifang Liao: School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yongjie Wang: Department of Burns and Plastic Surgery, Xiangya Hospital, Central South University, Changsha 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, China
15
Zhang D, Fan X, Kang X, Tian S, Xiao G, Yu L, Wu W. Class key feature extraction and fusion for 2D medical image segmentation. Med Phys 2024; 51:1263-1276. [PMID: 37552522 DOI: 10.1002/mp.16636]
Abstract
BACKGROUND The size variation, complex semantic environment and high similarity in medical images often prevent deep learning models from achieving good performance. PURPOSE To overcome these problems and improve the model segmentation performance and generalizability. METHODS We propose the key class feature reconstruction module (KCRM), which ranks channel weights and selects key features (KFs) that contribute more to the segmentation results for each class. Meanwhile, KCRM reconstructs all local features to establish the dependence relationship from local features to KFs. In addition, we propose the spatial gating module (SGM), which employs KFs to generate two spatial maps to suppress irrelevant regions, strengthening the ability to locate semantic objects. Finally, we enable the model to adapt to size variations by diversifying the receptive field. RESULTS We integrate these modules into class key feature extraction and fusion network (CKFFNet) and validate its performance on three public medical datasets: CHAOS, UW-Madison, and ISIC2017. The experimental results show that our method achieves better segmentation results and generalizability than those of mainstream methods. CONCLUSION Through quantitative and qualitative research, the proposed module improves the segmentation results and enhances the model generalizability, making it suitable for application and expansion.
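The key-feature selection step in KCRM, ranking channels by per-class weights and keeping the top contributors, can be sketched as follows. The weight vector would normally be learned; here it is supplied explicitly as an assumption for illustration.

```python
import numpy as np

def select_key_features(feat, class_weights, k=2):
    """Rank the channels of a (C, N) feature matrix by a per-class
    weight vector and keep the k channels that contribute most to that
    class, returning their indices and the selected rows."""
    order = np.argsort(class_weights)[::-1]        # highest weight first
    keep = np.sort(order[:k])                      # top-k channel indices
    return keep, feat[keep]

feat = np.arange(8.0).reshape(4, 2)                # C=4 channels, 2 pixels
keep, kf = select_key_features(feat, np.array([0.1, 0.9, 0.4, 0.05]), k=2)
```

The selected key features could then drive the reconstruction and spatial-gating steps the abstract describes; only the ranking itself is shown here.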
Affiliation(s)
- Dezhi Zhang
- Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Clinical Research Center For Dermatologic Diseases, Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China
- Xin Fan
- College of Software, Xinjiang University, Urumqi, Xinjiang, China
- Xiaojing Kang
- Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Clinical Research Center For Dermatologic Diseases, Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China
- Shengwei Tian
- College of Software, Xinjiang University, Urumqi, Xinjiang, China
- Key Laboratory of Software Engineering Technology, College of Software, Xinjiang University, Urumqi, China
- Guangli Xiao
- College of Software, Xinjiang University, Urumqi, Xinjiang, China
- Long Yu
- College of Network Center, Xinjiang University, Urumqi, China
- Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Weidong Wu
- Department of Dermatology and Venereology, People's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Clinical Research Center For Dermatologic Diseases, Xinjiang Key Laboratory of Dermatology Research (XJYS1707), Urumqi, China
16
Lei J, Huang Y, Chen Y, Xia L, Yi B. The effect of the re-segmentation method on improving the performance of rectal cancer image segmentation models. Technol Health Care 2024; 32:1629-1640. [PMID: 38517809 DOI: 10.3233/thc-230690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/24/2024]
Abstract
BACKGROUND Rapid and accurate segmentation of tumor regions from rectal cancer images provides a better understanding of the patient's lesions and surrounding tissues, offering more effective auxiliary diagnostic information. However, deep learning-based segmentation of rectal tumors still cannot match manual segmentation, and a major obstacle is the lack of high-quality datasets. OBJECTIVE We propose a Re-segmentation Method in which the regions segmented by the model are manually corrected and returned to training. The dataset has been made publicly available. METHODS A total of 354 rectal cancer CT images and 308 rectal region images labeled by experts from Jiangxi Cancer Hospital were included in the dataset. Six network architectures were trained on the dataset; the regions predicted by each model were manually revised and then put back into training to improve the model's segmentation ability, after which performance was measured. RESULTS In this study, we applied the Re-segmentation Method to various popular network architectures. CONCLUSION By comparing the evaluation indicators before and after using the Re-segmentation Method, we show that the proposed Re-segmentation Method can further improve the performance of rectal cancer image segmentation models.
Affiliation(s)
- Jie Lei
- School of Software, Nanchang University, Nanchang, Jiangxi, China
- YiJun Huang
- School of Software, Nanchang University, Nanchang, Jiangxi, China
- YangLin Chen
- Jiangxi Cancer Hospital, Nanchang, Jiangxi, China
- School of Software, Nanchang University, Nanchang, Jiangxi, China
- Linglin Xia
- School of Software, Nanchang University, Nanchang, Jiangxi, China
- Bo Yi
- Jiangxi Cancer Hospital, Nanchang, Jiangxi, China
17
Luo J, Wang Q, Zou R, Wang Y, Liu F, Zheng H, Du S, Yuan C. A Heart Image Segmentation Method Based on Position Attention Mechanism and Inverted Pyramid. SENSORS (BASEL, SWITZERLAND) 2023; 23:9366. [PMID: 38067739 PMCID: PMC10708808 DOI: 10.3390/s23239366] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/09/2023] [Revised: 11/16/2023] [Accepted: 11/21/2023] [Indexed: 12/18/2023]
Abstract
In the realm of modern medicine, medical imaging stands as an irreplaceable pillar for accurate diagnostics. The significance of precise segmentation in medical images cannot be overstated, especially considering the variability introduced by different practitioners. With the escalating volume of medical imaging data, the demand for automated and efficient segmentation methods has become imperative. This study introduces an innovative approach to heart image segmentation, embedding a multi-scale feature and attention mechanism within an inverted pyramid framework. Recognizing the intricacies of extracting contextual information from low-resolution medical images, our method adopts an inverted pyramid architecture. Through training with multi-scale images and integrating prediction outcomes, we enhance the network's contextual understanding. Acknowledging the consistent patterns in the relative positions of organs, we introduce an attention module enriched with positional encoding information. This module empowers the network to capture essential positional cues, thereby elevating segmentation accuracy. Our research resides at the intersection of medical imaging and sensor technology, emphasizing the foundational role of sensors in medical image analysis. The integration of sensor-generated data showcases the symbiotic relationship between sensor technology and advanced machine learning techniques. Evaluation on two heart datasets substantiates the superior performance of our approach. Metrics such as the Dice coefficient, Jaccard coefficient, recall, and F-measure demonstrate the method's efficacy compared to state-of-the-art techniques. In conclusion, our proposed heart image segmentation method addresses the challenges posed by diverse medical images, offering a promising solution for efficiently processing 2D/3D sensor data in contemporary medical imaging.
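The abstract's attention module is "enriched with positional encoding information" but does not specify the encoding. As an illustrative assumption only, the classic Transformer-style sinusoidal encoding is one common way to inject positional cues:

```python
# Sketch of sinusoidal positional encoding (Vaswani et al. style).
# Assumption: this classic formulation stands in for the paper's
# unspecified positional-encoding scheme.
import math

def sinusoidal_positional_encoding(num_positions, dim):
    """pe[pos][2i] = sin(pos / 10000^(2i/dim)); pe[pos][2i+1] = cos(same angle)."""
    pe = [[0.0] * dim for _ in range(num_positions)]
    for pos in range(num_positions):
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            pe[pos][i] = math.sin(angle)
            if i + 1 < dim:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

The encoding is added to (or concatenated with) feature vectors so the attention weights can depend on relative organ positions.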
Affiliation(s)
- Jinbin Luo
- School of Physics and Mechanical and Electrical Engineering, Longyan University, Longyan 364012, China
- Qinghui Wang
- School of Physics and Mechanical and Electrical Engineering, Longyan University, Longyan 364012, China
- Ruirui Zou
- School of Physics and Mechanical and Electrical Engineering, Longyan University, Longyan 364012, China
- Ying Wang
- School of Physics and Mechanical and Electrical Engineering, Longyan University, Longyan 364012, China
- Fenglin Liu
- School of Physics and Mechanical and Electrical Engineering, Longyan University, Longyan 364012, China
- Haojie Zheng
- School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Shaoyi Du
- Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an 710049, China
- Chengzhi Yuan
- Department of Mechanical, Industrial and Systems Engineering, University of Rhode Island, Kingston, RI 02881, USA
18
Hu Y, Wang X, Gu Q. PWSNAS: Powering Weight Sharing NAS With General Search Space Shrinking Framework. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:9171-9184. [PMID: 35316195 DOI: 10.1109/tnnls.2022.3156373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Neural architecture search (NAS) depends heavily on an efficient and accurate performance estimator. To speed up the evaluation process, recent advances, like differentiable architecture search (DARTS) and One-Shot approaches, instead of training every model from scratch, train a weight-sharing super-network to reuse parameters among different candidates, in which all child models can be efficiently evaluated. Though these methods significantly boost search efficiency, they inherently suffer from inaccurate and unstable performance estimation. To this end, we propose a general and effective framework for powering weight-sharing NAS, namely, PWSNAS, by shrinking search space automatically, i.e., candidate operators will be discarded if they are less important. With the strategy, our approach can provide a promising search space of a smaller size by progressively simplifying the original search space, which can reduce difficulties for existing NAS methods to find superior architectures. In particular, we present two strategies to guide the shrinking process: detect redundant operators with a new angle-based metric and decrease the degree of weight sharing of a super-network by increasing parameters, which differentiates PWSNAS from existing shrinking methods. Comprehensive analysis experiments on NASBench-201 verify the superiority of our proposed metric over existing accuracy-based and magnitude-based metrics. PWSNAS can easily apply to the state-of-the-art NAS methods, e.g., single path one-shot neural architecture search (SPOS), FairNAS, ProxylessNAS, DARTS, and progressive DARTS (PDARTS). We evaluate PWSNAS and demonstrate consistent performance gains over baseline methods.
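The angle-based metric can be sketched as the angle between an operator's weight vector at initialization and after super-network training; operators whose weights barely rotate are treated as less important. This is a simplified reading for illustration, not the paper's exact procedure, and all names are hypothetical.

```python
# Sketch of an angle-based redundancy metric for weight-sharing NAS.
# Assumption: redundancy is judged by how little an operator's weight
# vector rotates from initialization to the trained super-network.
import math

def weight_angle(w_init, w_trained):
    """Angle (radians) between two flattened weight vectors."""
    dot = sum(a * b for a, b in zip(w_init, w_trained))
    norm = (math.sqrt(sum(a * a for a in w_init))
            * math.sqrt(sum(b * b for b in w_trained)))
    # clamp for floating-point safety before acos
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def least_rotated(ops):
    """Given {name: (w_init, w_trained)}, return the smallest-angle candidate,
    i.e., the operator a shrinking step might discard first."""
    return min(ops, key=lambda name: weight_angle(*ops[name]))
```

A shrinking step would then drop the lowest-angle operators to progressively simplify the search space.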
19
Qiu W, Xiong L, Li N, Luo Z, Wang Y, Zhang Y. AEAU-Net: an unsupervised end-to-end registration network by combining affine transformation and deformable medical image registration. Med Biol Eng Comput 2023; 61:2859-2873. [PMID: 37498511 DOI: 10.1007/s11517-023-02887-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2023] [Accepted: 07/09/2023] [Indexed: 07/28/2023]
Abstract
Deformable medical image registration plays an essential role in clinical diagnosis and treatment. However, owing to large differences in image deformation, unsupervised convolutional neural network (CNN)-based methods cannot extract global and local features simultaneously, nor capture the long-distance dependencies needed to handle excessive deformation. In this paper, an unsupervised end-to-end registration network named AEAU-Net is proposed for 3D MRI medical image registration. It comprises two-stage operations, i.e., an affine transformation and a deformable registration, implemented by an affine transformation subnetwork and a deformable registration subnetwork, respectively. In the deformable registration subnetwork, termed EAU-Net, we designed an efficient attention mechanism (EAM) module and a recursive residual path (RSP) module. The EAM module is embedded in the bottom layer of EAU-Net to capture long-distance dependencies. The RSP module is used to obtain effective features by fusing deep and shallow features. Extensive experiments on two datasets, LPBA40 and Mindboggle101, were conducted to verify the effectiveness of the proposed method. Compared with baseline methods, the proposed method obtains better registration performance. The ablation study further demonstrates the reasonableness and validity of the designed architecture.
Affiliation(s)
- Wei Qiu
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, China
- Lianjin Xiong
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, China
- Ning Li
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, China
- Zhangrong Luo
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, China
- Yaobin Wang
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, China
- Yangsong Zhang
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, China
- NHC Key Laboratory of Nuclear Technology Medical Transformation (Mianyang Central Hospital), Mianyang, 621010, China
- Key Laboratory of Testing Technology for Manufacturing Process, Ministry of Education, Southwest University of Science and Technology, Mianyang, Sichuan, 621010, China
20
Gu Z, Zhou S, Niu L, Zhao Z, Zhang L. From Pixel to Patch: Synthesize Context-Aware Features for Zero-Shot Semantic Segmentation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:7689-7703. [PMID: 35143403 DOI: 10.1109/tnnls.2022.3145962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Zero-shot learning (ZSL) has been actively studied for image classification tasks to relieve the burden of annotating image labels. Interestingly, the semantic segmentation task requires more labor-intensive pixel-wise annotation, but zero-shot semantic segmentation has not attracted extensive research interest. Thus, we focus on zero-shot semantic segmentation that aims to segment unseen objects with only category-level semantic representations provided for unseen categories. In this article, we propose a novel context-aware feature generation network (CaGNet) that can synthesize context-aware pixel-wise visual features for unseen categories based on category-level semantic representations and pixel-wise contextual information. The synthesized features are used to fine-tune the classifier to enable segmenting of unseen objects. Furthermore, we extend pixel-wise feature generation and fine-tuning to patch-wise feature generation and fine-tuning, which additionally considers the interpixel relationship. Experimental results on Pascal-VOC, Pascal-context, and COCO-stuff show that our method significantly outperforms the existing zero-shot semantic segmentation methods.
21
Li Y, Zhang Y, Liu JY, Wang K, Zhang K, Zhang GS, Liao XF, Yang G. Global Transformer and Dual Local Attention Network via Deep-Shallow Hierarchical Feature Fusion for Retinal Vessel Segmentation. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:5826-5839. [PMID: 35984806 DOI: 10.1109/tcyb.2022.3194099] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Clinically, retinal vessel segmentation is a significant step in the diagnosis of fundus diseases. However, recent methods generally neglect the difference in semantic information between deep and shallow features and fail to capture the global and local characterizations in fundus images simultaneously, resulting in limited segmentation performance for fine vessels. In this article, a global transformer (GT) and dual local attention (DLA) network via deep-shallow hierarchical feature fusion (GT-DLA-dsHFF) is investigated to overcome the above limitations. First, the GT is developed to integrate the global information in the retinal image, which effectively captures the long-distance dependence between pixels, alleviating the discontinuity of blood vessels in the segmentation results. Second, DLA, which is constructed using dilated convolutions with varied dilation rates, unsupervised edge detection, and a squeeze-excitation block, is proposed to extract local vessel information, consolidating the edge details in the segmentation result. Finally, a novel deep-shallow hierarchical feature fusion (dsHFF) algorithm is studied to fuse features at different scales in the deep learning framework, which can mitigate the attenuation of valid information during feature fusion. We verified GT-DLA-dsHFF on four typical fundus image datasets. The experimental results demonstrate that GT-DLA-dsHFF achieves superior performance against current methods, and detailed discussions verify the efficacy of the three proposed modules. Segmentation results on diseased images show the robustness of the proposed GT-DLA-dsHFF. Implementation codes will be available on https://github.com/YangLibuaa/GT-DLA-dsHFF.
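The "dilated convolutions with varied dilation rates" in the DLA branch can be illustrated in 1D (a toy sketch, not the paper's code): the dilation rate d spaces kernel taps d elements apart, enlarging the receptive field without adding parameters.

```python
# Toy 1D dilated convolution (correlation form, valid mode, no padding).
# Illustrative only; DLA applies 2D dilated convolutions to feature maps.

def dilated_conv1d(x, kernel, dilation):
    """y[i] = sum_k kernel[k] * x[i + k*dilation], for all valid i."""
    span = (len(kernel) - 1) * dilation  # receptive field minus one
    return [
        sum(kernel[k] * x[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(x) - span)
    ]
```

With dilation 2, a two-tap kernel sees inputs two positions apart, so varying the rate across branches gathers context at several scales.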
22
Ishikawa H, Aoki Y. Boosting Semantic Segmentation by Conditioning the Backbone with Semantic Boundaries. SENSORS (BASEL, SWITZERLAND) 2023; 23:6980. [PMID: 37571763 PMCID: PMC10422643 DOI: 10.3390/s23156980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Revised: 07/31/2023] [Accepted: 08/04/2023] [Indexed: 08/13/2023]
Abstract
In this paper, we propose the Semantic-Boundary-Conditioned Backbone (SBCB) framework, an effective approach to enhancing semantic segmentation performance, particularly around mask boundaries, while maintaining compatibility with various segmentation architectures. Our objective is to improve existing models by leveraging semantic boundary information as an auxiliary task. The SBCB framework incorporates a complementary semantic boundary detection (SBD) task with a multi-task learning approach. It enhances the segmentation backbone without introducing additional parameters during inference or relying on independent post-processing modules. The SBD head utilizes multi-scale features from the backbone, learning low-level features in early stages and understanding high-level semantics in later stages. This complements common semantic segmentation architectures, where features from later stages are used for classification. Extensive evaluations using popular segmentation heads and backbones demonstrate the effectiveness of the SBCB. It leads to an average improvement of 1.2% in IoU and a 2.6% gain in the boundary F-score on the Cityscapes dataset. The SBCB framework also improves over- and under-segmentation characteristics. Furthermore, the SBCB adapts well to customized backbones and emerging vision transformer models, consistently achieving superior performance. In summary, the SBCB framework significantly boosts segmentation performance, especially around boundaries, without introducing complexity to the models. Leveraging the SBD task as an auxiliary objective, our approach demonstrates consistent improvements on various benchmarks, confirming its potential for advancing the field of semantic segmentation.
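The auxiliary semantic-boundary target used by SBCB can be derived from the segmentation labels themselves. A minimal sketch (assuming 4-connectivity; the paper's exact boundary definition may differ) marks a pixel as boundary when any neighbor carries a different class:

```python
# Sketch: derive a binary semantic-boundary map from a label grid.
# Assumption: 4-connectivity defines "boundary"; border handling is naive.

def semantic_boundaries(labels):
    """Return a 0/1 map: 1 where a 4-neighbor has a different class label."""
    h, w = len(labels), len(labels[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and labels[nr][nc] != labels[r][c]:
                    out[r][c] = 1
                    break
    return out
```

Such a map supplies the SBD head's supervision at no extra annotation cost, since it is computed from the existing masks.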
Affiliation(s)
- Haruya Ishikawa
- Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
23
Liu G, Wang Q, Zhu J, Hong H. W-Net: Convolutional neural network for segmenting remote sensing images by dual path semantics. PLoS One 2023; 18:e0288311. [PMID: 37498885 PMCID: PMC10374094 DOI: 10.1371/journal.pone.0288311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2022] [Accepted: 06/25/2023] [Indexed: 07/29/2023] Open
Abstract
In recent research, deep neural network frameworks have advanced the accurate extraction of image features. In this study, we focus on an attention model that can be useful in deep neural networks and propose a simple but strong feature-extraction deep network architecture, W-Net. The architecture of our W-Net network has two mutually independent path structures and is designed with the following advantages. (1) The two independent, effective paths in our proposed network structure capture more contextual information from different scales in different ways. (2) The two paths acquire different feature images, and in the upsampling stage we use bilinear interpolation, thereby reducing feature map distortion and integrating the differently processed images. (3) Feature image processing occurs at a bottleneck, where a hierarchical attention module is constructed by reclassifying after the channel attention module and the spatial attention module, resulting in more efficient and accurate processing of feature images. During the experiments, we also tested iSAID, a massively high-spatial-resolution remote sensing image dataset, with further comparison of experimental data to demonstrate the generality of our method for remote sensing image segmentation.
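The bilinear-interpolation upsampling mentioned in the abstract can be sketched directly (align-corners coordinate mapping assumed; input must be at least 2x2):

```python
# Sketch of bilinear upsampling of a 2D grid (align-corners mapping).
# Illustrative reference; frameworks implement this as e.g. a resize op.

def bilinear_resize(img, out_h, out_w):
    """Resize a 2D grid (>= 2x2) with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        r0 = min(int(y), in_h - 2)   # top row of the 2x2 neighborhood
        fy = y - r0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            c0 = min(int(x), in_w - 2)
            fx = x - c0
            top = img[r0][c0] * (1 - fx) + img[r0][c0 + 1] * fx
            bot = img[r0 + 1][c0] * (1 - fx) + img[r0 + 1][c0 + 1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```

Because each output value is a weighted average of its four nearest inputs, the result varies smoothly, which is the distortion-reducing property the abstract appeals to.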
Affiliation(s)
- Guangjie Liu
- College of Computer Science and Technology, Changchun Normal University, Changchun, Jilin, China
- Qi Wang
- College of Computer Science and Technology, Changchun Normal University, Changchun, Jilin, China
- Jinlong Zhu
- College of Computer Science and Technology, Changchun Normal University, Changchun, Jilin, China
- Haotong Hong
- FAW Mold Manufacturing Co., Ltd, Changchun, Jilin, China
24
Haider A, Arsalan M, Hyun Nam S, Sultan H, Ryoung Park K. Computer-aided Fish Assessment in an Underwater Marine Environment Using Parallel and Progressive Spatial Information Fusion. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2023. [DOI: 10.1016/j.jksuci.2023.02.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/24/2023]
25
Deng Y, Wang H, Hou Y, Liang S, Zeng D. LFU-Net: A Lightweight U-Net with Full Skip Connections for Medical Image Segmentation. Curr Med Imaging 2023; 19:347-360. [PMID: 35733312 DOI: 10.2174/1573405618666220622154853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2022] [Revised: 04/21/2022] [Accepted: 04/26/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND In the series of improved versions of U-Net, while segmentation accuracy continues to improve, the number of parameters does not decrease, which makes the hardware required for training expensive and slows training convergence. OBJECTIVE The objective of this study is to propose a lightweight U-Net that balances the number of parameters against segmentation accuracy. METHODS A lightweight U-Net with full skip connections and deep supervision (LFU-Net) is proposed. The full skip connections include skip connections from shallow encoders, deep decoders, and sub-networks, while the deep supervision learns hierarchical representations from full-resolution feature representations in the outputs of sub-networks. The key lightweight design is that the number of output channels is based on 8 rather than 64 or 32. A pruning scheme was designed to further reduce parameters. The code is available at: https://github.com/dengdy22/U-Nets. RESULTS For the ISBI LiTS 2017 Challenge validation dataset, LFU-Net with no pruning achieved a Dice value of 0.9699, matching or exceeding existing networks with only about 1% of their parameters. For the BraTS 2018 validation dataset, its Dice values were 0.8726, 0.9363, 0.8699, and 0.8116 on average and on WT, TC, and ET, respectively, and its Hausdorff95 distance values were 3.9514, 4.3960, 3.0607, and 4.3975, respectively, which is not inferior to existing networks and shows balanced recognition of each region. CONCLUSION LFU-Net can be used as a lightweight and effective method for binary and multi-class medical image segmentation tasks.
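The Dice values reported above follow the standard overlap metric, Dice = 2|A∩B| / (|A| + |B|); a minimal reference implementation for flat binary masks:

```python
# Standard Dice coefficient for binary masks (flattened to equal-length lists).

def dice_coefficient(pred, target):
    """2*|intersection| / (|pred| + |target|); 1.0 when both masks are empty."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total
```

A perfect prediction scores 1.0; disjoint masks score 0.0, so values like 0.9699 indicate near-complete overlap with the ground truth.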
Affiliation(s)
- Yunjiao Deng
- School of Mechanical Engineering, Yanshan University, Qinhuangdao 066004, China
- Hui Wang
- School of Mechanical Engineering, Yanshan University, Qinhuangdao 066004, China
- Yulei Hou
- School of Mechanical Engineering, Yanshan University, Qinhuangdao 066004, China
- Shunpan Liang
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Daxing Zeng
- School of Mechanical Engineering, Dongguan University of Technology, Dongguan 523015, China
26
Yu Y, Zhang Y, Song Z, Tang CK. LMA: lightweight mixed-domain attention for efficient network design. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04170-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
27
Zhang W, Zhou Z, Gao Z, Yang G, Xu L, Wu W, Zhang H. Multiple Adversarial Learning based Angiography Reconstruction for Ultra-low-dose Contrast Medium CT. IEEE J Biomed Health Inform 2022; 27:409-420. [PMID: 36219660 DOI: 10.1109/jbhi.2022.3213595] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Iodinated contrast medium (ICM) dose reduction is beneficial for decreasing the potential health risk to renal-insufficiency patients in CT scanning. Because vessels appear at low intensity in ultra-low-dose-ICM CT angiography, such scans cannot support the clinical diagnosis of vascular diseases. Angiography reconstruction for ultra-low-dose-ICM CT can enhance vascular intensity for direct diagnosis of vascular diseases. However, angiography reconstruction is challenging owing to individual patient differences and the diversity of vascular diseases. In this paper, we propose a Multiple Adversarial Learning based Angiography Reconstruction (MALAR) framework to enhance vascular intensity. Specifically, a bilateral learning mechanism is developed to map a relationship between source and target domains rather than an image-to-image mapping. Then, a dual correlation constraint is introduced to simultaneously characterize both the distribution uniformity of cross-domain features and sample inconsistency within each domain. Finally, an adaptive fusion module combining multiscale information and long-range interactive dependency is explored to alleviate the interference of high-noise metal. Experiments were performed on CT sequences with different ICM doses. Quantitative results based on multiple metrics demonstrate the effectiveness of MALAR for angiography reconstruction. Qualitative assessments by radiographers confirm the potential of MALAR for the clinical diagnosis of vascular diseases. The code and model are available at https://github.com/HIC-SYSU/MALAR.
Affiliation(s)
- Weiwei Zhang
- School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China
- Zhen Zhou
- Department of Radiology, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Zhifan Gao
- School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China
- Guang Yang
- Cardiovascular Research Centre, Royal Brompton Hospital, London, U.K.
- Lei Xu
- Department of Radiology, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Weiwen Wu
- School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China
- Heye Zhang
- School of Biomedical Engineering, Sun Yat-sen University, Shenzhen, China
28
Wang J, Cai M, Gu Y, Liu Z, Li X, Han Y. Cropland encroachment detection via dual attention and multi-loss based building extraction in remote sensing images. FRONTIERS IN PLANT SCIENCE 2022; 13:993961. [PMID: 36147239 PMCID: PMC9486080 DOI: 10.3389/fpls.2022.993961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/14/2022] [Accepted: 08/12/2022] [Indexed: 06/16/2023]
Abstract
The United Nations predicts that by 2050, the world's total population will increase to 9.15 billion, but per capita cropland will drop to 0.151 hm². The acceleration of urbanization often comes at the expense of cropland, and the unplanned expansion of urban areas has adversely affected cultivation. Therefore, the automatic extraction of buildings, the main carriers of urban population activities, from remote sensing images has become a meaningful cropland observation task. To overcome the shortcomings of traditional building extraction methods, such as insufficient utilization of image information and reliance on manual characterization, a U-Net-based deep learning building extraction model named AttsegGAN is proposed. In terms of training strategy, this study proposes an adversarial loss based on the Generative Adversarial Network, with an additionally trained learnable discriminator used as a distance measure between the two probability distributions of ground truth Pdata and prediction Pg. In addition, for sharpness of building edges, a Sobel edge loss based on the Sobel operator is weighted and jointly participates in training. On the WHU building dataset, this study applies the components and strategies step by step and verifies their effectiveness. Furthermore, the addition of the attention module is subjected to ablation experiments, and the final framework is determined. Compared with the original, AttsegGAN improved by 0.0062, 0.0027, and 0.0055 on Acc, F1, and IoU, respectively, after adopting all improvements. In the comparative experiments, AttsegGAN is compared with state-of-the-art methods including U-Net, DeeplabV3+, PSPNet, and DANet on both the WHU and Massachusetts building datasets. On the WHU dataset, AttsegGAN achieved 0.9875, 0.9435, and 0.8907 on Acc, F1, and IoU, surpassing U-Net by 0.0260, 0.1183, and 0.1883, respectively, demonstrating the effectiveness of the proposed components in a similar hourglass structure. On the Massachusetts dataset, AttsegGAN also surpassed the state of the art, achieving 0.9395, 0.8328, and 0.7130 on Acc, F1, and IoU, respectively; it improved IoU by 0.0412 over the second-ranked PSPNet and was 0.0025 and 0.0101 higher than the second place in Acc and F1.
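The Sobel edge loss builds on gradient magnitudes from the Sobel operator; a minimal sketch of the operator itself (interior pixels only, valid mode; the loss weighting is not shown):

```python
# Sketch of the Sobel operator: per-pixel gradient magnitude (valid mode).
# The paper's edge loss would compare such maps for prediction vs. ground truth.
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img):
    """Gradient magnitude at interior pixels of a 2D grayscale image."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out
```

A loss that penalizes differences between the Sobel maps of the prediction and the ground truth pushes the network toward sharper building edges.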
Affiliation(s)
- Junshu Wang
- College of Electronic Engineering, College of Artificial Intelligence, South China Agricultural University, Guangzhou, China
- Mingrui Cai
- College of Electronic Engineering, College of Artificial Intelligence, South China Agricultural University, Guangzhou, China
- Yifan Gu
- College of Electronic Engineering, College of Artificial Intelligence, South China Agricultural University, Guangzhou, China
- Zhen Liu
- College of Electronic Engineering, College of Artificial Intelligence, South China Agricultural University, Guangzhou, China
- Xiaoxin Li
- College of Electronic Engineering, College of Artificial Intelligence, South China Agricultural University, Guangzhou, China
- Yuxing Han
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
29
Li Y, Zhang Y, Cui W, Lei B, Kuang X, Zhang T. Dual Encoder-Based Dynamic-Channel Graph Convolutional Network With Edge Enhancement for Retinal Vessel Segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:1975-1989. [PMID: 35167444 DOI: 10.1109/tmi.2022.3151666] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Retinal vessel segmentation with deep learning technology is a crucial auxiliary method for clinicians to diagnose fundus diseases. However, deep learning approaches inevitably lose edge information, which contains spatial features of vessels, while performing down-sampling, limiting the segmentation performance on fine blood vessels. Furthermore, existing methods ignore the dynamic topological correlations among feature maps in the deep learning framework, resulting in inefficient capture of channel characteristics. To address these limitations, we propose a novel dual encoder-based dynamic-channel graph convolutional network with edge enhancement (DE-DCGCN-EE) for retinal vessel segmentation. Specifically, we first design an edge detection-based dual encoder to preserve the edges of vessels during down-sampling. Secondly, we investigate a dynamic-channel graph convolutional network that maps the image channels to a topological space and synthesizes the features of each channel on the topological map, which addresses the insufficient utilization of channel information. Finally, we study an edge enhancement block that fuses the edge and spatial features in the dual encoder, which is beneficial for improving the accuracy of fine blood vessel segmentation. Competitive experimental results on five retinal image datasets validate the efficacy of the proposed DE-DCGCN-EE, which achieves more remarkable segmentation results than other state-of-the-art methods, indicating its potential for clinical application.
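The dynamic-channel graph convolution described above treats image channels as graph nodes. A minimal, generic graph-convolution step over channel descriptors might look as follows (an illustrative sketch under assumed shapes, not the authors' implementation; the adjacency, descriptors, and weights here are hypothetical):

```python
# Generic graph-convolution step over channel nodes (illustrative sketch):
# H' = ReLU(A_hat @ H @ W), with A_hat a row-normalized adjacency among
# channels and H holding a small descriptor vector per channel.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def row_normalize(A):
    # divide each row by its sum so messages from neighbors are averaged
    return [[v / max(sum(row), 1e-8) for v in row] for row in A]

def gcn_layer(adj, H, W):
    A_hat = row_normalize(adj)
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, v) for v in row] for row in Z]  # ReLU

adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]   # 3 channel nodes (self-loops kept)
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 2-dim descriptor per channel
W = [[1.0, -1.0], [0.5, 1.0]]             # learnable weights (fixed here)
print(gcn_layer(adj, H, W))
```

In the paper the adjacency is built dynamically from the feature maps; here it is fixed purely for illustration.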
|
30
|
GraformerDIR: Graph convolution transformer for deformable image registration. Comput Biol Med 2022; 147:105799. [DOI: 10.1016/j.compbiomed.2022.105799] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Revised: 05/06/2022] [Accepted: 06/26/2022] [Indexed: 01/02/2023]
|
31
|
Fang F, Zhang P, Zhou B, Qian K, Gan Y. Atten-GAN: Pedestrian Trajectory Prediction with GAN Based on Attention Mechanism. Cognit Comput 2022. [DOI: 10.1007/s12559-022-10029-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
|
32
|
Wang J, Mo W, Wu Y, Xu X, Li Y, Ye J, Lai X. Combined Channel Attention and Spatial Attention Module Network for Chinese Herbal Slices Automated Recognition. Front Neurosci 2022; 16:920820. [PMID: 35769703 PMCID: PMC9234258 DOI: 10.3389/fnins.2022.920820] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2022] [Accepted: 05/16/2022] [Indexed: 11/13/2022] Open
Abstract
Chinese Herbal Slices (CHS) are critical components of Traditional Chinese Medicine (TCM); accurate recognition of CHS is crucial for applications in medicine, production, and education. However, CHS recognition is currently performed mainly by experienced professionals, which may not meet the vast CHS market demand because it is time-consuming and the number of professionals is limited. Although some automated CHS recognition approaches have been proposed, their performance still needs improvement because they are primarily based on traditional machine learning with hand-crafted features, resulting in relatively low accuracy. Additionally, few CHS datasets are available for research aimed at practical application. To address these problems comprehensively, we propose a combined channel attention and spatial attention module network (CCSM-Net) for efficiently recognizing CHS from 2-D images. The CCSM-Net integrates channel and spatial attention, focusing on both the most important information in a CHS image and its position. In particular, pairs of max-pooling and average-pooling operations are used in the CA and SA modules to aggregate the channel information of the feature map. A dataset of 14,196 images covering 182 categories of commonly used CHS is then constructed, on which we evaluate our framework. Experimental results show that the proposed CCSM-Net delivers promising performance and outperforms other typical deep learning algorithms, achieving a recognition rate of 99.27%, a precision of 99.33%, a recall of 99.27%, and an F1-score of 99.26% across different numbers of CHS categories.
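The paired max/average pooling in the channel-attention module can be sketched in a few lines. This is a hedged illustration of the general idea (one weight per channel from pooled descriptors through a shared transform and a sigmoid); the shared transform here is a single scalar weight, a simplification of the MLP such modules typically use.

```python
# Hedged sketch of channel attention from paired pooling: per-channel
# average- and max-pooled descriptors pass through a shared transform and a
# sigmoid, yielding one attention weight per channel. Sizes are illustrative.

import math

def channel_attention(feature_maps, w=1.0, b=0.0):
    weights = []
    for fm in feature_maps:              # one 2-D map per channel
        flat = [v for row in fm for v in row]
        avg_d = sum(flat) / len(flat)    # average-pooled descriptor
        max_d = max(flat)                # max-pooled descriptor
        s = w * avg_d + w * max_d + b    # shared transform on both descriptors
        weights.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid
    return weights

fms = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]]
print(channel_attention(fms))  # the all-zero channel gets weight 0.5
```

The resulting weights would rescale each channel of the feature map; the spatial-attention branch applies the same pooling idea across channels instead.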
Affiliation(s)
- Jianqing Wang
- School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, China
- Weitao Mo
- School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, China
- Yan Wu
- School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, China
- Xiaomei Xu
- School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, China
- Yi Li
- School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, China
- Jianming Ye
- First Affiliated Hospital, Gannan Medical University, Ganzhou, China
- Xiaobo Lai
- School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, China
- First Affiliated Hospital, Gannan Medical University, Ganzhou, China
|
33
|
Li J, Zha S, Chen C, Ding M, Zhang T, Yu H. Attention Guided Global Enhancement and Local Refinement Network for Semantic Segmentation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:3211-3223. [PMID: 35436194 DOI: 10.1109/tip.2022.3166673] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
The encoder-decoder architecture is widely used as a lightweight semantic segmentation network. However, it suffers from limited performance compared with a well-designed Dilated-FCN model, for two major reasons. First, commonly used upsampling methods in the decoder, such as interpolation and deconvolution, have a local receptive field and cannot encode global contexts. Second, low-level features may bring noise into the network decoder through skip connections because of the inadequacy of semantic concepts in early encoder layers. To tackle these challenges, a Global Enhancement Method is proposed to aggregate global information from high-level feature maps and adaptively distribute it to different decoder layers, alleviating the shortage of global context in the upsampling process. Besides, a Local Refinement Module is developed that uses the decoder features as semantic guidance to refine the noisy encoder features before the two are fused. The two methods are then integrated into a Context Fusion Block, on the basis of which a novel Attention-guided Global enhancement and Local refinement Network (AGLN) is elaborately designed. Extensive experiments on the PASCAL Context, ADE20K, and PASCAL VOC 2012 datasets demonstrate the effectiveness of the proposed approach. In particular, with a vanilla ResNet-101 backbone, AGLN achieves the state-of-the-art result (56.23% mean IoU) on the PASCAL Context dataset. The code is available at https://github.com/zhasen1996/AGLN.
|
34
|
Peng Y, Zhu W, Chen Z, Shi F, Wang M, Zhou Y, Wang L, Shen Y, Xiang D, Chen F, Chen X. AFENet: Attention Fusion Enhancement Network for Optic Disc Segmentation of Premature Infants. Front Neurosci 2022; 16:836327. [PMID: 35516802 PMCID: PMC9063315 DOI: 10.3389/fnins.2022.836327] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2021] [Accepted: 02/09/2022] [Indexed: 11/16/2022] Open
Abstract
Retinopathy of prematurity and ischemic brain injury resulting in periventricular white matter damage are the main causes of visual impairment in premature infants. Accurate optic disc (OD) segmentation has important prognostic significance for the auxiliary diagnosis of these two diseases in premature infants. Because of the complexity, non-uniform illumination, and low contrast between the background and the target area of fundus images, OD segmentation for infants is challenging and rarely reported in the literature. In this article, to tackle these problems, we propose a novel attention fusion enhancement network (AFENet) for the accurate segmentation of OD in fundus images of premature infants, fusing adjacent high-level semantic information and multiscale low-level detailed information from different levels based on an encoder-decoder network. Specifically, we first design a dual-scale semantic enhancement (DsSE) module between the encoder and the decoder, inspired by the self-attention mechanism, which enhances the semantic contextual information for the decoder by reconstructing the skip connection. Then, to reduce the semantic gaps between high-level and low-level features, a multiscale feature fusion (MsFF) module is developed to fuse features of different levels at the top of the encoder using an attention mechanism. Finally, the proposed AFENet was evaluated on fundus images of preterm infants for OD segmentation, showing that both proposed modules are promising. Based on the baseline (Res34UNet), using the DsSE or MsFF module alone increases the Dice similarity coefficient by 1.51 and 1.70%, respectively, whereas integrating the two modules together increases it by 2.11%. Compared with other state-of-the-art segmentation methods, the proposed AFENet achieves high segmentation performance.
Affiliation(s)
- Yuanyuan Peng
- Analysis and Visualization Lab, School of Electronics and Information Engineering and Medical Image Processing, Soochow University, Suzhou, China
- Weifang Zhu
- Analysis and Visualization Lab, School of Electronics and Information Engineering and Medical Image Processing, Soochow University, Suzhou, China
- Zhongyue Chen
- Analysis and Visualization Lab, School of Electronics and Information Engineering and Medical Image Processing, Soochow University, Suzhou, China
- Fei Shi
- Analysis and Visualization Lab, School of Electronics and Information Engineering and Medical Image Processing, Soochow University, Suzhou, China
- Meng Wang
- Analysis and Visualization Lab, School of Electronics and Information Engineering and Medical Image Processing, Soochow University, Suzhou, China
- Yi Zhou
- Analysis and Visualization Lab, School of Electronics and Information Engineering and Medical Image Processing, Soochow University, Suzhou, China
- Lianyu Wang
- Analysis and Visualization Lab, School of Electronics and Information Engineering and Medical Image Processing, Soochow University, Suzhou, China
- Yuhe Shen
- Analysis and Visualization Lab, School of Electronics and Information Engineering and Medical Image Processing, Soochow University, Suzhou, China
- Daoman Xiang
- Guangzhou Women and Children’s Medical Center, Guangzhou, China
- Feng Chen
- Guangzhou Women and Children’s Medical Center, Guangzhou, China
- Xinjian Chen
- Analysis and Visualization Lab, School of Electronics and Information Engineering and Medical Image Processing, Soochow University, Suzhou, China
- State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China
|
35
|
Hong W, Sheng Q, Dong B, Wu L, Chen L, Zhao L, Liu Y, Zhu J, Liu Y, Xie Y, Yu Y, Wang H, Yuan J, Ge T, Zhao L, Liu X, Zhang Y. Automatic Detection of Secundum Atrial Septal Defect in Children Based on Color Doppler Echocardiographic Images Using Convolutional Neural Networks. Front Cardiovasc Med 2022; 9:834285. [PMID: 35463790 PMCID: PMC9019069 DOI: 10.3389/fcvm.2022.834285] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2022] [Accepted: 02/24/2022] [Indexed: 11/13/2022] Open
Abstract
Secundum atrial septal defect (ASD) is one of the most common congenital heart diseases (CHDs). This study aims to evaluate the feasibility and accuracy of automatic detection of ASD in children based on color Doppler echocardiographic images using convolutional neural networks. We propose a fully automatic detection system for ASD, which comprises three stages. The first stage identifies four target echocardiographic views (the subcostal view focusing on the atrium septum, the apical four-chamber view, the low parasternal four-chamber view, and the parasternal short-axis view); these four views are the most useful for the clinical diagnosis of ASD. The second stage segments the target cardiac structure and detects candidates for ASD. The third stage infers the final detection by combining the segmentation and detection results of the second stage. The proposed ASD detection system was developed and validated using a training set of 4,031 cases containing 370,057 echocardiographic images and an independent test set of 229 cases containing 203,619 images, of which 105 had ASD and 124 had an intact atrial septum. Experimental results showed that the proposed system achieved accuracy, recall, precision, specificity, and F1 score of 0.8833, 0.8545, 0.8577, 0.9136, and 0.8546, respectively, on the image-level averages of the four most clinically useful echocardiographic views. The proposed system can automatically and accurately identify ASD, laying a good foundation for subsequent artificial intelligence diagnosis of CHDs.
Affiliation(s)
- Wenjing Hong
- Department of Pediatric Cardiology, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Qiuyang Sheng
- Deepwise Artificial Intelligence Laboratory, Beijing, China
- Bin Dong
- Pediatric Artificial Intelligence Clinical Application and Research Center, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Engineering Research Center of Intelligence Pediatrics (SERCIP), Shanghai, China
- Lanping Wu
- Department of Pediatric Cardiology, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Lijun Chen
- Department of Pediatric Cardiology, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Leisheng Zhao
- Department of Pediatric Cardiology, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Yiqing Liu
- Department of Pediatric Cardiology, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Junxue Zhu
- Department of Pediatric Cardiology, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Yiman Liu
- Department of Pediatric Cardiology, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Yixin Xie
- Department of Pediatric Cardiology, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Yizhou Yu
- Deepwise Artificial Intelligence Laboratory, Beijing, China
- Hansong Wang
- Pediatric Artificial Intelligence Clinical Application and Research Center, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Engineering Research Center of Intelligence Pediatrics (SERCIP), Shanghai, China
- Jiajun Yuan
- Pediatric Artificial Intelligence Clinical Application and Research Center, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Engineering Research Center of Intelligence Pediatrics (SERCIP), Shanghai, China
- Tong Ge
- Pediatric Artificial Intelligence Clinical Application and Research Center, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Engineering Research Center of Intelligence Pediatrics (SERCIP), Shanghai, China
- Liebin Zhao
- Shanghai Engineering Research Center of Intelligence Pediatrics (SERCIP), Shanghai, China
- Xiaoqing Liu
- Deepwise Artificial Intelligence Laboratory, Beijing, China
- Yuqi Zhang
- Department of Pediatric Cardiology, Shanghai Children’s Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
|
36
|
Li L, Ma H. RDCTrans U-Net: A Hybrid Variable Architecture for Liver CT Image Segmentation. SENSORS (BASEL, SWITZERLAND) 2022; 22:2452. [PMID: 35408067 PMCID: PMC9003011 DOI: 10.3390/s22072452] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Revised: 03/15/2022] [Accepted: 03/18/2022] [Indexed: 06/14/2023]
Abstract
Segmenting medical images is a necessary prerequisite for disease diagnosis and treatment planning. Among various medical image segmentation tasks, U-Net-based variants have been widely used in liver tumor segmentation. In view of the highly variable shape and size of tumors, and in order to improve segmentation accuracy, this paper proposes a U-Net-based hybrid variable structure, RDCTrans U-Net, for liver tumor segmentation in computed tomography (CT) examinations. We design a backbone network dominated by ResNeXt50 and supplemented by dilated convolution to increase the network depth, expand the receptive field, and improve the efficiency of feature extraction without increasing the number of parameters. At the same time, a Transformer is introduced in down-sampling to increase the network's overall perception and global understanding of the image and to improve the accuracy of liver tumor segmentation. The proposed method was tested on the LiTS (Liver Tumor Segmentation) dataset, obtaining 89.22% mIoU and 98.91% Acc; for liver and tumor segmentation, it achieved 93.38% and 89.87% Dice, respectively. Compared with the original U-Net and with U-Net models that respectively introduce dense connections, an attention mechanism, and a Transformer, the method proposed in this paper achieves state-of-the-art (SOTA) results.
Affiliation(s)
- Lingyun Li
- College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Hongbing Ma
- College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Beijing National Research Center for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
|
37
|
Zhu X, He Z, Zhao L, Dai Z, Yang Q. A Cascade Attention Based Facial Expression Recognition Network by Fusing Multi-Scale Spatio-Temporal Features. SENSORS 2022; 22:s22041350. [PMID: 35214248 PMCID: PMC8874494 DOI: 10.3390/s22041350] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/29/2021] [Revised: 02/07/2022] [Accepted: 02/08/2022] [Indexed: 02/06/2023]
Abstract
The performance of a facial expression recognition network degrades noticeably under uneven illumination or partially occluded faces, as it is quite difficult to pinpoint the attention hotspots on dynamically changing regions (e.g., eyes, nose, and mouth) precisely. To address this issue, by hybridizing the attention mechanism and pyramid features, this paper proposes a cascade attention-based facial expression recognition network built on a combination of (i) local spatial features, (ii) multi-scale stereoscopic spatial context features (extracted from the 3-scale pyramid feature), and (iii) temporal features. Experiments on the CK+, Oulu-CASIA, and RAF-DB datasets obtained recognition accuracy rates of 99.23%, 89.29%, and 86.80%, respectively, demonstrating that the proposed method outperforms state-of-the-art methods in both experimental and natural environments.
Affiliation(s)
- Xiaoliang Zhu
- National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079, China
- Zili He
- National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
- Liang Zhao
- National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079, China
- Correspondence: (L.Z.); (Z.D.)
- Zhicheng Dai
- National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
- Correspondence: (L.Z.); (Z.D.)
- Qiaolai Yang
- National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
|
38
|
Wang Y, Wang C, Wu H, Chen P. An improved Deeplabv3+ semantic segmentation algorithm with multiple loss constraints. PLoS One 2022; 17:e0261582. [PMID: 35045083 PMCID: PMC8769336 DOI: 10.1371/journal.pone.0261582] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2021] [Accepted: 12/03/2021] [Indexed: 11/18/2022] Open
Abstract
Aiming at the problems of low segmentation accuracy and inaccurate object boundary segmentation in current semantic segmentation algorithms, a semantic segmentation algorithm using multiple loss-function constraints and a multi-level cascading residual structure is proposed. The multi-layer cascaded residual unit is used to increase the receptive field of the network layers. A parallel network is constructed to extract feature information at different depths, and these features are fused with the encoder output to obtain multiple output features, each of which forms a loss against the label, thereby constraining the model during optimization. The proposed network was evaluated on the Cityscapes and CamVid datasets. The experimental results show that the mean Intersection over Union (mIoU) of the proposed algorithm is 3.07% and 3.59% higher than that of the original Deeplabv3+ algorithm, respectively.
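The multiple-loss constraint above amounts to a weighted sum of per-output losses against the label. A minimal sketch (the weights and loss values below are hypothetical, not taken from the paper):

```python
# Hedged sketch of the multiple-loss constraint: auxiliary outputs at several
# depths each contribute a loss against the label; the training objective is
# their weighted sum. Weights and loss values are illustrative assumptions.

def total_loss(losses, weights):
    assert len(losses) == len(weights)
    return sum(w * l for w, l in zip(weights, losses))

# e.g. main output plus two auxiliary outputs from shallower stages
print(total_loss([0.40, 0.55, 0.70], [1.0, 0.4, 0.4]))  # 0.9
```

In practice each `losses[i]` would be a cross-entropy between one network output and the ground-truth labels; the auxiliary weights are typically smaller than the main weight.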
Affiliation(s)
- Yunyan Wang
- School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, China
- Hubei University of Technology Cooperative Innovation Center of Hubei Province for Efficient Use of Solar Energy, Wuhan, China
- Chongyang Wang
- School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, China
- Huaxuan Wu
- School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, China
- Peng Chen
- School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, China
|
39
|
Chen H, He X, Yang H, Qing L, Teng Q. A Feature-Enriched Deep Convolutional Neural Network for JPEG Image Compression Artifacts Reduction and its Applications. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:430-444. [PMID: 34793307 DOI: 10.1109/tnnls.2021.3124370] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
The amount of multimedia data, such as images and videos, has been increasing rapidly with the development of various imaging devices and the Internet, bringing more stress and challenges to information storage and transmission. The redundancy in images can be reduced to decrease data size via lossy compression, such as the most widely used standard Joint Photographic Experts Group (JPEG). However, the decompressed images generally suffer from various artifacts (e.g., blocking, banding, ringing, and blurring) due to the loss of information, especially at high compression ratios. This article presents a feature-enriched deep convolutional neural network for compression artifacts reduction (FeCarNet, for short). Taking the dense network as the backbone, FeCarNet enriches features to gain valuable information via introducing multi-scale dilated convolutions, along with the efficient 1×1 convolution for lowering both parameter complexity and computation cost. Meanwhile, to make full use of different levels of features in FeCarNet, a fusion block that consists of attention-based channel recalibration and dimension reduction is developed for local and global feature fusion. Furthermore, short and long residual connections both in the feature and pixel domains are combined to build a multi-level residual structure, thereby benefiting the network training and performance. In addition, aiming at reducing computation complexity further, pixel-shuffle-based image downsampling and upsampling layers are, respectively, arranged at the head and tail of the FeCarNet, which also enlarges the receptive field of the whole network. Experimental results show the superiority of FeCarNet over state-of-the-art compression artifacts reduction approaches in terms of both restoration capacity and model complexity.
The applications of FeCarNet on several computer vision tasks, including image deblurring, edge detection, image segmentation, and object detection, demonstrate the effectiveness of FeCarNet further.
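The pixel-shuffle layers mentioned above trade channel depth for spatial resolution. A minimal depth-to-space sketch of the upsampling direction (illustrative; the paper's actual layers are not reproduced here):

```python
# Illustrative pixel-shuffle (depth-to-space) step: r*r channels of an HxW
# map are rearranged into a single (r*H)x(r*W) map. Pure-Python sketch.

def pixel_shuffle(channels, r):
    # channels: list of r*r feature maps, each H x W
    H, W = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (W * r) for _ in range(H * r)]
    for c, fm in enumerate(channels):
        dy, dx = c // r, c % r           # sub-pixel offset for this channel
        for i in range(H):
            for j in range(W):
                out[i * r + dy][j * r + dx] = fm[i][j]
    return out

chans = [[[1]], [[2]], [[3]], [[4]]]  # four 1x1 maps, upscale factor r = 2
print(pixel_shuffle(chans, 2))        # [[1, 2], [3, 4]]
```

The downsampling layer at the network head is the inverse rearrangement (space-to-depth), which is what enlarges the effective receptive field cheaply.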
|
40
|
Wang G, Zhai Q. Feature fusion network based on strip pooling. Sci Rep 2021; 11:21270. [PMID: 34711889 PMCID: PMC8553855 DOI: 10.1038/s41598-021-00585-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2021] [Accepted: 10/14/2021] [Indexed: 11/25/2022] Open
Abstract
Contextual information is a key factor affecting semantic segmentation. Recently, many methods have tried to use the self-attention mechanism to capture more contextual information, but such methods require heavy computation. To solve this problem, a novel self-attention network, called FFANet, is designed to capture contextual information efficiently, reducing the amount of computation through strip pooling and linear layers. It proposes a feature fusion (FF) module to calculate an affinity matrix that captures the relationships between pixels. Multiplying the affinity matrix with the feature map then selectively increases the weight of the region of interest. Extensive experiments on public datasets (PASCAL VOC2012, CityScapes) and a remote sensing dataset (DLRSD) achieved mean IoU scores of 74.5%, 70.3%, and 63.9%, respectively. Compared with current typical algorithms, the proposed method achieves excellent performance.
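The strip pooling that makes this cheap replaces square pooling windows with whole-row and whole-column averages. A hedged pure-Python sketch of that pooling step (illustrative only, not the FFANet code):

```python
# Strip pooling sketch: a feature map is averaged along entire rows
# (1 x W strips) and entire columns (H x 1 strips), capturing long-range
# context along each axis at low cost.

def strip_pool(fm):
    H, W = len(fm), len(fm[0])
    row_pool = [sum(r) / W for r in fm]                 # H values
    col_pool = [sum(fm[i][j] for i in range(H)) / H     # W values
                for j in range(W)]
    return row_pool, col_pool

fm = [[1.0, 3.0], [5.0, 7.0]]
print(strip_pool(fm))  # ([2.0, 6.0], [3.0, 5.0])
```

The pooled strips are what the linear layers then combine into the affinity matrix, avoiding the quadratic pixel-pair computation of full self-attention.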
Affiliation(s)
- Gaihua Wang
- School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, 430068, China
- Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan, 430068, China
- Qianyu Zhai
- School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, 430068, China
|
41
|
Liu M, Wang K, Ji R, Ge SS, Chen J. Person image generation with attention-based injection network. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.06.077] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
42
|
Wang W, Wang S, Li Y, Jin Y. Adaptive multi-scale dual attention network for semantic segmentation. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.06.068] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
43
|
Dual Branch Attention Network for Person Re-Identification. SENSORS 2021; 21:s21175839. [PMID: 34502731 PMCID: PMC8433887 DOI: 10.3390/s21175839] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/18/2021] [Revised: 08/22/2021] [Accepted: 08/25/2021] [Indexed: 11/16/2022]
Abstract
As a sub-direction of image retrieval, person re-identification (Re-ID) is usually used to solve the security problem of cross-camera tracking and monitoring, and a growing number of shopping centers have recently attempted to apply Re-ID technology. One development trend of related algorithms is using an attention mechanism to capture global and local features. We notice that these algorithms have apparent limitations: they focus only on the most salient features without considering certain detailed features, even though people's clothes, bags, and even shoes are of great help in distinguishing pedestrians, and global features usually cover these important local features. Therefore, we propose a dual branch network based on a multi-scale attention mechanism that can capture both the apparent global features and the inconspicuous local features of pedestrian images. Specifically, we design a dual branch attention network (DBA-Net) for better performance, whose two branches optimize the extracted features of different depths at the same time. We also design an effective block (called channel, position and spatial-wise attention (CPSA)) that can capture key fine-grained information, such as bags and shoes. Furthermore, based on ID loss, we use a complementary triplet loss and an adaptive weighted rank list loss (WRLL) on each branch during training. DBA-Net not only learns semantic context information along the channel, position, and spatial dimensions but also integrates detailed semantic information by learning the dependency relationships between features. Extensive experiments on three widely used open-source datasets proved that DBA-Net yields overall state-of-the-art performance. In particular, on the CUHK03 dataset, the mean average precision (mAP) of DBA-Net reached 83.2%.
|
44
|
Luan S, Xue X, Ding Y, Wei W, Zhu B. Adaptive Attention Convolutional Neural Network for Liver Tumor Segmentation. Front Oncol 2021; 11:680807. [PMID: 34434891 PMCID: PMC8381250 DOI: 10.3389/fonc.2021.680807] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2021] [Accepted: 07/12/2021] [Indexed: 12/29/2022] Open
Abstract
Purpose Accurate segmentation of liver and liver tumors is critical for radiotherapy. Liver tumor segmentation, however, remains a difficult and relevant problem in the field of medical image processing because of the various factors like complex and variable location, size, and shape of liver tumors, low contrast between tumors and normal tissues, and blurred or difficult-to-define lesion boundaries. In this paper, we proposed a neural network (S-Net) that can incorporate attention mechanisms to end-to-end segmentation of liver tumors from CT images. Methods First, this study adopted a classical coding-decoding structure to realize end-to-end segmentation. Next, we introduced an attention mechanism between the contraction path and the expansion path so that the network could encode a longer range of semantic information in the local features and find the corresponding relationship between different channels. Then, we introduced long-hop connections between the layers of the contraction path and the expansion path, so that the semantic information extracted in both paths could be fused. Finally, the application of closed operation was used to dissipate the narrow interruptions and long, thin divide. This eliminated small cavities and produced a noise reduction effect. Results In this paper, we used the MICCAI 2017 liver tumor segmentation (LiTS) challenge dataset, 3DIRCADb dataset and doctors' manual contours of Hubei Cancer Hospital dataset to test the network architecture. We calculated the Dice Global (DG) score, Dice per Case (DC) score, volumetric overlap error (VOE), average symmetric surface distance (ASSD), and root mean square error (RMSE) to evaluate the accuracy of the architecture for liver tumor segmentation. The segmentation DG for tumor was found to be 0.7555, DC was 0.613, VOE was 0.413, ASSD was 1.186 and RMSE was 1.804. For a small tumor, DG was 0.3246 and DC was 0.3082. For a large tumor, DG was 0.7819 and DC was 0.7632. 
Conclusion S-Net obtained more semantic information through the introduction of an attention mechanism and long-skip connections. Experimental results showed that this method effectively improved tumor recognition in CT images and could be applied to assist doctors in clinical treatment.
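The morphological closing step in the Methods above can be sketched in plain NumPy. This is a minimal illustration with a 3x3 structuring element; the abstract does not specify the element size, so that choice is an assumption:

```python
import numpy as np

def shifted(mask, dy, dx):
    """Return `mask` shifted by (dy, dx), padding with False."""
    h, w = mask.shape
    padded = np.pad(mask, 1)
    return padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

def closing(mask):
    """Morphological closing with a 3x3 structuring element: dilation
    (union over shifts) followed by erosion (intersection), which seals
    narrow interruptions and fills small cavities in a binary mask."""
    shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    dilated = np.zeros_like(mask)
    for dy, dx in shifts:
        dilated |= shifted(mask, dy, dx)
    closed = np.ones_like(mask)
    for dy, dx in shifts:
        closed &= shifted(dilated, dy, dx)
    return closed

# A mask whose tumor region is split by a one-pixel gap at column 3:
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 1:3] = True
mask[2:5, 4:6] = True
closed = closing(mask)  # the gap is sealed into one connected region
```

The same operation is available as `scipy.ndimage.binary_closing` with a full 3x3 structuring element; the hand-rolled version above just makes the dilation/erosion composition explicit.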
Affiliation(s)
- Shunyao Luan
- Department of Optoelectronic Engineering, Huazhong University of Science and Technology, Wuhan, China
- Xudong Xue
- Oncology Radiotherapy Department, Hubei Cancer Hospital, Wuhan, China
- Yi Ding
- Oncology Radiotherapy Department, Hubei Cancer Hospital, Wuhan, China
- Wei Wei
- Oncology Radiotherapy Department, Hubei Cancer Hospital, Wuhan, China
- Benpeng Zhu
- Department of Optoelectronic Engineering, Huazhong University of Science and Technology, Wuhan, China
45
Abstract
Recently, deep learning to hash has been extensively applied to image retrieval due to its low storage cost and fast query speed. However, existing hashing methods that use a convolutional neural network (CNN) to extract image semantic features suffer from insufficient and imbalanced feature extraction: the extracted features lack contextual information and relevance among features. Furthermore, relaxing the hash code during training leads to an inevitable quantization error. To solve these problems, this paper proposes deep hashing with improved dual attention for image retrieval (DHIDA), with the following main contributions: (1) it introduces an improved dual attention (IDA) mechanism, consisting of a position attention module and a channel attention module, on top of a pre-trained ResNet18 backbone to extract image feature information; (2) when computing the spatial and channel attention matrices, the column-wise average and maximum of the feature-map matrix are integrated to strengthen the feature representation and fully exploit the features at each position; and (3) to reduce quantization error, a new piecewise function is designed to directly guide the discrete binary codes. Experiments on CIFAR-10, NUS-WIDE, and ImageNet-100 show that DHIDA achieves better performance.
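The average/maximum fusion in contribution (2) can be illustrated with a toy channel-attention computation. The additive fusion and softmax below are simplifying assumptions made for illustration; the actual DHIDA modules learn their fusion inside the IDA blocks:

```python
import numpy as np

def channel_attention(feats):
    """Toy channel-attention weights fusing the per-channel average and
    maximum of a flattened feature map. Additive fusion and a softmax
    are assumptions; DHIDA learns these inside its attention modules."""
    avg = feats.mean(axis=1)            # column-wise average per channel
    mx = feats.max(axis=1)              # column-wise maximum per channel
    logits = avg + mx                   # fuse both descriptors
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()                  # weights over channels, sum to 1

feats = np.array([[0.0, 1.0],
                  [2.0, 3.0]])          # 2 channels, 2 spatial positions
w = channel_attention(feats)            # stronger channel gets more weight
```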
46
Yang L, Yang G, Chen X, Yang Q, Yao X, Bing Z, Niu Y, Huang L, Yang L. Deep Scoring Neural Network Replacing the Scoring Function Components to Improve the Performance of Structure-Based Molecular Docking. ACS Chem Neurosci 2021; 12:2133-2142. [PMID: 34081851 DOI: 10.1021/acschemneuro.1c00110] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Accurate prediction of protein-ligand interactions can greatly promote drug development. Recently, a number of deep-learning-based methods have been proposed to predict protein-ligand binding affinities. However, these methods extract the feature representations of proteins and ligands independently and ignore the relative spatial positions and interaction pairs between them. Here, we propose a virtual screening method based on deep learning, called Deep Scoring, which directly extracts relative position information and atomic attribute information for proteins and ligands from the docking poses. Furthermore, we use two ResNets to extract the features of ligand atoms and protein residues, respectively, and generate an atom-residue interaction matrix to learn the underlying principles of the interactions between proteins and ligands. This is followed by a dual attention network (DAN) that generates attention for the two related entities (i.e., proteins and ligands) and weighs the contribution of each atom and residue to the binding affinity prediction. As a result, Deep Scoring outperforms other structure-based deep learning methods in terms of screening performance (area under the receiver operating characteristic curve (AUC) of 0.901 on an unbiased DUD-E version), pose prediction (AUC of 0.935 on the PDBbind test set), and generalization ability (AUC of 0.803 on the CHEMBL data set). Finally, Deep Scoring was used to select novel ERK2 inhibitors, and two compounds (D264-0698 and D483-1785) with potential inhibitory activity against ERK2 were identified through biological experiments.
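A simplified stand-in for the atom-residue interaction matrix can be built directly from 3-D coordinates. The distance cutoff and binary contact encoding below are assumptions made for illustration; Deep Scoring learns much richer interaction features from docking poses:

```python
import numpy as np

def interaction_matrix(atom_xyz, res_xyz, cutoff=8.0):
    """Binary atom-residue contact matrix: entry (i, j) is 1 when ligand
    atom i lies within `cutoff` of protein residue j. A simplified
    stand-in for Deep Scoring's learned interaction features; the
    cutoff value is an assumption."""
    diff = atom_xyz[:, None, :] - res_xyz[None, :, :]  # (A, R, 3) by broadcasting
    dist = np.linalg.norm(diff, axis=-1)               # (A, R) pairwise distances
    return (dist <= cutoff).astype(np.float32)

atoms = np.array([[0.0, 0.0, 0.0],
                  [10.0, 0.0, 0.0]])     # two ligand atoms
residues = np.array([[1.0, 0.0, 0.0],
                     [20.0, 0.0, 0.0]])  # two residue centroids
M = interaction_matrix(atoms, residues)  # only atom 0 / residue 0 are in contact
```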
Affiliation(s)
- Lijuan Yang
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- School of Physics and Technology, Lanzhou University, Lanzhou 730000, China
- School of Physics, University of Chinese Academy of Science, Beijing 100049, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Guanghui Yang
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Xiaolong Chen
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Qiong Yang
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Xiaojun Yao
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou 730000, China
- Zhitong Bing
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Yuzhen Niu
- Shandong Provincial Research Center for Bioinformatic Engineering and Technique, School of Life Sciences, Shandong University of Technology, Zibo 255049, China
- Liang Huang
- School of Physics and Technology, Lanzhou University, Lanzhou 730000, China
- Lei Yang
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
47
Sediqi KM, Lee HJ. A Novel Upsampling and Context Convolution for Image Semantic Segmentation. SENSORS 2021; 21:s21062170. [PMID: 33804591 PMCID: PMC8003770 DOI: 10.3390/s21062170] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/01/2021] [Revised: 03/14/2021] [Accepted: 03/15/2021] [Indexed: 11/16/2022]
Abstract
Semantic segmentation, which refers to the pixel-wise classification of an image, is a fundamental topic in computer vision owing to its growing importance in robot vision and autonomous driving. It provides rich information about objects in the scene, such as object boundary, category, and location. Recent methods for semantic segmentation often employ an encoder-decoder structure built on deep convolutional neural networks. The encoder extracts image features through several filtering and pooling operations, whereas the decoder gradually recovers the encoder's low-resolution feature maps to a full-input-resolution feature map for pixel-wise prediction. However, encoder-decoder variants for semantic segmentation suffer from severe spatial information loss, caused by pooling operations or strided convolutions, and do not consider the context in the scene. In this paper, we propose a novel dense upsampling convolution method based on a guided filter to effectively preserve the spatial information of the image in the network. We further propose a novel local context convolution method that not only covers larger-scale objects in the scene but also covers them densely for precise object boundary delineation. Theoretical analyses and experimental results on several benchmark datasets verify the effectiveness of our method. Qualitatively, our approach delineates object boundaries with an accuracy that surpasses current leading methods. Quantitatively, we report new records of 82.86% and 81.62% pixel accuracy on the ADE20K and Pascal-Context benchmark datasets, respectively. In comparison with state-of-the-art methods, the proposed method offers promising improvements.
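The guided filter underlying the proposed upsampling can be sketched as follows. This is He et al.'s gray-scale guided filter in plain NumPy; the radius and regularization `eps` are illustrative choices, not values from the paper:

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1)x(2r+1) window via an integral image,
    with edge-replicated borders."""
    n = 2 * r + 1
    p = np.pad(x, r, mode='edge')
    c = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = x.shape
    s = c[n:n + h, n:n + w] - c[:h, n:n + w] - c[n:n + h, :w] + c[:h, :w]
    return s / (n * n)

def guided_filter(guide, src, r=2, eps=1e-3):
    """Edge-preserving guided filter: smooth `src` while following the
    edges of `guide`, using a per-window linear model a*guide + b."""
    mean_I = box_mean(guide, r)
    mean_p = box_mean(src, r)
    var_I = box_mean(guide * guide, r) - mean_I * mean_I
    cov_Ip = box_mean(guide * src, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)        # slope: ~1 at edges, ~0 in flat areas
    b = mean_p - a * mean_I           # offset
    return box_mean(a, r) * guide + box_mean(b, r)

guide = np.full((6, 6), 0.5)
out = guided_filter(guide, guide)     # a flat input stays flat
```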
48
A Novel Post-Processing Method Based on a Weighted Composite Filter for Enhancing Semantic Segmentation Results. SENSORS 2020; 20:s20195500. [PMID: 32992816 PMCID: PMC7582749 DOI: 10.3390/s20195500] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/31/2020] [Revised: 09/17/2020] [Accepted: 09/23/2020] [Indexed: 12/29/2022]
Abstract
Image semantic segmentation is one of the key problems in computer vision. Despite enormous advances in applications, almost all image semantic segmentation algorithms fail to achieve satisfactory results due to a lack of sensitivity to details, difficulty in evaluating the global similarity of pixels, or both. Post-processing enhancement methods, the principal means of ameliorating these inherent flaws, are almost all based on conditional random fields (CRFs). Inspired by CRFs, this paper proposes a novel post-processing enhancement framework that is theoretically simple from the perspective of filtering, and a new weighted composite filter (WCF) is designed to enhance segmentation masks in a unified framework. First, by adjusting the weight ratio, the WCF is decomposed into a local part and a global part. Second, a guided image filter is designed as the local filter, which can restore boundary information to recover necessary details. Moreover, a minimum spanning tree (MST)-based filter is designed as the global filter to provide a natural measure of global pixel similarity for image matching. Third, a unified post-processing enhancement framework, comprising selection and normalization, the WCF, and argmax, is designed. Finally, the effectiveness and superiority of the proposed method for enhancement, as well as its range of applications, are verified through experiments.
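The overall WCF pipeline (a weighted blend of a local and a global filtered score map, followed by a per-pixel argmax) can be sketched as below. The weight value and the toy score maps are assumptions for illustration; in the paper, the local and global parts are a guided image filter and an MST-based filter applied to real network outputs:

```python
import numpy as np

def wcf_labels(local_scores, global_scores, w=0.6):
    """Blend a local (boundary-preserving) score map with a global
    (pixel-similarity) score map, then take the per-pixel argmax over
    classes. The weight w is an assumed value; the paper's local and
    global components are a guided filter and an MST-based filter."""
    composite = w * local_scores + (1.0 - w) * global_scores  # (K, H, W)
    return composite.argmax(axis=0)                           # (H, W) labels

local = np.array([[[0.9, 0.2]],
                  [[0.1, 0.8]]])   # 2 classes over a 1x2 image
glob = np.array([[[0.6, 0.6]],
                 [[0.4, 0.4]]])
labels = wcf_labels(local, glob)   # local evidence dominates at w=0.6
```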