1. Hu S, Zhang C, Zou G, Lou Z, Ye Y. Deep Multiview Clustering by Pseudo-Label Guided Contrastive Learning and Dual Correlation Learning. IEEE Transactions on Neural Networks and Learning Systems 2025;36:3646-3658. [PMID: 38289840] [DOI: 10.1109/tnnls.2024.3354731]
Abstract
Deep multiview clustering (MVC) aims to learn and utilize the rich relations across different views to enhance clustering performance with a human-designed deep network. However, most existing deep MVC methods face two challenges. First, most current deep contrastive MVC methods select the same instance across views as positive pairs and all remaining instances as negative pairs, which often leads to inaccurate contrastive learning (CL). Second, most deep MVC methods learn only feature or cluster correlations across views, failing to explore their dual correlations. To tackle these challenges, in this article we propose a novel deep MVC framework with pseudo-label guided CL and dual correlation learning. Specifically, a novel pseudo-label guided CL mechanism uses the pseudo-labels produced in each iteration to help remove false negative sample pairs, so that CL for feature distribution alignment becomes more accurate, benefiting discriminative feature learning. Unlike most deep MVC methods that learn only one kind of correlation, we investigate both the feature and cluster correlations among views to discover rich and comprehensive relations. Experiments on various datasets demonstrate the superiority of our method over many compared state-of-the-art deep MVC methods. The source code will be provided at https://github.com/ShizheHu/Deep-MVC-PGCL-DCL.
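The core idea of the pseudo-label guided mechanism (using per-iteration pseudo-labels to remove false negatives from the contrastive denominator) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the function name, the InfoNCE form, and the temperature are assumptions for the example.

```python
import numpy as np

def pgcl_loss(z1, z2, pseudo, tau=0.5):
    """Cross-view contrastive loss where candidate negatives that share a
    pseudo-label with the anchor are masked out (treated as false negatives),
    in the spirit of pseudo-label guided CL. Illustrative sketch only."""
    # L2-normalise embeddings so dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = np.exp(z1 @ z2.T / tau)              # (n, n) similarity matrix
    same = pseudo[:, None] == pseudo[None, :]  # pairs with equal pseudo-label
    valid_neg = ~same                          # false negatives removed
    pos = np.diag(sim)                         # same instance across views
    denom = pos + (sim * valid_neg).sum(axis=1)
    return float(-np.log(pos / denom).mean())
```

Masking same-pseudo-label pairs only shrinks the denominator, so the loss is never larger than the vanilla variant that keeps every non-anchor instance as a negative.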
2. Feng Q, Chen CLP, Liu L. A Review of Convex Clustering From Multiple Perspectives: Models, Optimizations, Statistical Properties, Applications, and Connections. IEEE Transactions on Neural Networks and Learning Systems 2024;35:13122-13142. [PMID: 37342947] [DOI: 10.1109/tnnls.2023.3276393]
Abstract
Traditional partition-based clustering is very sensitive to the initialized centroids, which easily get stuck in local minima due to their nonconvex objectives. To this end, convex clustering was proposed by relaxing K-means clustering or hierarchical clustering. As an emerging clustering technique, convex clustering can resolve the instability of partition-based clustering methods. In general, the convex clustering objective consists of a fidelity term and a shrinkage term: the fidelity term encourages the cluster centroids to estimate the observations, while the shrinkage term shrinks the cluster centroid matrix so that observations in the same category share the same cluster centroid. Regularized by the ℓ_p-norm (p ∈ {1, 2, +∞}), the convex objective guarantees a globally optimal solution for the cluster centroids. This survey conducts a comprehensive review of convex clustering. It starts with convex clustering and its nonconvex variants, then concentrates on optimization algorithms and hyperparameter settings. In particular, the statistical properties, applications, and connections of convex clustering with other methods are reviewed and discussed thoroughly for a better understanding of convex clustering. Finally, we briefly summarize the development of convex clustering and present some potential directions for future research.
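The fidelity-plus-shrinkage objective described above can be written down directly. A minimal sketch, assuming uniform pair weights w_ij = 1 and evaluating (not optimizing) the objective 0.5·Σ_i ||x_i − u_i||² + λ·Σ_{i<j} ||u_i − u_j||_p:

```python
import numpy as np

def convex_clustering_objective(X, U, lam, p=2):
    """Evaluate the convex clustering objective for data X and centroid
    matrix U (one centroid row per observation). Uniform pair weights
    are assumed for brevity; real variants use sparse w_ij."""
    fidelity = 0.5 * np.sum((X - U) ** 2)       # centroids estimate the data
    n = X.shape[0]
    shrink = 0.0
    for i in range(n):
        for j in range(i + 1, n):               # each pair counted once
            shrink += np.linalg.norm(U[i] - U[j], ord=p)
    return fidelity + lam * shrink
```

At λ = 0 the minimizer is U = X (each point its own centroid); as λ grows, the shrinkage term fuses centroids together, which is exactly the mechanism that produces the cluster path.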
3. Wang S, Huang S, Wu Z, Liu R, Chen Y, Zhang D. Heterogeneous graph convolutional network for multi-view semi-supervised classification. Neural Netw 2024;178:106438. [PMID: 38906055] [DOI: 10.1016/j.neunet.2024.106438]
Abstract
This paper proposes a novel approach to semantic representation learning from multi-view datasets, distinct from most existing methodologies, which typically handle each view individually while maintaining a shared semantic link across the multi-view data via a unified optimization process. Notably, even recent advances such as Co-GCN continue to treat each view as an independent graph and then aggregate the per-view GCN representations to form output representations, ignoring the complex semantic interactions among heterogeneous data. To address this issue, we design a unified framework that connects multi-view data through a heterogeneous graph. Specifically, our study envisions multi-view data as a heterogeneous graph composed of shared isomorphic nodes and multi-type edges, wherein the same nodes are shared across different views but each view possesses its own edge type. This perspective motivates us to utilize a heterogeneous graph convolutional network (HGCN) to extract semantic representations from multi-view data for semi-supervised classification tasks. To the best of our knowledge, this is an early attempt to transform multi-view data into a heterogeneous graph within the realm of multi-view semi-supervised learning. In our approach, the original input of the HGCN is composed of concatenated multi-view matrices, and its convolutional operator (the graph Laplacian matrix) is adaptively learned from multi-type edges in a data-driven fashion. After rigorous experimentation on eight public datasets, our proposed method, referred to as HGCN-MVSC, demonstrated encouraging superiority over several state-of-the-art competitors for semi-supervised classification tasks.
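The "one adjacency per edge type, adaptively combined into a single convolutional operator" idea can be sketched as follows. This is an illustrative stand-in, not HGCN-MVSC itself: the softmax over free weights substitutes for the paper's data-driven learning, and the function builds the symmetric normalised GCN propagation operator from the fused graph.

```python
import numpy as np

def fused_gcn_operator(adjs, weights):
    """Fuse one adjacency matrix per view (edge type) with non-negative
    weights, then form the renormalised operator D^{-1/2}(A+I)D^{-1/2}
    used by GCN-style propagation. Sketch under assumed softmax weighting."""
    w = np.exp(weights - np.max(weights))
    w = w / w.sum()                          # convex combination of views
    A = sum(wv * Av for wv, Av in zip(w, adjs))
    A = A + np.eye(A.shape[0])               # add self-loops
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric normalisation
    return D_inv_sqrt @ A @ D_inv_sqrt
```

In training, `weights` would be learned jointly with the GCN parameters so that informative edge types dominate the fused operator.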
Affiliation(s)
- Shiping Wang: College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China.
- Sujia Huang: College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China.
- Zhihao Wu: College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China.
- Rui Liu: School of Computer Science, Beihang University, Beijing 100191, China.
- Yong Chen: School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100871, China.
- Dell Zhang: Thomson Reuters Labs, London E14 5EP, UK.
4. Jin Z, Wang M, Tang C, Zheng X, Zhang W, Sha X, An S. Predicting miRNA-disease association via graph attention learning and multiplex adaptive modality fusion. Comput Biol Med 2024;169:107904. [PMID: 38181611] [DOI: 10.1016/j.compbiomed.2023.107904]
Abstract
miRNAs are a class of small non-coding RNA molecules that play important roles in gene regulation. They are crucial for maintaining normal cellular functions, and dysregulation or dysfunction of miRNAs is linked to the onset and advancement of multiple human diseases. Research on miRNAs has unveiled novel avenues in the diagnosis, treatment, and prevention of human diseases. However, clinical trials pose challenges and drawbacks, such as complexity and time-consuming processes, which create obstacles for many researchers. The Graph Attention Network (GAT) has shown excellent performance in handling graph-structured data for tasks such as link prediction, and some studies have successfully applied GAT to miRNA-disease association prediction. However, existing methods have several drawbacks. First, most previous models rely solely on concatenation operations to merge features of miRNAs and diseases, which discards significant modality-specific information and can even introduce redundant information. Second, as the number of layers in a GAT increases, over-smoothing may occur in the feature extraction process, which significantly affects prediction accuracy. To address these issues and effectively perform miRNA-disease prediction, we propose an innovative model called the Multiplex Adaptive Modality Fusion Graph Attention Network (MAMFGAT). MAMFGAT utilizes GAT as the main structure for feature aggregation and incorporates a multi-modal adaptive fusion module to extract features from three interconnected networks: the miRNA-disease association network, the miRNA similarity network, and the disease similarity network. It employs adaptive learning and cross-modality contrastive learning to fuse more effective miRNA and disease feature embeddings, and incorporates multi-modal residual feature fusion to tackle excessive feature smoothing in GATs. Finally, we employ a Multi-Layer Perceptron (MLP) that takes the miRNA and disease feature embeddings as input to predict potential miRNA-disease associations. Extensive experimental results provide evidence of the superior performance of MAMFGAT in comparison with other state-of-the-art methods. To validate the significance of the various modalities and assess the efficacy of the designed modules, we performed an ablation analysis. Furthermore, MAMFGAT shows outstanding performance in three cancer case studies, indicating that it is a reliable method for studying the association between miRNAs and diseases. The implementation of MAMFGAT is available at https://github.com/zixiaojin66/MAMFGAT-master.
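The final scoring step (an MLP over a concatenated miRNA/disease embedding pair) can be sketched in a few lines. This is a generic illustration, not the MAMFGAT code: the single hidden layer, ReLU activation, and weight shapes are all assumptions.

```python
import numpy as np

def mlp_predict(m_emb, d_emb, W1, b1, W2, b2):
    """Score one miRNA-disease pair: concatenate the two embeddings and
    pass them through a one-hidden-layer MLP ending in a sigmoid.
    Illustrative sketch with assumed shapes."""
    h = np.concatenate([m_emb, d_emb])       # pair representation
    h = np.maximum(0.0, W1 @ h + b1)         # ReLU hidden layer
    logit = float(W2 @ h + b2)
    return 1.0 / (1.0 + np.exp(-logit))      # association probability
```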
Affiliation(s)
- Zixiao Jin: School of Computer, China University of Geosciences, Wuhan 430074, China.
- Minhui Wang: Department of Pharmacy, Lianshui People's Hospital of Kangda College Affiliated to Nanjing Medical University, Huai'an 223300, China.
- Chang Tang: School of Computer, China University of Geosciences, Wuhan 430074, China.
- Xiao Zheng: School of Computer, National University of Defense Technology, Changsha 410073, China.
- Wen Zhang: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
- Xiaofeng Sha: Department of Oncology, Huai'an Hongze District People's Hospital, Huai'an 223100, China.
- Shan An: JD Health International Inc., China.
5. He G, Jiang W, Peng R, Yin M, Han M. Soft Subspace Based Ensemble Clustering for Multivariate Time Series Data. IEEE Transactions on Neural Networks and Learning Systems 2023;34:7761-7774. [PMID: 35157594] [DOI: 10.1109/tnnls.2022.3146136]
Abstract
Recently, multivariate time series (MTS) clustering has gained much attention. However, state-of-the-art algorithms suffer from two major issues. First, few existing studies consider the correlations and redundancies between variables of MTS data. Second, since different clusters usually exist in different intrinsic variables, efficiently enhancing performance by mining the intrinsic variables of a cluster is challenging. To deal with these issues, we first propose a variable-weighted K-medoids clustering algorithm (VWKM) based on the importance of each variable for a cluster. In VWKM, the proposed variable weighting scheme can identify the important variables for a cluster, which also provides knowledge and experience to domain experts. Then, a reverse-nearest-neighborhood-based density peaks approach (RP) is proposed to handle the initialization sensitivity of VWKM. Finally, based on VWKM and the density peaks approach, an ensemble clustering framework (SSEC) is proposed to further enhance clustering performance. Experimental results on ten MTS datasets show that our method works well on MTS data and outperforms state-of-the-art clustering ensemble approaches.
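The variable-weighting idea behind VWKM, i.e. a per-cluster weight vector that emphasizes important variables in the distance computation, can be sketched as follows. This is a simplified illustration, assuming plain per-variable Euclidean distance and weights that sum to 1; the actual VWKM distance and weight update differ.

```python
import numpy as np

def weighted_mts_distance(x, y, w):
    """Distance between two multivariate time series x, y of shape (T, V),
    combining per-variable Euclidean distances with cluster-specific
    variable weights w of shape (V,). Illustrative sketch only."""
    per_var = np.sqrt(((x - y) ** 2).sum(axis=0))   # one distance per variable
    return float((w * per_var).sum())               # weighted combination
```

Setting a variable's weight near zero effectively removes it from the cluster's subspace, which is how the scheme surfaces the intrinsic variables of each cluster.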
6. Kanwal M, Ur Rehman MM, Farooq MU, Chae DK. Mask-Transformer-Based Networks for Teeth Segmentation in Panoramic Radiographs. Bioengineering (Basel) 2023;10:843. [PMID: 37508871] [PMCID: PMC10376801] [DOI: 10.3390/bioengineering10070843]
Abstract
Teeth segmentation plays a pivotal role in dentistry by facilitating accurate diagnoses and aiding the development of effective treatment plans. While traditional methods have primarily focused on teeth segmentation, they often fail to consider the broader oral tissue context. This paper proposes a panoptic-segmentation-based method that combines the results of instance segmentation with semantic segmentation of the background. Particularly, we introduce a novel architecture for instance teeth segmentation that leverages a dual-path transformer-based network, integrated with a panoptic quality (PQ) loss function. The model directly predicts masks and their corresponding classes, with the PQ loss function streamlining the training process. Our proposed architecture features a dual-path transformer block that facilitates bi-directional communication between the pixel path CNN and the memory path. It also contains a stacked decoder block that aggregates multi-scale features across different decoding resolutions. The transformer block integrates pixel-to-memory feedback attention, pixel-to-pixel self-attention, and memory-to-pixel and memory-to-memory self-attention mechanisms. The output heads process features to predict mask classes, while the final mask is obtained by multiplying memory path and pixel path features. When applied to the UFBA-UESC Dental Image dataset, our model exhibits a substantial improvement in segmentation performance, surpassing existing state-of-the-art techniques in terms of performance and robustness. Our research signifies an essential step forward in teeth segmentation and contributes to a deeper understanding of oral structures.
Affiliation(s)
- Muhammad Mutti Ur Rehman: Department of Computer and Software Engineering, National University of Science and Technology, Islamabad 43701, Pakistan
- Muhammad Umar Farooq: Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
- Dong-Kyu Chae: Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
7. Li J, Liang B, Lu X, Li M, Lu G, Xu Y. From Global to Local: Multi-Patch and Multi-Scale Contrastive Similarity Learning for Unsupervised Defocus Blur Detection. IEEE Transactions on Image Processing 2023;32:1158-1169. [PMID: 37022428] [DOI: 10.1109/tip.2023.3240856]
Abstract
Defocus blur detection (DBD), which aims to detect out-of-focus or in-focus pixels in a single image, has been widely applied to many vision tasks. To remove the dependence on abundant pixel-level manual annotations, unsupervised DBD has attracted much attention in recent years. In this paper, a novel deep network named Multi-patch and Multi-scale Contrastive Similarity (M2CS) learning is proposed for unsupervised DBD. Specifically, the DBD mask predicted by a generator is first exploited to re-generate two composite images by transporting the estimated clear and unclear areas from the source image to realistic full-clear and full-blurred images, respectively. To encourage these two composite images to be completely in-focus or out-of-focus, a global similarity discriminator measures the similarity of each pair in a contrastive way, enforcing every two positive samples (two clear images or two blurred images) to be close while pushing every two negative samples (a clear image and a blurred image) far apart. Since the global similarity discriminator only focuses on the blur level of a whole image, and some misdetected pixels cover only a small area, a set of local similarity discriminators is further designed to measure the similarity of image patches at multiple scales. Thanks to this joint global and local strategy, together with contrastive similarity learning, the two composite images are more effectively pushed to be all-clear or all-blurred. Experimental results on real-world datasets substantiate the superiority of our proposed method both quantitatively and qualitatively. The source code is released at: https://github.com/jerysaw/M2CS.
8. Tang C, Zheng X, Tang C. Adaptive Discriminative Regions Learning Network for Remote Sensing Scene Classification. Sensors (Basel) 2023;23:773. [PMID: 36679569] [PMCID: PMC9865113] [DOI: 10.3390/s23020773]
Abstract
As an auxiliary means of remote sensing (RS) intelligent interpretation, remote sensing scene classification (RSSC) attracts considerable attention, and its performance has been improved significantly by popular deep convolutional neural networks (DCNNs). However, several challenges still hinder the practical application of RSSC, such as the complex composition of land cover, scale variation of objects, and redundant and noisy areas. To mitigate the impact of these issues, we propose an adaptive discriminative regions learning network for RSSC, referred to as ADRL-Net, which locates discriminative regions effectively to boost RSSC performance by utilizing a novel self-supervision mechanism. ADRL-Net consists of three main modules: a discriminative region generator, a region discriminator, and a region scorer. Specifically, the discriminative region generator first generates candidate regions that could be informative for RSSC. Then, the region discriminator evaluates the generated regions and provides feedback for the generator to update them. Finally, the region scorer produces prediction scores for the whole image using the discriminative regions. In this manner, the three modules cooperate to focus on the most informative regions of an image and reduce the interference of redundant regions in the final classification, making the method robust to complex scene composition, varying object scales, and irrelevant information. To validate the efficacy of the proposed network, we conduct experiments on four widely used benchmark datasets, and the results demonstrate that ADRL-Net consistently outperforms other state-of-the-art RSSC methods.
Affiliation(s)
- Chuan Tang: School of Computer Science, China University of Geosciences, No. 68 Jincheng Road, Wuhan 430078, China
- Xiao Zheng: School of Computer, National University of Defense Technology, Deya Road, Changsha 410073, China
- Chang Tang: School of Computer Science, China University of Geosciences, No. 68 Jincheng Road, Wuhan 430078, China
9. Jiang L, Tang C, Zhou H. White blood cell classification via a discriminative region detection assisted feature aggregation network. Biomed Opt Express 2022;13:5246-5260. [PMID: 36425625] [PMCID: PMC9664878] [DOI: 10.1364/boe.462905]
Abstract
White blood cell (WBC) classification plays an important role in pathological diagnosis, since WBCs show different appearances when fighting various disease pathogens. Although many WBC classification methods have been proposed and achieved great success, their accuracy is still significantly affected by practical issues such as uneven staining, boundary blur, and nuclear intra-class variability. In this paper, we propose a deep neural network for WBC classification via discriminative region detection assisted feature aggregation (DRFA-Net), which can accurately locate the WBC area to boost final classification performance. Specifically, DRFA-Net uses an adaptive feature enhancement module to refine multi-level deep features in a bilateral manner, efficiently capturing both high-level semantic information and low-level details of WBC images. Considering that background areas inevitably produce interference, we design a network branch to detect the WBC area under the supervision of segmented ground truth. The bilaterally refined features obtained from the two directions are finally aggregated for classification, and the detected WBC area is used to highlight the features of discriminative regions via an attention mechanism. Extensive experiments on several public datasets validate that DRFA-Net obtains higher accuracy than other state-of-the-art WBC classification methods.
Affiliation(s)
- Lei Jiang: Department of Hematology, Suzhou Ninth People's Hospital, Suzhou 215299, China
- Chang Tang: School of Computer Science, China University of Geosciences, Wuhan 430074, China
- Hua Zhou: Department of Hematology, Funing People's Hospital, Yancheng 224400, China
10. Deep Learning Using Endobronchial-Ultrasound-Guided Transbronchial Needle Aspiration Image to Improve the Overall Diagnostic Yield of Sampling Mediastinal Lymphadenopathy. Diagnostics (Basel) 2022;12:2234. [PMID: 36140635] [PMCID: PMC9497910] [DOI: 10.3390/diagnostics12092234]
Abstract
Lung cancer is the biggest cause of cancer-related death worldwide. Accurate nodal staging is critical for determining the treatment strategy for lung cancer patients. Endobronchial-ultrasound-guided transbronchial needle aspiration (EBUS-TBNA) has revolutionized the field of pulmonology and is considered extremely sensitive, specific, and safe for lung cancer staging through rapid on-site evaluation (ROSE), but manual visual inspection of an entire slide of EBUS smears is challenging, time consuming, and, worse, subjective on a large interobserver scale. To satisfy ROSE's needs, a rapid, automated, and accurate diagnosis system using EBUS-TBNA whole-slide images (WSIs) is highly desired to improve diagnostic accuracy and speed, minimize workload and labor costs, and ensure reproducibility. We present a fast, efficient, and fully automatic deep-convolutional-neural-network-based system for advanced lung cancer staging on gigapixel EBUS-TBNA cytological WSIs. Each WSI was converted into a patch-based hierarchical structure and examined by the proposed deep convolutional neural network, generating segmentations of metastatic lesions in EBUS-TBNA WSIs. To the best of the authors' knowledge, this is the first research on fully automated enlarged mediastinal lymph node analysis using EBUS-TBNA cytological WSIs. We evaluated the robustness of the proposed framework on a dataset of 122 WSIs; the proposed method achieved a precision of 93.4%, sensitivity of 89.8%, DSC of 82.2%, and IoU of 83.2% in the first experiment (37.7% training and 62.3% testing), and a precision of 91.8 ± 1.2%, sensitivity of 96.3 ± 0.8%, DSC of 94.0 ± 1.0%, and IoU of 88.7 ± 1.8% in the second experiment using three-fold cross-validation. Furthermore, the proposed method significantly outperformed three state-of-the-art baseline models, U-Net, SegNet, and FCN, in terms of precision, sensitivity, DSC, and Jaccard index, based on Fisher's least significant difference (LSD) test (p < 0.001). In a computational time comparison on a WSI, the proposed method was 2.5 times faster than U-Net, 2.3 times faster than SegNet, and 3.4 times faster than FCN on a single GeForce GTX 1080 Ti. With its high precision and sensitivity, the proposed method demonstrated the potential to reduce the workload of pathologists in routine clinical practice.
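The four segmentation scores reported above (precision, sensitivity, DSC, IoU/Jaccard) are standard binary-mask metrics and can be computed directly from true/false positives and negatives. A minimal sketch:

```python
import numpy as np

def seg_metrics(pred, gt):
    """Precision, sensitivity (recall), Dice coefficient (DSC) and IoU
    for a pair of binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()      # predicted lesion, truly lesion
    fp = np.logical_and(pred, ~gt).sum()     # predicted lesion, background
    fn = np.logical_and(~pred, gt).sum()     # missed lesion pixels
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    dsc = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return precision, sensitivity, dsc, iou
```

Note DSC and IoU are monotonically related (DSC = 2·IoU / (1 + IoU)), which is why the two scores track each other across the experiments.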
11. Lin X, Li H, Cai Q. Hierarchical complementary residual attention learning for defocus blur detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.023]
12. Xu Z, Li J, Meng Y, Zhang X. CAP-YOLO: Channel Attention Based Pruning YOLO for Coal Mine Real-Time Intelligent Monitoring. Sensors (Basel) 2022;22:4331. [PMID: 35746116] [PMCID: PMC9229694] [DOI: 10.3390/s22124331]
Abstract
Real-time coal mine intelligent monitoring for pedestrian identification and positioning is an important means of ensuring production safety. Traditional neural-network-based object detection models require significant computational and storage resources, which makes them difficult to deploy on edge devices for real-time intelligent monitoring. To address these problems, CAP-YOLO (Channel Attention based Pruning YOLO) and AEPSM (adaptive image enhancement parameter selection module) are proposed in this paper to achieve real-time intelligent analysis of coal mine surveillance videos. First, DCAM (Deep Channel Attention Module) is proposed to evaluate the importance level of each channel in YOLOv3. Second, the filters corresponding to low-importance channels are pruned to generate CAP-YOLO, which recovers accuracy through fine-tuning. Finally, considering that lighting environments vary across coal mine fields, AEPSM is proposed to select parameters for CLAHE (Contrast Limited Adaptive Histogram Equalization) under different fields. Experimental results show that the weight size of CAP-YOLO is 8.3× smaller than that of YOLOv3 with mAP only 7% lower, and the inference speed of CAP-YOLO is three times that of YOLOv3. On an NVIDIA Jetson TX2, CAP-YOLO achieves an inference speed of 31 FPS.
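The pruning step described above (rank convolutional output channels by an attention-derived importance score, keep the top fraction, then fine-tune) can be sketched as follows. This is a generic illustration with an assumed `keep_ratio` parameter, not the DCAM/CAP-YOLO code.

```python
import numpy as np

def prune_channels(weight, importance, keep_ratio=0.5):
    """Prune output channels of a conv weight tensor (out, in, kh, kw)
    by an importance score per channel (e.g. a channel attention value),
    keeping only the top fraction. Returns pruned weights and kept indices."""
    n_keep = max(1, int(round(keep_ratio * len(importance))))
    keep = np.argsort(importance)[::-1][:n_keep]   # highest scores first
    keep = np.sort(keep)                           # preserve channel order
    return weight[keep], keep
```

In a full network, the kept indices would also slice the following layer's input channels and the matching batch-norm parameters, after which fine-tuning recovers the lost accuracy.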
13. Zhang Q, Shi Y, Zhang X, Zhang L. Residual attentive feature learning network for salient object detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.052]
14. Jiang Z, Xu X, Zhang L, Zhang C, Foo CS, Zhu C. MA-GANet: A Multi-Attention Generative Adversarial Network for Defocus Blur Detection. IEEE Transactions on Image Processing 2022;31:3494-3508. [PMID: 35533163] [DOI: 10.1109/tip.2022.3171424]
Abstract
Background clutter poses challenges to defocus blur detection. Existing approaches often produce artifact predictions in cluttered background areas and relatively low-confidence predictions in boundary areas. In this work, we tackle these issues from two perspectives. First, inspired by the recent success of the self-attention mechanism, we introduce channel-wise and spatial-wise attention modules to attentively aggregate features across channels and spatial locations, obtaining more discriminative features. Second, we propose a generative adversarial training strategy to suppress spurious and unreliable predictions, achieved by utilizing a discriminator to distinguish predicted defocus maps from ground-truth ones. As such, the defocus network (generator) needs to produce 'realistic' defocus maps to minimize the discriminator loss. We further demonstrate that generative adversarial training allows exploiting additional unlabeled data to improve performance, i.e., semi-supervised learning, and we provide the first benchmark on semi-supervised defocus detection. Finally, we demonstrate that existing evaluation metrics for defocus detection generally fail to quantify robustness with respect to thresholding. For a fair and practical evaluation, we introduce an effective yet efficient AUFβ metric. Extensive experiments on three public datasets verify the superiority of the proposed methods compared against state-of-the-art approaches.
15. Hierarchical edge-aware network for defocus blur detection. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00711-y]
Abstract
Defocus blur detection (DBD) aims to separate blurred and unblurred regions in a given image. Due to its potential practical applications, this task has attracted much attention. Most existing DBD models achieve competitive performance by aggregating multi-level features extracted from fully convolutional networks. However, they still face several challenges, such as coarse object boundaries of the defocus blur regions, background clutter, and the detection of low-contrast focal regions. In this paper, we develop a hierarchical edge-aware network to solve the above problems; to the best of our knowledge, it is the first attempt to develop an end-to-end network with edge awareness for DBD. We design an edge feature extraction network to capture boundary information, and a hierarchical interior perception network to generate local and global context information, which helps detect low-contrast focal regions. Moreover, a hierarchical edge-aware fusion network is proposed to hierarchically fuse edge information and semantic features. Benefiting from the rich edge information, the fused features can generate more accurate boundaries. Finally, we propose a progressive feature refinement network to refine the output features. Experimental results on two widely used DBD datasets demonstrate that the proposed model outperforms state-of-the-art approaches.
16. Defocus blur detection via transformer encoder and edge guidance. Appl Intell 2022. [DOI: 10.1007/s10489-022-03303-y]
17. Li Z, Tang C, Zheng X, Liu X, Zhang W, Zhu E. High-Order Correlation Preserved Incomplete Multi-View Subspace Clustering. IEEE Transactions on Image Processing 2022;31:2067-2080. [PMID: 35188891] [DOI: 10.1109/tip.2022.3147046]
Abstract
Incomplete multi-view clustering aims to exploit the information of multiple incomplete views to partition data into clusters. Existing methods only utilize pair-wise sample correlation and pair-wise view correlation to improve clustering performance, neglecting the high-order correlations of samples and of views. To address this issue, we propose a high-order correlation preserved incomplete multi-view subspace clustering (HCP-IMSC) method, which effectively recovers the missing views of samples and the subspace structure of incomplete multi-view data. Specifically, multiple affinity matrices constructed from the incomplete multi-view data are treated as a third-order low-rank tensor with a tensor factorization regularization that preserves the high-order view and sample correlations. A unified affinity matrix can then be obtained by fusing the view-specific affinity matrices in a self-weighted manner. A hypergraph is further constructed from the unified affinity matrix to preserve the high-order geometrical structure of the data with incomplete views, and the samples with missing views are constrained to be reconstructed from their neighboring samples under the hypergraph-induced hyper-Laplacian regularization. Furthermore, the learning of the view-specific affinity matrices as well as the unified one, the tensor factorization, and the hyper-Laplacian regularization are integrated into a unified optimization framework, and an iterative algorithm is designed to solve the resultant model. Experimental results on various benchmark datasets indicate the superiority of the proposed method. The code is implemented using MATLAB R2018a and the MindSpore library: https://github.com/ChangTang/HCP-IMSC.
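The self-weighted fusion step the abstract mentions can be sketched in a few lines. This is an illustrative toy under assumptions (the inverse-distance reweighting rule and all names are hypothetical, not the HCP-IMSC optimizer): the unified matrix is a weighted average of view-specific affinities, and each view is reweighted inversely to its distance from the current consensus, so mutually consistent views gain influence.

```python
import numpy as np

def self_weighted_fusion(affinities, iters=20, eps=1e-8):
    """Toy self-weighted fusion: alternate between (1) forming the unified
    matrix U as a weighted average of view-specific affinity matrices and
    (2) reweighting each view by the inverse of its Frobenius distance
    to U, normalized to sum to one."""
    w = np.ones(len(affinities)) / len(affinities)
    for _ in range(iters):
        U = sum(wi * S for wi, S in zip(w, affinities))
        d = np.array([np.linalg.norm(S - U) + eps for S in affinities])
        w = (1.0 / d) / np.sum(1.0 / d)
    return U, w

# Two agreeing views and one outlier view: the outlier should be down-weighted.
views = [np.eye(3), np.eye(3), np.ones((3, 3)) / 3.0]
U, w = self_weighted_fusion(views)
```

After a few iterations the two consistent views dominate the consensus, which is the qualitative behaviour self-weighting is meant to produce.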
|
18
|
State-of-the-Art Approaches for Image Deconvolution Problems, including Modern Deep Learning Architectures. MICROMACHINES 2021; 12:mi12121558. [PMID: 34945408 PMCID: PMC8707587 DOI: 10.3390/mi12121558] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/29/2021] [Revised: 11/29/2021] [Accepted: 12/09/2021] [Indexed: 01/06/2023]
Abstract
In modern digital microscopy, deconvolution methods are widely used to eliminate a number of image defects and increase resolution. In this review, we divide these methods into classical, deep learning-based, and optimization-based methods. Special attention is paid to deep learning as the most powerful and flexible modern approach: the review describes the major neural network architectures used for the deconvolution problem, such as convolutional and generative adversarial networks, autoencoders, various forms of recurrent networks, and the attention mechanism. We describe the difficulties in their application, such as the discrepancy between the standard loss functions and the visual content and the heterogeneity of the images, and examine how to deal with these by introducing new loss functions, multiscale learning, and prior knowledge of visual content. In conclusion, a review of promising directions for the further development of deconvolution methods in microscopy is given.
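For the classical category the review covers, Wiener deconvolution is the standard textbook example. The sketch below is a generic frequency-domain implementation, not an algorithm specific to this review; the scalar `k`, standing in for the noise-to-signal power ratio, is a simplification of the full Wiener filter.

```python
import numpy as np

def wiener_deconvolve(blurred, psf, k=1e-3):
    """Classical Wiener deconvolution in the Fourier domain.
    k regularizes the inverse filter where the PSF's spectrum is weak,
    trading resolution for noise suppression."""
    H = np.fft.fft2(psf, s=blurred.shape)
    G = np.fft.fft2(blurred)
    F = np.conj(H) / (np.abs(H) ** 2 + k) * G   # regularized inverse filter
    return np.real(np.fft.ifft2(F))

# Toy example: blur a point source with a 3x3 box PSF, then restore it.
img = np.zeros((8, 8)); img[4, 4] = 1.0
psf = np.ones((3, 3)) / 9.0
H = np.fft.fft2(psf, s=img.shape)
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * H))
restored = wiener_deconvolve(blurred, psf)
```

The restored point source is substantially closer to the original than the blurred input, illustrating the resolution gain that motivates deconvolution in microscopy.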
|
19
|
Zhao W, Hou X, He Y, Lu H. Defocus Blur Detection via Boosting Diversity of Deep Ensemble Networks. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:5426-5438. [PMID: 34097609 DOI: 10.1109/tip.2021.3084101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Existing defocus blur detection (DBD) methods usually explore multi-scale and multi-level features to improve performance. However, defocus blur regions normally carry incomplete semantic information, which degrades DBD performance if it is not used properly. In this paper, we address this problem by exploring deep ensemble networks: we boost the diversity of defocus blur detectors, forcing the network to generate diverse results in which some detectors rely more on high-level semantic information while others rely more on low-level information. Ensembling these diverse results then lets detection errors cancel each other out. Specifically, we propose two deep ensemble networks, namely an adaptive ensemble network (AENet) and an encoder-feature ensemble network (EFENet), which focus on boosting diversity at little computational cost. AENet constructs different lightweight sequential adapters for one backbone network to generate diverse results without introducing too many parameters or much computation, and is optimized only by a self-negative-correlation loss. EFENet, on the other hand, explores the diversity of multiple encoded features together with feature ensemble strategies (e.g., group-channel uniformly weighted average ensemble and self-gate weighted ensemble). Diversity is represented by encoded features with fewer parameters, and a simple mean squared error loss achieves superior performance. Experimental results demonstrate superiority over the state of the art in terms of accuracy and speed. Codes and models are available at: https://github.com/wdzhao123/DENets.
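The diversity-through-negative-correlation idea can be illustrated with a toy penalty. This is a sketch of the generic negative-correlation-learning term, not the paper's exact loss; the function name and the two-member example are assumptions for illustration.

```python
import numpy as np

def negative_correlation_penalty(preds):
    """Toy negative-correlation term over ensemble member predictions
    (rows of `preds`): each member's deviation from the ensemble mean is
    correlated with the sum of the other members' deviations. Since the
    deviations sum to zero across members, that sum equals the negative
    of the member's own deviation. A more negative value means a more
    diverse ensemble."""
    mean = np.mean(preds, axis=0)
    dev = preds - mean          # per-member deviation from the ensemble mean
    others = -dev               # sum_{j != i} dev_j = -dev_i
    return float(np.mean(dev * others))

# Two members with identical vs. disagreeing predictions
identical = np.array([[0.9, 0.1], [0.9, 0.1]])
diverse   = np.array([[0.9, 0.1], [0.1, 0.9]])
```

Minimizing such a term alongside the task loss pushes the members apart, so that their ensemble averages out individual errors, which is the mechanism the abstract attributes to diversity boosting.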
|
20
|
Li J, Fan D, Yang L, Gu S, Lu G, Xu Y, Zhang D. Layer-Output Guided Complementary Attention Learning for Image Defocus Blur Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:3748-3763. [PMID: 33729938 DOI: 10.1109/tip.2021.3065171] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Defocus blur detection (DBD), which has been widely applied in various fields, aims to detect the out-of-focus or in-focus pixels of a single image. Although deep learning-based methods for DBD have outperformed hand-crafted feature-based methods, their performance still falls short of requirements. In this paper, a novel network is established for DBD. Unlike existing methods, which only learn the projection from the in-focus part to the ground truth, both in-focus and out-of-focus pixels, which are completely and symmetrically complementary, are taken into account. Specifically, two symmetric branches are designed to jointly estimate the probabilities of focused and defocused pixels, respectively. Due to their complementary constraint, each layer in one branch is affected by an attention map obtained from the other branch, effectively learning detailed information that may be ignored in a single branch. The feature maps from these two branches are then passed through a unique fusion block to simultaneously obtain a two-channel output measured by a complementary loss. Additionally, instead of estimating only one binary map from a specific layer, each layer is encouraged to estimate the ground truth and thereby guide the binary map estimation in its linked shallower layer, followed by a top-to-bottom combination strategy that gradually exploits global and local information. Experimental results on released datasets demonstrate that our proposed method remarkably outperforms state-of-the-art algorithms.
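The complementary constraint between the two branches can be sketched as a simple penalty. This is an illustrative assumption, not the paper's actual loss: one plausible reading is that the per-pixel focus and defocus probabilities should sum to one.

```python
import numpy as np

def complementary_loss(p_focus, p_defocus):
    """Toy complementary constraint: penalize pixels where the focus and
    defocus branch outputs do not sum to one."""
    return float(np.mean((p_focus + p_defocus - 1.0) ** 2))

# Per-pixel focus probabilities from one branch
pf = np.array([[0.9, 0.2], [0.6, 0.1]])
pd_good = 1.0 - pf                 # perfectly complementary defocus branch
pd_bad = np.full_like(pf, 0.5)     # branch ignoring its complement
```

A pair of branches trained under such a term cannot both stay uncertain about the same pixel, which is what forces each branch to attend to detail the other misses.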
|