1. Lu W, Wang M, Yu Y, Ma L, Shi Y, Huang Z, Gong M. A novel self-supervised graph clustering method with reliable semi-supervision. Neural Netw 2025;187:107418. PMID: 40120553. DOI: 10.1016/j.neunet.2025.107418.
Abstract
Cluster analysis, a core technique in unsupervised learning, has widespread applications. As data grow more complex, deep clustering, which integrates the advantages of deep learning and traditional clustering algorithms, demonstrates outstanding performance on high-dimensional and complex data. When applied to graph data, however, deep clustering faces two major challenges: noise and sparsity. Noise introduces misleading connections, while sparsity makes it difficult to accurately capture relationships between nodes. These two issues not only increase the difficulty of feature extraction but also significantly degrade clustering performance. To address these problems, we propose a novel Self-Supervised Graph Clustering model based on Reliable Semi-Supervision (SSGC-RSS), built from an upstream and a downstream component. The upstream component employs a dual-decoder graph autoencoder with joint clustering optimization, preserving latent information of both features and graph structure, and alleviates the sparsity problem by generating cluster centers and pseudo-labels. The downstream component trains a semi-supervised graph attention encoding network on highly reliable samples and their pseudo-labels, thereby effectively reducing the interference of noise. Experimental results on multiple graph datasets demonstrate that SSGC-RSS significantly outperforms existing methods, with accuracy improvements of 0.9%, 2.0%, and 5.6% on the Cora, Citeseer, and Pubmed datasets respectively, proving its effectiveness and superiority in complex graph data clustering tasks.
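The reliable-sample selection step lends itself to a short illustration. Below is a minimal sketch, assuming the upstream autoencoder outputs soft cluster assignments q; the function name select_reliable and the confidence threshold are illustrative, not from the paper.

```python
import torch

def select_reliable(q: torch.Tensor, threshold: float = 0.9):
    """Pick nodes whose soft cluster assignment is confident enough.

    q: (num_nodes, num_clusters) soft assignments from the upstream
    autoencoder (rows sum to 1). Returns the indices of reliable nodes
    and their pseudo-labels.
    """
    conf, pseudo = q.max(dim=1)                      # per-node confidence, argmax cluster
    reliable = torch.nonzero(conf >= threshold).squeeze(1)
    return reliable, pseudo[reliable]

# Toy usage: 5 nodes, 3 clusters. The selected nodes would supervise the
# downstream graph attention encoder; the rest stay unlabeled.
q = torch.softmax(torch.randn(5, 3) * 3, dim=1)
idx, labels = select_reliable(q, threshold=0.8)
```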
Affiliation(s)
- Weijia Lu: Science and Technology Department, Affiliated Hospital of Nantong University, Nantong, Jiangsu, 226001, China; Jianghai Hospital of Nantong Sutong Science and Technology Park, Nantong, Jiangsu, 226001, China.
- Min Wang: School of Life Sciences, Shanghai University, Shanghai, 200444, China.
- Yun Yu: School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, 211166, China.
- Liang Ma: Information Center Department, Affiliated Hospital of Nantong University, Nantong, Jiangsu, 226001, China.
- Yaxiang Shi: Network Information Center, Zhongda Hospital Southeast University, Nanjing, 210009, China.
- Zhongqiu Huang: Department of Information, The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu, 210029, China.
- Ming Gong: Information Center Department, Affiliated Hospital of Nantong University, Nantong, Jiangsu, 226001, China.
2. Zhao T, Guo X, Lin Y, Du B. MixIR: Mixing Input and Representations for Contrastive Learning. IEEE Trans Neural Netw Learn Syst 2025;36:8255-8264. PMID: 39141459. DOI: 10.1109/tnnls.2024.3439538.
Abstract
Recently, contrastive learning has shown significant progress in learning visual representations from unlabeled data. The core idea is to train the backbone to be invariant to different augmentations of an instance. While most methods only maximize the feature similarity between two augmented views of an instance, we further generate more challenging training samples and force the model to keep predicting the aggregated representation on these hard samples. In this article, we propose MixIR, a mixture-based approach built upon the traditional Siamese network. On the one hand, we feed two augmented images of an instance to the backbone and obtain the aggregated representation by taking the elementwise maximum of the two features. On the other hand, we take the mixture of these augmented images as input and expect the model prediction to be close to the aggregated representation. In this way, the model sees more variants of each instance and learns to predict invariant representations for them. Thus, the learned model is more discriminative than those of previous contrastive learning methods. Extensive experiments on large-scale datasets show that MixIR steadily improves the baseline and achieves results competitive with state-of-the-art methods. Our code is available at https://github.com/happytianhao/MixIR.
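The two branches are simple to express in code. The sketch below assumes a PyTorch backbone and shows the elementwise-maximum aggregation and the mixed-input prediction loss; the Beta-sampled mixing coefficient, the stop-gradient on the target, and the cosine loss are assumptions, not details confirmed by the abstract.

```python
import torch
import torch.nn.functional as F

def mixir_loss(backbone, x1, x2, alpha: float = 0.5):
    """One MixIR-style step on two augmented views x1, x2 of an instance."""
    with torch.no_grad():                                     # treat the target as fixed
        target = torch.maximum(backbone(x1), backbone(x2))    # elementwise-max aggregation
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x1 + (1.0 - lam) * x2                       # mixed input image
    z_mix = backbone(x_mix)
    # Pull the mixed-input prediction toward the aggregated representation.
    return -F.cosine_similarity(z_mix, target, dim=1).mean()

# Toy usage with a linear backbone on flattened 8x8 single-channel "images".
backbone = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64, 16))
x1, x2 = torch.randn(4, 1, 8, 8), torch.randn(4, 1, 8, 8)
loss = mixir_loss(backbone, x1, x2)
loss.backward()
```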
3. Wang J, Le Y, Cao D, Lu S, Quan Z, Wang M. Graph Reasoning With Supervised Contrastive Learning for Legal Judgment Prediction. IEEE Trans Neural Netw Learn Syst 2025;36:2801-2815. PMID: 38163308. DOI: 10.1109/tnnls.2023.3344634.
Abstract
Given the fact description of a legal case, the legal judgment prediction (LJP) problem aims to determine three judgment outcomes: the applicable law articles, the charges, and the term of penalty. Most existing studies have considered task dependencies while neglecting the prior dependencies of labels among different tasks. How to better exploit the relation dependencies among tasks and labels therefore becomes a crucial issue. To this end, we transform the text classification problem into a node classification framework based on graph reasoning and supervised contrastive learning (SCL), named GraSCL. Specifically, we first design a graph reasoning network to model the potential dependency structures and facilitate relational learning under various graph topologies. Then, we introduce the SCL method for the LJP task to further leverage the label relations on the graph. To accommodate the node classification setting, we extend the traditional SCL method to novel node-level variants, which allows the GraSCL framework to be trained efficiently even with small batches. Furthermore, recognizing the importance of hard negative samples in contrastive learning, we introduce a simple yet effective technique called online hard negative mining (OHNM) to enhance our SCL approach. This technique lets us control the number and complexity of negative samples, leading to further improvements in the model's performance. Finally, extensive experiments on two well-known benchmarks demonstrate the effectiveness and rationality of our proposed approach compared with state-of-the-art competitors.
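For readers unfamiliar with SCL at the node level, here is a minimal sketch of a generic supervised contrastive loss over node embeddings, where nodes sharing a label act as positives; GraSCL's specific node-level variants and the OHNM step are not reproduced.

```python
import torch
import torch.nn.functional as F

def node_supcon_loss(z: torch.Tensor, y: torch.Tensor, tau: float = 0.1):
    """Generic supervised contrastive loss at the node level.

    z: (n, d) node embeddings; y: (n,) labels. Same-label node pairs
    are positives; all other nodes serve as negatives.
    """
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float('-inf'))      # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)      # avoid -inf * 0 = nan
    pos = (y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask
    per_anchor = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return per_anchor[pos.sum(1) > 0].mean()             # anchors with positives only

# Toy usage: 6 nodes, 3 classes.
loss = node_supcon_loss(torch.randn(6, 8), torch.tensor([0, 0, 1, 1, 2, 2]))
```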
4. Shao Y, Sun L, Jiao L, Liu X, Liu F, Li L, Yang S. CoT: Contourlet Transformer for Hierarchical Semantic Segmentation. IEEE Trans Neural Netw Learn Syst 2025;36:132-146. PMID: 38408011. DOI: 10.1109/tnnls.2024.3367901.
Abstract
The Transformer-convolutional neural network (CNN) hybrid learning approach is gaining traction for balancing deep and shallow image features in hierarchical semantic segmentation. However, such hybrids still face a contradiction between comprehensive semantic understanding and meticulous detail extraction. To solve this problem, this article proposes a novel Transformer-CNN hybrid hierarchical network, dubbed Contourlet Transformer (CoT). In the CoT framework, the Transformer's semantic representation process is unavoidably peppered with sparsely distributed points that, while not desired, demand finer detail. We therefore design a deep detail representation (DDR) structure to investigate their fine-grained features. First, through the contourlet transform (CT), we distill the high-frequency directional components from the raw image, yielding localized features that accommodate the inductive bias of CNNs. Second, a CNN deep sparse learning (DSL) module takes them as input to represent the underlying detailed features. This memory- and energy-efficient learning method keeps the same sparse pattern between input and output. Finally, the decoder hierarchically fuses the detailed features with the semantic features in an image-reconstruction-like fashion. Experiments demonstrate that CoT achieves competitive performance on three benchmark datasets: PASCAL Context [57.21% mean intersection over union (mIoU)], ADE20K (54.16% mIoU), and Cityscapes (84.23% mIoU). Furthermore, we conducted robustness studies to validate its resistance against various sorts of corruption. Our code is available at: https://github.com/yilinshao/CoT-Contourlet-Transformer.
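As a rough illustration of distilling high-frequency components from the raw image, the sketch below substitutes a box-blur residual (a Laplacian-pyramid-style high-pass) for the actual contourlet transform, which additionally applies a directional filter bank; the function name and kernel are illustrative.

```python
import torch
import torch.nn.functional as F

def high_frequency_residual(img: torch.Tensor) -> torch.Tensor:
    """Crude high-pass: subtract a box-blurred copy from the image.

    img: (B, C, H, W). The residual keeps edges and fine detail; a real
    contourlet transform would further split it into directional bands.
    """
    c = img.shape[1]
    blur = torch.full((c, 1, 3, 3), 1.0 / 9.0)          # 3x3 box filter
    low = F.conv2d(img, blur, padding=1, groups=c)      # depthwise blur
    return img - low

detail = high_frequency_residual(torch.randn(2, 3, 32, 32))  # detail-branch input
```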
5. Li J, Pan Y, Tsang IW. Taming Overconfident Prediction on Unlabeled Data From Hindsight. IEEE Trans Neural Netw Learn Syst 2024;35:14151-14163. PMID: 37220056. DOI: 10.1109/tnnls.2023.3274845.
Abstract
Minimizing prediction uncertainty on unlabeled data is a key factor in achieving good performance in semi-supervised learning (SSL). The prediction uncertainty is typically expressed as the entropy of the transformed probabilities in the output space. Most existing works distill low-entropy predictions by either accepting the determining class (the one with the largest probability) as the true label or suppressing subtle predictions (those with smaller probabilities). Unarguably, these distillation strategies are usually heuristic and less informative for model training. From this discernment, this article proposes a dual mechanism, named adaptive sharpening (ADS), which first applies a soft threshold to adaptively mask out determinate and negligible predictions, and then seamlessly sharpens the informed predictions, distilling certain predictions only with the informed ones. More importantly, we theoretically analyze the traits of ADS by comparing it with various distillation strategies. Numerous experiments verify that ADS significantly improves state-of-the-art SSL methods when used as a plug-in. Our proposed ADS forges a cornerstone for future distillation-based SSL research.
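A minimal sketch of the mask-then-sharpen idea follows, assuming predicted class probabilities p; the hard threshold tau and the temperature-based sharpening are stand-ins for the paper's soft-thresholding rule, and the function name is illustrative.

```python
import torch

def adaptive_sharpen(p: torch.Tensor, tau: float = 0.1, T: float = 0.5):
    """Mask negligible class probabilities, then sharpen the survivors.

    p: (batch, classes) predicted probabilities. Probabilities below tau
    are zeroed; the rest are raised to 1/T (T < 1 sharpens) and
    renormalized into a low-entropy target distribution.
    """
    masked = torch.where(p >= tau, p, torch.zeros_like(p))
    sharpened = masked ** (1.0 / T)
    return sharpened / sharpened.sum(dim=1, keepdim=True).clamp(min=1e-12)

p = torch.softmax(torch.randn(4, 10), dim=1)
q = adaptive_sharpen(p)   # training target for the unlabeled batch
```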
6. Zhai P, Cong H, Zhu E, Zhao G, Yu Y, Li J. MVCNet: Multiview Contrastive Network for Unsupervised Representation Learning for 3-D CT Lesions. IEEE Trans Neural Netw Learn Syst 2024;35:7376-7390. PMID: 36150004. DOI: 10.1109/tnnls.2022.3203412.
Abstract
With the renaissance of deep learning, automatic diagnostic algorithms for computed tomography (CT) have achieved many successful applications. However, they heavily rely on lesion-level annotations, which are often scarce due to the high cost of collecting pathological labels. On the other hand, the annotated CT data, especially the 3-D spatial information, may be underutilized by approaches that model a 3-D lesion with its 2-D slices, although such approaches have been proven effective and computationally efficient. This study presents a multiview contrastive network (MVCNet), which enhances the representations of 2-D views contrastively against other views of different spatial orientations. Specifically, MVCNet views each 3-D lesion from different orientations to collect multiple 2-D views; it learns to minimize a contrastive loss so that the 2-D views of the same 3-D lesion are aggregated, whereas those of different lesions are separated. To alleviate the issue of false negative examples, the uninformative negative samples are filtered out, which results in more discriminative features for downstream tasks. By linear evaluation, MVCNet achieves state-of-the-art accuracies on the lung image database consortium and image database resource initiative (LIDC-IDRI) (88.62%), lung nodule database (LNDb) (76.69%), and TianChi (84.33%) datasets for unsupervised representation learning. When fine-tuned on 10% of the labeled data, the accuracies are comparable to the supervised learning models (89.46% versus 85.03%, 73.85% versus 73.44%, 83.56% versus 83.34% on the three datasets, respectively), indicating the superiority of MVCNet in learning representations with limited annotations. Our findings suggest that contrasting multiple 2-D views is an effective approach to capturing the original 3-D information, which notably improves the utilization of the scarce and valuable annotated CT data.
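A sketch of the view-collection step, assuming the lesion arrives as a (D, H, W) CT crop: three central orthogonal slices stand in for MVCNet's orientations, and each slice would then be embedded by a shared 2-D encoder and trained with a contrastive loss over lesion identities (the negative-filtering step is omitted).

```python
import torch

def extract_2d_views(volume: torch.Tensor):
    """Collect 2-D views of a 3-D lesion from different orientations.

    volume: (D, H, W) CT crop centered on the lesion. Here we take the
    three central orthogonal slices; the paper's sampling may differ.
    """
    d, h, w = volume.shape
    return [volume[d // 2],        # axial slice, (H, W)
            volume[:, h // 2],     # coronal slice, (D, W)
            volume[:, :, w // 2]]  # sagittal slice, (D, H)

# Views of the same lesion are positives in the contrastive loss;
# views of other lesions are negatives.
views = extract_2d_views(torch.randn(32, 48, 48))
```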
7. Tan D, Huang Z, Peng X, Zhong W, Mahalec V. Deep Adaptive Fuzzy Clustering for Evolutionary Unsupervised Representation Learning. IEEE Trans Neural Netw Learn Syst 2024;35:6103-6117. PMID: 37027776. DOI: 10.1109/tnnls.2023.3243666.
Abstract
Cluster assignment of large and complex datasets is a crucial but challenging task in pattern recognition and computer vision. In this study, we explore the possibility of employing fuzzy clustering in a deep neural network framework and present a novel evolutionary unsupervised representation learning model with iterative optimization. It implements a deep adaptive fuzzy clustering (DAFC) strategy that learns a convolutional neural network classifier from unlabeled data samples alone. DAFC consists of a deep feature quality-verifying model and a fuzzy clustering model, implementing a deep feature representation learning loss and embedded fuzzy clustering with weighted adaptive entropy. We join fuzzy clustering to the deep reconstruction model, where fuzzy membership represents a clear structure of deep cluster assignments, and jointly optimize deep representation learning and clustering. The joint model also evaluates the current clustering performance by inspecting whether the data resampled from the estimated bottleneck space have consistent clustering properties, progressively improving the deep clustering model. Experiments on various datasets show that the proposed method obtains substantially better reconstruction and clustering quality than other state-of-the-art deep clustering methods, as demonstrated by the in-depth analysis in our extensive experiments.
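To make the fuzzy-membership machinery concrete, here is the standard fuzzy c-means membership update applied in a latent space, a sketch only: DAFC's weighted adaptive entropy and joint autoencoder training are not reproduced, and all names are illustrative.

```python
import torch

def fuzzy_memberships(z: torch.Tensor, centers: torch.Tensor, m: float = 2.0):
    """Fuzzy c-means membership update in the latent space.

    z: (n, d) encoder embeddings; centers: (k, d) cluster centers.
    Returns (n, k) memberships in [0, 1] that sum to 1 per row.
    """
    dist = torch.cdist(z, centers).clamp(min=1e-8)    # (n, k) distances
    ratio = dist.unsqueeze(2) / dist.unsqueeze(1)     # (n, k, k): d_i / d_j
    return 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(dim=2)

# A joint objective would add the clustering term to the reconstruction loss.
z, centers = torch.randn(100, 16), torch.randn(5, 16)
u = fuzzy_memberships(z, centers)
cluster_loss = (u ** 2 * torch.cdist(z, centers) ** 2).sum()  # FCM term, m = 2
```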
8. Zhao P, Pan Y, Li X, Chen X, Tsang IW, Liao L. Coarse-to-Fine Contrastive Learning on Graphs. IEEE Trans Neural Netw Learn Syst 2024;35:4622-4634. PMID: 37018665. DOI: 10.1109/tnnls.2022.3228556.
Abstract
Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct contrastive samples by adding perturbations to the graph structure or node attributes. Although impressive results have been achieved, such methods are rather blind to a wealth of available prior information: as the perturbation degree applied to the original graph increases, 1) the similarity between the original graph and the generated augmented graph gradually decreases, and 2) the discrimination among the nodes within each augmented view gradually increases. In this article, we argue that both kinds of prior information can be incorporated (differently) into the CL paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among the positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes is maintained and is less altered by perturbations of different degrees. Experimental results on various benchmark datasets verify the effectiveness of our algorithm compared with supervised and unsupervised models.
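One way to read the L2R interpretation is as a pairwise margin ranking loss over similarities ordered by perturbation degree; the sketch below is an illustrative reduction under that assumption, not the paper's full ranking framework.

```python
import torch
import torch.nn.functional as F

def perturbation_ranking_loss(z_anchor, views, margin: float = 0.1):
    """Enforce: similarity to weakly perturbed views exceeds similarity
    to strongly perturbed ones.

    z_anchor: (d,) embedding of the original graph/node; views: list of
    (d,) embeddings ordered from weakest to strongest perturbation.
    """
    sims = torch.stack([F.cosine_similarity(z_anchor, v, dim=0) for v in views])
    loss = torch.zeros(())
    for i in range(len(sims) - 1):
        # The weaker perturbation should win by at least the margin.
        loss = loss + F.relu(margin - (sims[i] - sims[i + 1]))
    return loss / (len(sims) - 1)

z0 = torch.randn(64)
views = [z0 + 0.1 * k * torch.randn(64) for k in range(1, 4)]  # rising noise
loss = perturbation_ranking_loss(z0, views)
```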
9. Li S, Liu F, Jiao L, Liu X, Chen P. Learning Salient Feature for Salient Object Detection Without Labels. IEEE Trans Cybern 2023;53:1012-1025. PMID: 36227820. DOI: 10.1109/tcyb.2022.3209978.
Abstract
Supervised salient object detection (SOD) methods achieve state-of-the-art performance by relying on human-annotated saliency maps, while unsupervised methods attempt SOD without any annotations. In unsupervised SOD, obtaining saliency in a completely unsupervised manner is a huge challenge. Existing unsupervised methods usually obtain saliency by introducing other handcrafted-feature-based saliency methods. In general, the location information of salient objects is contained in the feature maps. Call the features belonging to salient objects salient features, and the features that do not (such as background) nonsalient features: if the feature maps can be divided into salient and nonsalient features in an unsupervised way, then the object at the location of the salient features is the salient object. Based on this motivation, we propose a novel method called learning salient feature (LSF), which achieves unsupervised SOD by learning salient features from the data itself. The method's objective is to enhance salient features while suppressing nonsalient ones. Furthermore, a salient object localization method is proposed to roughly locate objects where the salient features reside, yielding a salient activation map. Usually, the object in the salient activation map is incomplete and contains considerable noise. To address this issue, a saliency map update strategy is introduced to gradually remove noise and strengthen boundaries. Visualizations of images and their salient activation maps show that our method can effectively learn salient visual objects, and experiments show that we achieve superior unsupervised performance on a series of datasets.
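As a rough illustration of turning feature maps into a salient activation map, the sketch below averages rectified channel activations and min-max normalizes them; this is a generic heuristic stand-in, not LSF's learned salient-feature separation or its update strategy.

```python
import torch

def salient_activation_map(feats: torch.Tensor) -> torch.Tensor:
    """Coarse object localization from backbone features.

    feats: (B, C, H, W). Averages rectified channel activations, then
    min-max normalizes each map to [0, 1]; peaks tend to fall on the
    most strongly activated (often salient) regions.
    """
    amap = feats.clamp(min=0).mean(dim=1, keepdim=True)   # (B, 1, H, W)
    flat = amap.flatten(2)                                # (B, 1, H*W)
    lo = flat.min(dim=2, keepdim=True).values.unsqueeze(3)
    hi = flat.max(dim=2, keepdim=True).values.unsqueeze(3)
    return (amap - lo) / (hi - lo + 1e-8)

amap = salient_activation_map(torch.randn(2, 256, 14, 14))
# Thresholding amap gives a rough mask to be refined iteratively.
```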