1
Zhou ZF, Huang D, Wang CD. Pyramid contrastive learning for clustering. Neural Netw 2025; 185:107217. [PMID: 39919524] [DOI: 10.1016/j.neunet.2025.107217]
Abstract
With its ability to perform joint representation learning and clustering via deep neural networks, deep clustering has gained significant attention in recent years. Despite the considerable progress, most previous deep clustering methods still suffer from three critical limitations. First, they tend to attach some distribution-based clustering loss to the neural network, which often overlooks sample-wise contrastiveness for discriminative representation learning. Second, they generally utilize the features learned at a single layer for the clustering process, and thus fail to go beyond a single layer to exploit multiple layers for joint multi-layer (multi-stage) learning. Third, they typically rely on a convolutional neural network (CNN) for clustering images, which focuses on local information but cannot well capture global dependencies. To tackle these issues, this paper presents a new deep clustering method called pyramid contrastive learning for clustering (PCLC), which incorporates a pyramidal contrastive architecture to jointly enforce contrastive learning and clustering at multiple network layers (or stages). Specifically, for an input image, two types of augmentations are first performed to generate two parallel augmented views. To bridge the gap between the CNN (for capturing local information) and the Transformer (for reflecting global dependencies), a mixed CNN-Transformer-based encoder is utilized as the backbone, whose CNN-Transformer blocks are further divided into four stages, thus giving rise to a pyramid of multi-stage feature representations. Thereafter, multiple stages of twin contrastive learning are simultaneously conducted at both the instance level and the cluster level, through the optimization of which the final clustering is achieved. Extensive experiments on multiple challenging image datasets demonstrate the superior clustering performance of PCLC over the state-of-the-art. The source code is available at https://github.com/Zachary-Chow/PCLC.
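For readers unfamiliar with the twin (instance-level plus cluster-level) contrastive formulation that PCLC applies at each stage, the following is a minimal generic sketch in PyTorch. The function names, temperature values, and the treatment of assignment-matrix columns as cluster-level contrastive units are illustrative assumptions, not the authors' implementation; see the linked repository for the official code.

```python
import torch
import torch.nn.functional as F

def instance_contrastive_loss(z1, z2, tau=0.5):
    # NT-Xent-style loss over two augmented views; z1, z2: (N, d) features.
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d)
    sim = z @ z.t() / tau                                 # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                     # exclude self-pairs
    # The positive of sample i in view 1 is sample i in view 2, and vice versa.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def cluster_contrastive_loss(p1, p2, tau=1.0):
    # Cluster-level loss: each column of the (N, K) soft-assignment matrices
    # (one column per cluster) is treated as a contrastive unit.
    return instance_contrastive_loss(p1.t(), p2.t(), tau)
```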
Affiliation(s)
- Zi-Feng Zhou
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China.
- Dong Huang
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China; Key Laboratory of Smart Agricultural Technology in Tropical South China, Ministry of Agriculture and Rural Affairs, China.
- Chang-Dong Wang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China.
2
Ji J, Feng S. Anchors Crash Tensor: Efficient and Scalable Tensorial Multi-View Subspace Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:2660-2675. [PMID: 40031059] [DOI: 10.1109/tpami.2025.3526790]
Abstract
Tensorial Multi-view Clustering (TMC), a prominent approach in multi-view clustering, leverages low-rank tensor learning to capture high-order correlations among views for consistent clustering structure identification. Despite its promising performance, TMC algorithms face three key challenges: 1) a severe computational burden that makes it difficult to handle large-scale datasets; 2) an estimation bias caused by the convex surrogate of the tensor rank; and 3) a lack of explicit balance between consistency and complementarity. Aware of these issues, we propose a basic framework, Efficient and Scalable Tensorial Multi-View Subspace Clustering (ESTMC), for large-scale multi-view clustering. ESTMC integrates anchor representation learning and non-convex low-rank tensor learning with a Generalized Non-convex Tensor Rank (GNTR) into a unified objective function, which enhances the efficiency of the existing subspace-based TMC framework. Furthermore, a novel model, ESTMC-C, with the proposed Enhanced Tensor Rank (ETR), Consistent Geometric Regularization (CGR), and Tensorial Exclusive Regularization (TER), is extended to balance the learning of consistency and complementarity among views, delivering divisible representations for the clustering task. Efficient iterative optimization algorithms are designed to solve the proposed ESTMC and ESTMC-C, which enjoy economical time complexity and exhibit theoretical convergence. Extensive experimental results on various datasets demonstrate the superiority of the proposed algorithms compared to state-of-the-art methods.
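As background for the anchor-representation step that underpins ESTMC's scalability, here is a minimal ridge-regression sketch of per-view anchor coefficients. The anchor-sampling strategy, regularization, and variable names are assumptions for illustration only; the paper's GNTR/ETR tensor machinery is not reproduced here.

```python
import numpy as np

def anchor_representation(X, m=100, lam=1e-2, seed=0):
    # X: (n, d) data of one view.  Returns Z: (n, m) anchor coefficients and the anchors.
    rng = np.random.default_rng(seed)
    anchors = X[rng.choice(len(X), size=m, replace=False)]        # (m, d) sampled anchors
    # Ridge-regularized least squares:
    #   Z = argmin ||X - Z A||^2 + lam ||Z||^2 = X A^T (A A^T + lam I)^{-1}
    A = anchors
    Z = X @ A.T @ np.linalg.inv(A @ A.T + lam * np.eye(m))
    return Z, anchors

# The per-view Z matrices can then be stacked into a third-order tensor,
# on which the low-rank tensor learning operates.
```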
3
Wang Z, Lin Q, Ma Y, Ma X. Local High-Order Graph Learning for Multi-View Clustering. IEEE Transactions on Big Data 2025; 11:761-773. [DOI: 10.1109/tbdata.2024.3433525]
Affiliation(s)
- Zhi Wang
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Qiang Lin
- School of Mathematics and Information, Northwest Minzu University, Lanzhou, China
- Yaxiong Ma
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiaoke Ma
- School of Computer Science and Technology, Xidian University, Xi'an, China
4
Lin JQ, Chen MS, Zhu XR, Wang CD, Zhang H. Dual Information Enhanced Multiview Attributed Graph Clustering. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:6466-6477. [PMID: 38814767] [DOI: 10.1109/tnnls.2024.3401449]
Abstract
Multiview attributed graph clustering is an important approach to partitioning multiview data based on the attribute characteristics and adjacency matrices from different views. Some attempts have been made to use graph neural networks (GNNs), and they have achieved promising clustering performance. Despite this, few of them pay attention to the inherent specific information embedded in multiple views. Meanwhile, they are incapable of recovering the latent high-level representation from the low-level ones, greatly limiting the downstream clustering performance. To fill these gaps, a novel dual information enhanced multiview attributed graph clustering (DIAGC) method is proposed in this article. Specifically, the proposed method introduces the specific information reconstruction (SIR) module to disentangle the exploration of the consensus and specific information from multiple views, which enables the graph convolutional network (GCN) to capture more essential low-level representations. Besides, the contrastive learning (CL) module maximizes the agreement between the latent high-level representation and the low-level ones, and enables the high-level representation to satisfy the desired clustering structure with the help of the self-supervised clustering (SC) module. Extensive experiments on several real-world benchmarks demonstrate the effectiveness of the proposed DIAGC method compared with the state-of-the-art baselines.
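For context on the GCN building block referred to above, the following is a minimal sketch of one graph-convolution propagation step using the standard symmetrically normalized rule; it is illustrative background, not the DIAGC architecture itself, and the variable names are assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    # One propagation step: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W),
    # where A is the (n, n) adjacency matrix, H the node features, W the layer weights.
    A_hat = A + np.eye(A.shape[0])                     # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)             # ReLU activation
```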
5
Xu X, Wang Z, Ren S, Niu S, Li D. Local-Global Geometric Information and View Complementarity Introduced Multiview Metric Learning. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:5428-5441. [PMID: 38546990] [DOI: 10.1109/tnnls.2024.3380020]
Abstract
Geometry studies the spatial structure and location information of objects, providing a priori knowledge and an intuitive explanation for classification methods. Considering samples from a geometric perspective offers a novel approach to understanding their information. In this article, we propose a method called local-global geometric information and view complementarity introduced multiview metric learning (GIVCMML). Our method effectively exploits the geometric information of multiview samples. The learned metric space retains the geometric relations of samples and makes them more separable. First, we propose a global geometric constraint within the maximum margin criterion framework. By maximizing the distance between class centers in the metric space, we ensure that samples from different classes are well separated. Second, to maintain the manifold structure of the original space, we build an adjacency matrix that contains the sample label information, which helps explore the local geometric information of sample pairs. Finally, to better mine the complementary information of multiview samples, GIVCMML maximizes the correlation between views in the metric space, enabling each view to adaptively learn from the others and to explore the complementary information between views. We extensively evaluate the effectiveness of our method on real-world datasets. The experimental results demonstrate that GIVCMML achieves competitive performance compared with existing multiview metric learning (MvML) methods.
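As a concrete illustration of the kind of global separability term maximized under a maximum margin criterion, here is a sketch of the trace of the between-class scatter computed on embeddings in a learned metric space. The exact constraint used by GIVCMML may differ, so treat this as an assumed simplification with hypothetical names.

```python
import numpy as np

def between_class_scatter(Z, y):
    # Z: (n, d) embeddings in the learned metric space; y: (n,) class labels.
    # Returns tr(S_b) = sum_c n_c * ||mu_c - mu||^2; larger means better-separated class centers.
    mu = Z.mean(axis=0)
    total = 0.0
    for c in np.unique(y):
        Zc = Z[y == c]
        total += len(Zc) * np.sum((Zc.mean(axis=0) - mu) ** 2)
    return total
```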
6
Zhang HX, Huang D, Ling HB, Sun W, Wen Z. Learning clustering-friendly representations via partial information discrimination and cross-level interaction. Neural Netw 2024; 180:106696. [PMID: 39255633] [DOI: 10.1016/j.neunet.2024.106696]
Abstract
Despite significant advances in deep clustering research, there remain three critical limitations to most of the existing approaches. First, they often derive the clustering result by associating some distribution-based loss with specific network layers, neglecting the potential benefits of leveraging contrastive sample-wise relationships. Second, they frequently focus on representation learning at the full-image scale, overlooking the discriminative information latent in partial image regions. Third, although some prior studies perform the learning process at multiple levels, they mostly lack the ability to exploit the interaction between different learning levels. To overcome these limitations, this paper presents a novel deep image clustering approach via Partial Information discrimination and Cross-level Interaction (PICI). Specifically, we utilize a Transformer encoder as the backbone, coupled with two types of augmentations to formulate two parallel views. The augmented samples, integrated with masked patches, are processed through the Transformer encoder to produce the class tokens. Subsequently, three partial information learning modules are jointly enforced, namely, the partial information self-discrimination (PISD) module for masked image reconstruction, the partial information contrastive discrimination (PICD) module for simultaneous instance- and cluster-level contrastive learning, and the cross-level interaction (CLI) module to ensure consistency across different learning levels. Through this unified formulation, our PICI approach, to our knowledge, for the first time bridges the gap between masked image modeling and deep contrastive clustering, offering a novel pathway for enhanced representation learning and clustering. Experimental results across six image datasets demonstrate the superiority of our PICI approach over the state-of-the-art. In particular, our approach achieves an ACC of 0.772 (0.634) on the RSOD (UC-Merced) dataset, which shows an improvement of 29.7% (24.8%) over the best baseline. The source code is available at https://github.com/Regan-Zhang/PICI.
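To make the masked-patch idea behind the PISD module concrete, below is a generic random patch-masking sketch in PyTorch. The masking ratio, sampling scheme, and function name are illustrative assumptions rather than the scheme used in PICI; see the linked repository for the official code.

```python
import torch

def random_mask_patches(patches, mask_ratio=0.5):
    # patches: (B, L, D) patch embeddings.  Keeps a random subset of patches per image
    # and returns a boolean mask marking the dropped (to-be-reconstructed) positions.
    B, L, D = patches.shape
    n_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(B, L, device=patches.device)
    keep_idx = noise.argsort(dim=1)[:, :n_keep]                        # indices of kept patches
    kept = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, L, dtype=torch.bool, device=patches.device)
    mask.scatter_(1, keep_idx, False)                                  # True = masked out
    return kept, mask
```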
Affiliation(s)
- Hai-Xin Zhang
- College of Mathematics and Informatics, South China Agricultural University, China.
- Dong Huang
- College of Mathematics and Informatics, South China Agricultural University, China; Key Laboratory of Smart Agricultural Technology in Tropical South China, Ministry of Agriculture and Rural Affairs, China.
- Hua-Bao Ling
- School of Computer Science and Engineering, Sun Yat-sen University, China.
- Weijun Sun
- School of Automation, Guangdong University of Technology, China.
- Zihao Wen
- College of Mathematics and Informatics, South China Agricultural University, China.
7
Fang SG, Huang D, Cai XS, Wang CD, He C, Tang Y. Efficient Multi-View Clustering via Unified and Discrete Bipartite Graph Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:11436-11447. [PMID: 37030820] [DOI: 10.1109/tnnls.2023.3261460]
Abstract
Although previous graph-based multi-view clustering (MVC) algorithms have gained significant progress, most of them still face three limitations. First, they often suffer from high computational complexity, which restricts their application in large-scale scenarios. Second, they usually perform graph learning either at the single-view level or at the view-consensus level, but often neglect the possibility of jointly learning single-view and consensus graphs. Third, many of them rely on k-means for discretization of the spectral embeddings and thus lack the ability to directly learn a graph with a discrete cluster structure. In light of this, this article presents an efficient MVC approach via unified and discrete bipartite graph learning (UDBGL). Specifically, anchor-based subspace learning is incorporated to learn view-specific bipartite graphs from multiple views, upon which bipartite graph fusion is leveraged to learn a view-consensus bipartite graph with adaptive weight learning. Furthermore, the Laplacian rank constraint is imposed to ensure that the fused bipartite graph has a discrete cluster structure (with a specific number of connected components). By simultaneously formulating the view-specific bipartite graph learning, the view-consensus bipartite graph learning, and the discrete cluster structure learning into a unified objective function, an efficient minimization algorithm is then designed to tackle this optimization problem and directly achieve a discrete clustering solution without requiring additional partitioning, notably with linear time complexity in the data size. Experiments on a variety of multi-view datasets demonstrate the robustness and efficiency of our UDBGL approach. The code is available at https://github.com/huangdonghere/UDBGL.
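The role of the Laplacian rank constraint can be seen from a basic spectral fact: a graph whose Laplacian has exactly k zero eigenvalues has exactly k connected components, each of which becomes a cluster. The sketch below counts the components of a fused bipartite graph this way; it is an illustrative check under assumed variable names, not part of the UDBGL optimization itself.

```python
import numpy as np

def bipartite_component_count(B, tol=1e-8):
    # B: (n, m) nonnegative bipartite affinity between n samples and m anchors.
    n, m = B.shape
    W = np.zeros((n + m, n + m))
    W[:n, n:] = B
    W[n:, :n] = B.T
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)
    # Number of (near-)zero eigenvalues = number of connected components (clusters).
    return int(np.sum(eigvals < tol))
```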
8
Cai H, Huang W, Yang S, Ding S, Zhang Y, Hu B, Zhang F, Cheung YM. Realize Generative Yet Complete Latent Representation for Incomplete Multi-View Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:3637-3652. [PMID: 38145535] [DOI: 10.1109/tpami.2023.3346869]
Abstract
In a multi-view environment, missing observations often arise due to limitations of the observation process. Most current representation learning methods struggle to explore complete information: they either lack cross-view generative ability, simply filling in the missing view data, or lack solidity, merely inferring a consistent representation among the existing views. To address this problem, we propose a deep generative model to learn a complete generative latent representation, namely Complete Multi-view Variational Auto-Encoders (CMVAE), which models the generation of the multiple views from a complete latent variable represented by a mixture of Gaussian distributions. Thus, a missing view can be fully characterized by the latent variables and is resolved by estimating its posterior distribution. Accordingly, a novel variational lower bound is introduced to integrate view-invariant information into posterior inference, enhancing the solidity of the learned latent representation. The intrinsic correlations between views are mined to seek cross-view generality, and information pertaining to the missing views is fused by view weights to reach solidity. Benchmark experimental results on clustering, classification, and cross-view image generation tasks demonstrate the superiority of CMVAE, while time complexity and parameter sensitivity analyses illustrate its efficiency and robustness. Additionally, an application to bioinformatics data exemplifies its practical significance.
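As background for the variational machinery used by CMVAE, the following shows the standard Gaussian reparameterization and KL term of a plain VAE. CMVAE's complete latent variable is a mixture of Gaussians with a modified lower bound, so this is only an assumed single-Gaussian building block, not the model itself.

```python
import torch

def reparameterize(mu, logvar):
    # Sample z ~ N(mu, diag(exp(logvar))) with the reparameterization trick.
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

def gaussian_kl(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions.
    return 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar, dim=1)
```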
9
Deng X, Huang D, Wang CD. Heterogeneous Tri-stream Clustering Network. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11147-x]
10
Multiview nonnegative matrix factorization with dual HSIC constraints for clustering. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01742-0]
11
Zhang GY, Huang D, Wang CD. Facilitated low-rank multi-view subspace clustering. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110141]