1
Wang H, Zhang W, Wang Q, Ma X. Adaptive structural-guided multi-level representation learning with graph contrastive for incomplete multi-view clustering. Information Fusion 2025; 119:103035. DOI: 10.1016/j.inffus.2025.103035.
2
Dong Z, Jin J, Xiao Y, Xiao B, Wang S, Liu X, Zhu E. Subgraph Propagation and Contrastive Calibration for Incomplete Multiview Data Clustering. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:3218-3230. PMID: 38236668. DOI: 10.1109/tnnls.2024.3350671.
Abstract
The success of multi-view raw data mining relies on the integrity of attributes. In practice, however, each view is subject to various kinds of noise and collection failures, so attributes are often only partially available. To make matters worse, the attributes in multi-view raw data come in multiple forms, which makes it even more difficult to explore the structure of the data, especially in the multi-view clustering task. Owing to the missing data in some views, clustering incomplete multi-view data faces the following challenges: 1) mining the topology of missing data across views is an urgent open problem; 2) most approaches do not calibrate the complemented representations with the common information shared by multiple views; and 3) the cluster distributions obtained from incomplete views exhibit a cluster distribution unaligned problem (CDUP) in the latent space. To address these issues, we propose a deep clustering framework based on subgraph propagation and contrastive calibration (SPCC) for incomplete multi-view raw data. First, a global structural graph is reconstructed by propagating the subgraphs generated from the complete data of each view. Then, the missing views are completed and calibrated under the guidance of the global structural graph and of contrastive learning between views. In the latent space, we assume that different views share a common cluster representation of the same dimension. In the unsupervised setting, however, the cluster distributions of different views do not correspond, which hinders the completion process from exploiting information from other views. Finally, the complemented cluster distributions of the different views are aligned by contrastive learning (CL), thereby solving the CDUP in the latent space. Our method achieves state-of-the-art performance on six benchmarks, which validates the effectiveness and superiority of SPCC.
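The contrastive alignment step described above pulls the two views' representations of the same sample together while pushing mismatched samples apart. The paper's architecture is not reproduced here; the following is only a minimal sketch of an InfoNCE-style loss of the kind such methods build on (the function name `info_nce` and the temperature `tau` are illustrative assumptions):

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE loss sketch: row i of z1 and row i of z2 form a positive pair;
    all other rows of z2 act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # unit-normalize view 1
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)   # unit-normalize view 2
    logits = z1 @ z2.T / tau                      # cosine similarities / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # cross-entropy on matching pairs
```

Aligned views (identical rows paired together) should yield a lower loss than views whose rows have been permuted out of correspondence.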
3
Yan X, Mao Y, Ye Y, Yu H. Cross-Modal Clustering With Deep Correlated Information Bottleneck Method. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:13508-13522. PMID: 37220062. DOI: 10.1109/tnnls.2023.3269789.
Abstract
Cross-modal clustering (CMC) aims to improve clustering accuracy (ACC) by exploiting the correlations across modalities. Although recent research has made impressive advances, it remains challenging to sufficiently capture these correlations, owing to the high-dimensional nonlinear characteristics of individual modalities and the conflicts among heterogeneous modalities. In addition, meaningless modality-private information in each modality may become dominant during correlation mining, which also interferes with clustering performance. To tackle these challenges, we devise a novel deep correlated information bottleneck (DCIB) method, which explores the correlation information between multiple modalities while eliminating the modality-private information in each modality in an end-to-end manner. Specifically, DCIB treats the CMC task as a two-stage data compression procedure, in which the modality-private information in each modality is eliminated under the guidance of the shared representation of multiple modalities. Meanwhile, the correlations between modalities are preserved in terms of both feature distributions and clustering assignments. Finally, the objective of DCIB is formulated as a mutual-information-based objective function, and a variational optimization approach is proposed to ensure its convergence. Experimental results on four cross-modal datasets validate the superiority of DCIB. Code is released at https://github.com/Xiaoqiang-Yan/DCIB.
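Information-bottleneck objectives of this kind are built on mutual information. As a toy illustration only (not the paper's variational estimator), the mutual information between two discrete cluster assignments can be computed directly from their joint distribution:

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in nats from a joint probability table P(x, y)."""
    joint = joint / joint.sum()                  # normalize to a distribution
    px = joint.sum(axis=1, keepdims=True)        # marginal P(x)
    py = joint.sum(axis=0, keepdims=True)        # marginal P(y)
    nz = joint > 0                               # skip log(0) terms
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px * py)[nz])))
```

Independent assignments give zero mutual information; perfectly correlated assignments over two clusters give log 2.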
4
Cai H, Huang W, Yang S, Ding S, Zhang Y, Hu B, Zhang F, Cheung YM. Realize Generative Yet Complete Latent Representation for Incomplete Multi-View Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:3637-3652. PMID: 38145535. DOI: 10.1109/tpami.2023.3346869.
Abstract
In multi-view environments, missing observations arise because of limitations of the observation process. Most current representation learning methods struggle to explore complete information: they either lack cross-view generative ability, simply filling in missing view data, or lack solidity, merely inferring a consistent representation among the existing views. To address this problem, we propose a deep generative model that learns a complete generative latent representation, namely Complete Multi-view Variational Auto-Encoders (CMVAE), which models the generation of the multiple views from a complete latent variable represented by a mixture of Gaussian distributions. Thus, a missing view can be fully characterized by the latent variables and recovered by estimating its posterior distribution. Accordingly, a novel variational lower bound is introduced to integrate view-invariant information into posterior inference, enhancing the solidity of the learned latent representation. The intrinsic correlations between views are mined to achieve cross-view generality, and information related to the missing views is fused through view weights to achieve solidity. Benchmark experiments on clustering, classification, and cross-view image generation tasks demonstrate the superiority of CMVAE, while time complexity and parameter sensitivity analyses illustrate its efficiency and robustness. An application to bioinformatics data further exemplifies its practical significance.
5
Wang H, Zhang W, Ma X. Contrastive and adversarial regularized multi-level representation learning for incomplete multi-view clustering. Neural Networks 2024; 172:106102. PMID: 38219677. DOI: 10.1016/j.neunet.2024.106102.
Abstract
Incomplete multi-view clustering is a significant task in machine learning because complex systems in nature and society cannot be fully observed; it provides an opportunity to exploit the structure and functions of the underlying systems. Current algorithms are criticized for failing either to balance data restoration and clustering or to capture the consistency of the representations of various views. To address these problems, a novel Multi-level Representation Learning with Contrastive and Adversarial Learning (MRL_CAL) method for incomplete multi-view clustering is proposed, in which data restoration, consistent representation, and clustering are jointly learned by exploiting features in various subspaces. Specifically, MRL_CAL employs view-specific auto-encoders to obtain low-level representations of instances, restoring data by estimating the distribution of the original incomplete data with adversarial learning. MRL_CAL then extracts a high-level representation of instances, in which the consistency of the views and the cluster labels is incorporated through contrastive learning. In this way, MRL_CAL simultaneously learns multi-level features of instances in various subspaces, which not only avoids conflicts between representations but also improves the quality of the features. Finally, MRL_CAL formulates incomplete multi-view clustering as an overall objective, in which features are learned under the guidance of clustering. Extensive experimental results indicate that MRL_CAL outperforms state-of-the-art algorithms on various measurements, implying that the proposed method is promising for incomplete multi-view clustering.
Affiliation(s)
- Haiyue Wang: School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Wensheng Zhang: School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, 510006, China
- Xiaoke Ma: School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
6
Zhou Y, Guo Y, Hao S, Hong R, Luo J. Few-Shot Partial Multi-View Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:11824-11841. PMID: 37167050. DOI: 10.1109/tpami.2023.3275162.
Abstract
Data often come with multiple views in real-world applications, and fully exploring the information of each view is significant for making the data more representative. However, owing to various limitations and failures in data collection and pre-processing, real data inevitably suffer from view missing and data scarcity. The coexistence of these two issues makes the pattern classification task even more challenging, and, to the best of our knowledge, few existing methods can handle both simultaneously. Aiming to draw more attention from the community to this challenge, we propose a new task in this paper, called few-shot partial multi-view learning, which focuses on overcoming the negative impact of the view-missing issue in the low-data regime. The challenges of this task are twofold: (i) it is difficult to overcome the impact of data scarcity under the interference of missing views; and (ii) the limited number of samples exacerbates information scarcity, making it harder in turn to address the view-missing issue. To address these challenges, we propose a new unified Gaussian dense-anchoring method. Unified dense anchors are learned for the limited partial multi-view data, anchoring them into a unified dense representation space where the influence of data scarcity and view missing can be alleviated. We conduct extensive experiments to evaluate our method. The results on the Cub-googlenet-doc2vec, Handwritten, Caltech102, Scene15, Animal, ORL, tieredImagenet, and Birds-200-2011 datasets validate its effectiveness. The code will be released at https://github.com/zhouyuan888888/UGDA.
7
Yin J, Jiang J. Incomplete Multi-view Clustering Based on Self-representation. Neural Processing Letters 2023. DOI: 10.1007/s11063-023-11172-w.
8
Xia W, Gao Q, Wang Q, Gao X. Tensor Completion-Based Incomplete Multiview Clustering. IEEE Transactions on Cybernetics 2022; 52:13635-13644. PMID: 35077379. DOI: 10.1109/tcyb.2021.3140068.
Abstract
Incomplete multi-view clustering is a challenging problem in unsupervised learning. Existing incomplete multi-view clustering methods consider only the intra-view similarity structure while neglecting the inter-view similarity structure; thus, they cannot exploit both the complementary information and the spatial structure embedded in the similarity matrices of different views. To this end, we complete the incomplete graph with missing data via tensor completion and present a novel and effective model for the incomplete multi-view clustering task. Specifically, we account for the similarity of the inter-view graphs through a tensor Schatten p-norm-based completion technique, making use of both the complementary information and the spatial structure. Meanwhile, we impose a connectivity constraint on the similarity matrices of the different views so that the connected components approximately represent clusters. The learned entire graph therefore not only has a low-rank structure but also well characterizes the relationships among the observed data. Extensive experiments show the promising performance of the proposed method compared with several incomplete multi-view approaches on clustering tasks.
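The Schatten p-norm mentioned above generalizes the nuclear norm: for p < 1 it approximates the rank function more tightly than the nuclear norm (p = 1). As a minimal matrix-level sketch (the tensor version used in these papers applies the same idea to frontal slices in the Fourier domain):

```python
import numpy as np

def schatten_p(M, p=0.5):
    """Schatten p-norm: (sum_i sigma_i^p)^(1/p) over the singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)
    s = s[s > 1e-12]          # drop numerically-zero singular values
    return float((s ** p).sum() ** (1.0 / p))
```

For the 3x3 identity (all singular values equal to 1), p = 1 recovers the nuclear norm 3, while p = 0.5 gives 3^2 = 9; minimizing the p < 1 norm penalizes the number of nonzero singular values more aggressively.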
9
Xie Z, Yang Y, Zhang Y, Wang J, Du S. Deep learning on multi-view sequential data: a survey. Artificial Intelligence Review 2022; 56:6661-6704. PMID: 36466765. PMCID: PMC9707228. DOI: 10.1007/s10462-022-10332-z.
Abstract
With the progress of human daily interaction activities and the development of industrial society, a large amount of media data and sensor data have become accessible. Humans collect these multi-source data in chronological order, called multi-view sequential data (MvSD). MvSD has numerous potential application domains, including intelligent transportation, climate science, health care, public safety, and multimedia. However, as the volume and scale of MvSD increase, traditional machine learning methods become unable to cope with such large-scale data, and it is no longer appropriate to use hand-crafted features to represent these complex data. In addition, there is no general framework for mining multi-view relationships and integrating multi-view information. In this paper, we first introduce four common data types that constitute MvSD: point data, sequence data, graph data, and raster data. Then, we summarize the technical challenges of MvSD. Subsequently, we review recent progress in deep learning technology applied to MvSD, and discuss how networks represent and learn features of MvSD. Finally, we summarize the applications of MvSD in different domains and give potential research directions.
Affiliation(s)
- Zhuyang Xie, Yan Yang, Yiling Zhang, Jie Wang, Shengdong Du: School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory, Southwest Jiaotong University, Chengdu, 611756, China
10
Niu C, Shan H, Wang G. SPICE: Semantic Pseudo-Labeling for Image Clustering. IEEE Transactions on Image Processing 2022; 31:7264-7278. PMID: 36378790. PMCID: PMC9767807. DOI: 10.1109/tip.2022.3221290.
Abstract
The similarity among samples and the discrepancy among clusters are two crucial aspects of image clustering. However, current deep clustering methods suffer from inaccurate estimation of either feature similarity or semantic discrepancy. In this paper, we present a Semantic Pseudo-labeling-based Image ClustEring (SPICE) framework, which divides the clustering network into a feature model for measuring the instance-level similarity and a clustering head for identifying the cluster-level discrepancy. We design two semantics-aware pseudo-labeling algorithms, prototype pseudo-labeling and reliable pseudo-labeling, which enable accurate and reliable self-supervision over clustering. Without using any ground-truth label, we optimize the clustering network in three stages: 1) train the feature model through contrastive learning to measure the instance similarity; 2) train the clustering head with the prototype pseudo-labeling algorithm to identify cluster semantics; and 3) jointly train the feature model and clustering head with the reliable pseudo-labeling algorithm to improve the clustering performance. Extensive experimental results demonstrate that SPICE achieves significant improvements (~10%) over existing methods and establishes the new state-of-the-art clustering results on six balanced benchmark datasets in terms of three popular metrics. Importantly, SPICE significantly reduces the gap between unsupervised and fully-supervised classification; e.g. there is only 2% (91.8% vs 93.8%) accuracy difference on CIFAR-10. Our code is made publicly available at https://github.com/niuchuangnn/SPICE.
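The reliable pseudo-labeling idea, keeping only confident predictions for self-supervision, can be sketched as follows. This is an illustrative simplification, not SPICE's full algorithm (which additionally checks neighborhood consistency); the threshold value and the -1 "unlabeled" convention are assumptions:

```python
import numpy as np

def reliable_pseudo_labels(probs, threshold=0.9):
    """Assign argmax pseudo-labels only where the clustering head is confident;
    unreliable samples get -1 and are excluded from the supervised update."""
    labels = probs.argmax(axis=1)
    labels[probs.max(axis=1) < threshold] = -1
    return labels
```

A sample predicted at 95% confidence is labeled; one predicted at 55% is held out of the next training round.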
11
Zhang T, Cong Y, Sun G, Dong J. Visual-Tactile Fused Graph Learning for Object Clustering. IEEE Transactions on Cybernetics 2022; 52:12275-12289. PMID: 34133303. DOI: 10.1109/tcyb.2021.3080321.
Abstract
Object clustering has received considerable research attention recently. However, 1) most existing object clustering methods utilize visual information while ignoring the important tactile modality, which inevitably leads to degraded model performance; and 2) simply concatenating visual and tactile information via a multi-view clustering method prevents the complementary information from being fully explored, since there are many differences between vision and touch. To address these issues, we put forward a graph-based visual-tactile fused object clustering framework with two modules: 1) a modality-specific representation learning module MR and 2) a unified affinity graph learning module MU. Specifically, MR focuses on learning modality-specific representations for visual-tactile data, where deep non-negative matrix factorization (NMF) is adopted to extract the hidden information behind each modality. Meanwhile, we employ an autoencoder-like structure to enhance the robustness of the learned representations, and two graphs to improve their compactness. Furthermore, MU mitigates the differences between vision and touch and further maximizes the mutual information, adopting a minimizing-disagreement scheme to guide the modality-specific representations toward a unified affinity graph. To achieve ideal clustering performance, a Laplacian rank constraint is imposed to regularize the learned graph toward ideal connected components, so that noisy edges causing wrong connections are removed and clustering labels can be obtained directly. Finally, we propose an efficient alternating iterative minimization updating strategy, followed by a theoretical proof of framework convergence. Comprehensive experiments on five public datasets demonstrate the superiority of the proposed framework.
12
Lv Z, Gao Q, Zhang X, Li Q, Yang M. View-Consistency Learning for Incomplete Multiview Clustering. IEEE Transactions on Image Processing 2022; 31:4790-4802. PMID: 35797312. DOI: 10.1109/tip.2022.3187562.
Abstract
In this article, we present a novel general framework for incomplete multi-view clustering that integrates graph learning and spectral clustering. In our model, a tensor low-rank constraint is introduced to learn a stable low-dimensional representation, which encodes the complementary information and takes into account the cluster structure between different views. A corresponding algorithm based on augmented Lagrangian multipliers is established. In particular, the tensor Schatten p-norm is used as a tighter approximation to the tensor rank function. Besides, both consistency and specificity are jointly exploited for subspace representation learning. Extensive experiments on benchmark datasets demonstrate that our model outperforms several baseline methods on incomplete multi-view clustering.
13
Xu B, Zeng Z, Lian C, Ding Z. Few-Shot Domain Adaptation via Mixup Optimal Transport. IEEE Transactions on Image Processing 2022; 31:2518-2528. PMID: 35275818. DOI: 10.1109/tip.2022.3157139.
Abstract
Unsupervised domain adaptation aims to learn a classification model for a target domain without any labeled samples by transferring knowledge from a source domain with sufficient labeled samples. The source and target domains usually share the same label space but have different data distributions. In this paper, we consider a more difficult but insufficiently explored problem, named few-shot domain adaptation, in which a classifier should generalize well to the target domain given only a small number of examples in the source domain. We recast the link between source and target samples as a mixup optimal transport model. The mixup mechanism is integrated into optimal transport to perform few-shot adaptation by learning the cross-domain alignment matrix and the domain-invariant classifier simultaneously, augmenting the source distribution and aligning the two probability distributions. Moreover, spectral shrinkage regularization is deployed to improve the transferability and discriminability of the mixup optimal transport model by utilizing all singular eigenvectors. Experiments conducted on several domain adaptation tasks demonstrate the effectiveness of the proposed model on the few-shot domain adaptation problem compared with state-of-the-art methods.
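The two ingredients named above can be illustrated in isolation (this is not the paper's joint model, and the function names and hyperparameters are assumptions): mixup forms convex combinations of samples, and entropy-regularized optimal transport via Sinkhorn iterations yields a cross-domain alignment matrix whose marginals match the two empirical distributions.

```python
import numpy as np

def mixup(x1, x2, lam=0.3):
    """Convex combination of two samples (lam would normally be Beta-sampled)."""
    return lam * x1 + (1.0 - lam) * x2

def sinkhorn_plan(a, b, cost, eps=0.1, n_iters=1000):
    """Entropy-regularized OT: alternate scaling of the Gibbs kernel until the
    transport plan's marginals match source weights a and target weights b."""
    K = np.exp(-cost / eps)          # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # match the column marginals
        u = a / (K @ v)              # match the row marginals
    return u[:, None] * K * v[None, :]
```

The returned plan's row and column sums recover a and b, which is what makes it usable as a soft alignment matrix between domains.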
14
15
Sun G, Cong Y, Dong J, Liu Y, Ding Z, Yu H. What and How: Generalized Lifelong Spectral Clustering via Dual Memory. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; PP:1-1. PMID: 33571090. DOI: 10.1109/tpami.2021.3058852.
Abstract
Spectral clustering has become one of the most effective clustering algorithms. In this work, we explore the problem of spectral clustering in a lifelong learning framework termed Generalized Lifelong Spectral Clustering (GL2SC). Unlike most current studies, which concentrate on a fixed set of spectral clustering tasks and cannot efficiently incorporate a new clustering task, our goal is to establish a generalized model for a new spectral clustering task by learning What and How to lifelong learn from past tasks. For what to lifelong learn, the GL2SC framework contains a dual-memory mechanism with a deep orthogonal factorization manner: an orthogonal basis memory stores hidden and hierarchical clustering centers among learned tasks, and a feature embedding memory captures a deep manifold representation common across multiple related tasks. When a new clustering task arrives, the intuition for how to lifelong learn is that GL2SC can transfer intrinsic knowledge from the dual-memory mechanism to obtain a task-specific encoding matrix. The encoding matrix then redefines the dual memory over time to provide maximal benefits when learning future tasks. In the end, empirical comparisons on several benchmark datasets show the effectiveness of GL2SC in comparison with several state-of-the-art spectral clustering models.
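The spectral clustering backbone of this line of work can be sketched with the standard normalized-Laplacian recipe (this is the textbook algorithm, not GL2SC's lifelong mechanism):

```python
import numpy as np

def spectral_embedding(W, k):
    """Bottom-k eigenvectors of the symmetric normalized Laplacian of affinity W;
    the rows of the result are then clustered (e.g. with k-means)."""
    d = W.sum(axis=1)                       # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    return vecs[:, :k]
```

For a block-diagonal affinity with two disconnected groups, rows of the embedding coincide within a group and differ across groups, so a trivial clustering of the rows recovers the components.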
16
Wang Q, Ding Z, Tao Z, Gao Q, Fu Y. Generative Partial Multi-View Clustering With Adaptive Fusion and Cycle Consistency. IEEE Transactions on Image Processing 2021; 30:1771-1783. PMID: 33417549. DOI: 10.1109/tip.2020.3048626.
Abstract
Nowadays, with the rapid development of data collection sources and feature extraction methods, multi-view data are easy to obtain and have received increasing research attention, among which multi-view clustering (MVC) forms a mainstream research direction widely used in data analysis. However, existing MVC methods mainly assume that each sample appears in all views, without considering the incomplete-view case caused by data corruption, sensor failure, equipment malfunction, etc. In this study, we design and build a generative partial multi-view clustering model with adaptive fusion and cycle consistency, named GP-MVC, which solves the incomplete multi-view problem by explicitly generating the data of missing views. The main idea of GP-MVC is two-fold. First, multi-view encoder networks are trained to learn common low-dimensional representations, followed by a clustering layer to capture the shared cluster structure across multiple views. Second, view-specific generative adversarial networks with multi-view cycle consistency are developed to generate the missing data of one view conditioned on the shared representation given by the other views. These two steps promote each other: the learned common representation facilitates data imputation, and the generated data further explore the view consistency. Moreover, a weighted adaptive fusion scheme is implemented to exploit the complementary information among different views. Experimental results on four benchmark datasets show the effectiveness of the proposed GP-MVC over state-of-the-art methods.