1. Qin Y, Zhang X, Yu S, Feng G. A survey on representation learning for multi-view data. Neural Netw 2025; 181:106842. [PMID: 39515080 DOI: 10.1016/j.neunet.2024.106842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 09/19/2024] [Accepted: 10/23/2024] [Indexed: 11/16/2024]
Abstract
Over the last decades, multi-view clustering has become a rapidly growing field in machine learning and data mining by combining useful information from different views. Although several surveys on multi-view clustering exist, most of them do not consider self-supervised and non-self-supervised multi-view clustering together. In this work, we give a novel survey that sorts the existing multi-view clustering algorithms into two categories: non-self-supervised and self-supervised multi-view clustering. We first review the representative non-self-supervised approaches, which consist of methods based on non-representation learning and methods based on representation learning. The methods built on non-representation learning comprise works based on matrix factorization, kernels, and other non-representation learning techniques. Methods based on representation learning consist of multi-view graph clustering, deep representation learning, and multi-view subspace clustering. We divide the self-supervised multi-view clustering methods into contrastive methods and generative methods. Overall, this survey attempts to give an insightful overview of developments in the multi-view clustering field.
Affiliation(s)
- Yalan Qin
  - School of Communication and Information Engineering, Shanghai University, China
- Xinpeng Zhang
  - School of Communication and Information Engineering, Shanghai University, China
- Shui Yu
  - School of Computer Science, University of Technology Sydney, Australia
- Guorui Feng
  - School of Communication and Information Engineering, Shanghai University, China
2. You J, Ren Z, Yu FR, You X. One-Stage Shifted Laplacian Refining for Multiple Kernel Clustering. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:11501-11513. [PMID: 37030712 DOI: 10.1109/tnnls.2023.3262590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Graph learning can effectively characterize the similarity structure of sample pairs; hence, multiple kernel clustering based on graph learning (MKC-GL) achieves promising results on nonlinear clustering tasks. However, previous methods are confined to a "three-stage" scheme, that is, affinity graph learning, Laplacian construction, and clustering indicator extraction, which results in information distortion when alternating between the steps. Meanwhile, the energy of Laplacian reconstruction and the necessary cluster information cannot be preserved simultaneously. To address these problems, we propose a one-stage shifted Laplacian refining (OSLR) method for multiple kernel clustering (MKC), whose "one-stage" scheme focuses on Laplacian learning rather than traditional graph learning. Concretely, our method treats each kernel matrix as an affinity graph rather than ordinary data and constructs its corresponding Laplacian matrix in advance. In contrast to traditional Laplacian methods, we transform each Laplacian into an approximately shifted Laplacian (ASL) for refining a consensus Laplacian. Then, we project the consensus Laplacian onto a Fantope space to ensure that reconstruction information and clustering information concentrate on larger eigenvalues. Theoretically, our OSLR reduces the memory complexity and computation complexity to O(n) and O(n^2), respectively. Moreover, experimental results show that it outperforms state-of-the-art MKC methods on multiple benchmark datasets.
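As a rough illustration of the step in which a kernel matrix is treated as an affinity graph and its Laplacian is constructed in advance, the sketch below builds the standard symmetric normalized Laplacian L = I - D^{-1/2} K D^{-1/2}. This is the textbook construction only; it is not the ASL transform or the Fantope projection of OSLR, whose definitions the abstract does not give. The function name `normalized_laplacian` and the toy kernel are assumptions made for illustration.

```python
import math


def normalized_laplacian(K):
    """Symmetric normalized Laplacian of a kernel matrix used as an affinity graph.

    K is a square, symmetric list of lists with non-negative entries and
    strictly positive row sums (an assumption of this sketch).
    """
    n = len(K)
    degrees = [sum(row) for row in K]                    # diagonal of D
    inv_sqrt = [1.0 / math.sqrt(d) for d in degrees]     # diagonal of D^{-1/2}
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt[i] * K[i][j] * inv_sqrt[j]
         for j in range(n)]
        for i in range(n)
    ]


# Toy 3-sample kernel (hypothetical values): neighbours are similar, ends are not.
K = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
L = normalized_laplacian(K)
```

A quick sanity check for this construction is that the vector of square-rooted degrees lies in the null space of L, which is the property spectral clustering relies on when it reads cluster structure from the eigenvectors.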
3. Sun L, Wen J, Liu C, Fei L, Li L. Balance guided incomplete multi-view spectral clustering. Neural Netw 2023; 166:260-272. [PMID: 37531726 DOI: 10.1016/j.neunet.2023.07.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Revised: 07/13/2023] [Accepted: 07/14/2023] [Indexed: 08/04/2023]
Abstract
There is a large volume of incomplete multi-view data in the real world. How to partition such incomplete multi-view data is an urgent practical problem, since almost all conventional multi-view clustering methods are inapplicable to cases with missing views. In this paper, a novel graph learning-based incomplete multi-view clustering (IMVC) method is proposed to address this issue. Unlike existing works, our method aims to learn a common consensus graph from all incomplete views and to obtain a clustering indicator matrix in a unified framework. To achieve a stable clustering result, a relaxed spectral clustering model is introduced to obtain a probability consensus representation with all positive elements that reflects the data clustering result. Considering the different contributions of views to the clustering task, a weighted multi-view learning mechanism is introduced to automatically balance the effects of different views in model optimization. In this way, the intrinsic information of the incomplete multi-view data can be fully exploited. Experiments on several incomplete multi-view datasets show that our method outperforms the compared state-of-the-art clustering methods, which demonstrates its effectiveness for IMVC.
Affiliation(s)
- Lilei Sun
  - School of Data Science and Information Engineering, Guizhou Minzu University, Guiyang, 550025, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, Shenzhen, 518000, China
- Jie Wen
  - Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, Shenzhen, 518000, China
- Chengliang Liu
  - Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, Shenzhen, 518000, China
- Lunke Fei
  - School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510000, China
- Lusi Li
  - Department of Computer Science, Old Dominion University, USA
4. Fang Z, Du S, Lin X, Yang J, Wang S, Shi Y. DBO-Net: Differentiable Bi-level Optimization Network for Multi-view Clustering. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
5. Multiview nonnegative matrix factorization with dual HSIC constraints for clustering. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01742-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022]
6. Xie Z, Yang Y, Zhang Y, Wang J, Du S. Deep learning on multi-view sequential data: a survey. Artif Intell Rev 2022; 56:6661-6704. [PMID: 36466765 PMCID: PMC9707228 DOI: 10.1007/s10462-022-10332-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
With the progress of daily human interaction and the development of industrial society, a large amount of media data and sensor data has become accessible. Humans collect these multi-source data in chronological order, and the result is called multi-view sequential data (MvSD). MvSD has numerous potential application domains, including intelligent transportation, climate science, health care, public safety, and multimedia. However, as the volume and scale of MvSD increase, traditional machine learning methods struggle to handle such large-scale data, and it is no longer appropriate to use hand-crafted features to represent these complex data. In addition, there is no general framework for mining multi-view relationships and integrating multi-view information. In this paper, we first introduce four common data types that constitute MvSD: point data, sequence data, graph data, and raster data. Then, we summarize the technical challenges of MvSD. Subsequently, we review the recent progress in deep learning technology applied to MvSD. Meanwhile, we discuss how the network represents and learns features of MvSD. Finally, we summarize the applications of MvSD in different domains and give potential research directions.
Affiliation(s)
- Zhuyang Xie
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756 China
  - Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory, Southwest Jiaotong University, Chengdu, 611756 China
- Yan Yang
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756 China
  - Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory, Southwest Jiaotong University, Chengdu, 611756 China
- Yiling Zhang
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756 China
  - Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory, Southwest Jiaotong University, Chengdu, 611756 China
- Jie Wang
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756 China
  - Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory, Southwest Jiaotong University, Chengdu, 611756 China
- Shengdong Du
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756 China
  - Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory, Southwest Jiaotong University, Chengdu, 611756 China
7. Feature selection based on a hybrid simplified particle swarm optimization algorithm with maximum separation and minimum redundancy. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01663-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
8. Kong Y, Qian Y, Tan F, Bai L, Shao J, Ma T, Tereshchenko SN. CVDP k-means clustering algorithm for differential privacy based on coefficient of variation. Journal of Intelligent & Fuzzy Systems 2022. [DOI: 10.3233/jifs-213564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Data clustering has been applied and developed in all walks of life, where it helps enterprises optimize their services. However, when the original data to be analyzed contain users’ personal private information, the data holder’s cluster analysis may expose users’ privacy. The differential privacy k-means algorithm is a clustering method based on differential privacy protection technology that can solve the privacy disclosure problem in the process of data clustering. In the differential privacy k-means algorithm, Laplacian noise controlled by the privacy parameter ɛ is added to the cluster centers to protect sensitive user information in the original data and the clustering results, but the added noise affects the utility of the clustering. To balance the availability and privacy of the differential privacy k-means clustering algorithm, research on improving the algorithm has focused on the selection of the initial cluster centers or the optimization of outlier processing, but has not considered the different contribution of each data dimension to the clustering. Therefore, this paper proposes CVDP k-means, a differential privacy k-means clustering algorithm based on the coefficient of variation. The CVDP scheme first eliminates outliers in the original data using data density, and then designs a weighted data point similarity calculation method and an initial centroid selection method using the coefficient of variation. Experimental results show that the CVDP k-means algorithm achieves some improvements in availability, performance, and privacy.
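The noise-adding step described in this abstract can be sketched as follows. This is a minimal illustration of a generic differentially private centroid update, not the CVDP scheme itself: it assumes every coordinate has been scaled to [0, 1], splits the budget ɛ equally between a noisy count and the noisy per-dimension sums, and uses the hypothetical helper names `laplace` and `noisy_centroid`.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_centroid(points, epsilon, rng):
    """Return a noisy centroid for one cluster.

    Assumptions of this sketch: each coordinate lies in [0, 1], so adding or
    removing one point changes the count by at most 1 and the sum vector by
    at most d in L1 norm; half of epsilon is spent on each query.
    """
    d = len(points[0])
    half = epsilon / 2.0
    noisy_count = len(points) + laplace(1.0 / half, rng)
    noisy_count = max(noisy_count, 1.0)  # guard against a non-positive count
    noisy_sums = [
        sum(p[j] for p in points) + laplace(d / half, rng)
        for j in range(d)
    ]
    return [s / noisy_count for s in noisy_sums]


rng = random.Random(0)  # seeded only to make the sketch repeatable
cluster = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.5, 0.5]]
private_center = noisy_centroid(cluster, epsilon=1.0, rng=rng)
```

A smaller ɛ gives a larger noise scale and therefore stronger privacy but a less accurate centroid, which is the utility trade-off the abstract refers to.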
Affiliation(s)
- Yuting Kong
  - School of Software, Xinjiang University, Urumqi, Xinjiang Uygur Autonomous Region, China
  - Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China
- Yurong Qian
  - School of Software, Xinjiang University, Urumqi, Xinjiang Uygur Autonomous Region, China
  - Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China
- Fuxiang Tan
  - School of Software, Xinjiang University, Urumqi, Xinjiang Uygur Autonomous Region, China
  - Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China
- Lu Bai
  - School of Software, Xinjiang University, Urumqi, Xinjiang Uygur Autonomous Region, China
  - Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China
- Jinxin Shao
  - School of Software, Xinjiang University, Urumqi, Xinjiang Uygur Autonomous Region, China
  - Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China
- Tinghuai Ma
  - Nanjing University of Information Science & Technology, Nanjing, China
9. Raheja N, Kumar Manocha A. IoT based ECG monitoring system with encryption and authentication in secure data transmission for clinical health care approach. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103481] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/02/2022]
10. Mi Y, Ren Z, Xu Z, Li H, Sun Q, Chen H, Dai J. Multi-view clustering with dual tensors. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-06927-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
11. Yihong L, Yunpeng W, Tao L, Xiaolong L, Han S. GNN-DBSCAN: A new density-based algorithm using grid and the nearest neighbor. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-211922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
DBSCAN (density-based spatial clustering of applications with noise) is one of the most widely used density-based clustering algorithms; it can find clusters of arbitrary shape, determine the number of clusters, and identify noise samples automatically. However, the performance of DBSCAN is significantly limited because it is quite sensitive to the parameters eps and MinPts. Eps is the radius of the eps-neighborhood, and MinPts is the minimum number of points required in that neighborhood. Additionally, a dataset with large variations in density will probably trap DBSCAN because its parameters are fixed. To overcome these limitations, we propose a new density-based clustering algorithm called GNN-DBSCAN, which uses an adaptive Grid to divide the dataset and defines local core samples by using the Nearest Neighbor. With the help of the grid, the dataset space is divided into a finite number of cells. After that, the nearest neighbors lying in every filled cell and in adjacent filled cells are defined as the local core samples. Then, GNN-DBSCAN obtains global core samples by enhancing and screening the local core samples. In this way, our algorithm can identify higher-quality core samples than DBSCAN. Lastly, given these global core samples, it uses a dynamic radius based on k-nearest neighbors to cluster the dataset. The dynamic radius can overcome the problems of DBSCAN caused by its fixed parameter eps. Therefore, our method can perform better on datasets with large variations in density. Experiments on synthetic and real-world datasets were conducted. The results indicate that the average Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), and V-measure of our proposed algorithm exceed those of the existing algorithms DBSCAN, DPC, ADBSCAN, and HDBSCAN.
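The sensitivity to a fixed eps described in this abstract can be seen on a toy dataset. The sketch below is a plain, textbook DBSCAN written for illustration (it is not GNN-DBSCAN); the function names, the two hand-made clusters, and the chosen eps values are all assumptions. MinPts counts the point itself.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one label per point, with -1 meaning noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# A dense square (side 0.1) and a sparse square (side 1.0), far apart.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]
data = dense + sparse

print(dbscan(data, eps=0.2, min_pts=3))  # [0, 0, 0, 0, -1, -1, -1, -1]
print(dbscan(data, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With eps tuned to the dense square, the sparse square is discarded as noise; only a much larger eps recovers it. On data where clusters of different density sit close together, no single eps may suit both, which is the motivation for a per-region or dynamic radius.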
Affiliation(s)
- Li Yihong
  - School of Cyber Science and Engineering, Sichuan University, Chengdu, China
- Wang Yunpeng
  - School of Cyber Science and Engineering, Sichuan University, Chengdu, China
- Li Tao
  - School of Cyber Science and Engineering, Sichuan University, Chengdu, China
- Lan Xiaolong
  - School of Cyber Science and Engineering, Sichuan University, Chengdu, China
- Song Han
  - School of Cyber Science and Engineering, Sichuan University, Chengdu, China
12. Khan GA, Hu J, Li T, Diallo B, Zhao Y. Multi-view low rank sparse representation method for three-way clustering. Int J Mach Learn Cyb 2021. [DOI: 10.1007/s13042-021-01394-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 10/20/2022]