1. Wu J, Yang B, Xue Z, Zhang X, Lin Z, Chen B. Fast multi-view clustering via correntropy-based orthogonal concept factorization. Neural Netw 2024; 173:106170. [PMID: 38387199 DOI: 10.1016/j.neunet.2024.106170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 01/15/2024] [Accepted: 02/08/2024] [Indexed: 02/24/2024]
Abstract
Owing to its ability to handle negative data and its promising clustering performance, concept factorization (CF), an improved version of non-negative matrix factorization, has recently been incorporated into multi-view clustering. Nevertheless, existing CF-based multi-view clustering methods still have the following issues: (1) they conduct factorization directly in the original data space, so their efficiency is sensitive to the feature dimension; (2) they ignore the high degree of factorization freedom of standard CF, which may lead to non-unique factorizations and thereby reduce effectiveness; (3) the traditional robust norms they use cannot handle complex noise, which limits their robustness. To address these issues, we establish a fast multi-view clustering method via correntropy-based orthogonal concept factorization (FMVCCF). Specifically, FMVCCF executes factorization on a learned consensus anchor graph rather than directly decomposing the original data, lessening the dimensionality sensitivity. Then, a lightweight graph regularization term is incorporated to refine the factorization process with a low computational burden. Moreover, an improved multi-view correntropy-based orthogonal CF model is developed, which enhances effectiveness and robustness through the orthogonal constraint and the correntropy criterion, respectively. Extensive experiments demonstrate that FMVCCF achieves promising effectiveness and robustness on various real-world datasets with high efficiency.
Affiliation(s)
- Jinghan Wu
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an 710049, China; National Engineering Research Center for Visual Information and Applications, Xi'an 710049, China; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China
- Ben Yang
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an 710049, China; National Engineering Research Center for Visual Information and Applications, Xi'an 710049, China; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China
- Zhiyuan Xue
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an 710049, China; National Engineering Research Center for Visual Information and Applications, Xi'an 710049, China; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China
- Xuetao Zhang
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an 710049, China; National Engineering Research Center for Visual Information and Applications, Xi'an 710049, China; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China
- Zhiping Lin
  - School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
- Badong Chen
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an 710049, China; National Engineering Research Center for Visual Information and Applications, Xi'an 710049, China; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China
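
The correntropy criterion mentioned in the abstract above can be illustrated with a small sketch. This is not the FMVCCF objective; it only shows the standard sample correntropy with a Gaussian kernel and why the induced loss is bounded under gross outliers. The function names, the kernel width `sigma`, and the toy vectors are assumptions made for illustration, and the kernel's normalizing constant is omitted.

```python
import numpy as np

def correntropy(x, y, sigma=1.0):
    """Sample correntropy with a Gaussian kernel (normalizing constant omitted)."""
    e = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(np.exp(-e**2 / (2.0 * sigma**2))))

def correntropy_loss(x, y, sigma=1.0):
    """Correntropy-induced loss: 1 - correntropy, bounded in [0, 1]."""
    return 1.0 - correntropy(x, y, sigma)

# Toy data (hypothetical): one entry is corrupted by a gross outlier.
clean = np.array([1.0, 2.0, 3.0, 4.0])
noisy = clean.copy()
noisy[0] += 100.0

mse = float(np.mean((clean - noisy) ** 2))   # grows with the square of the outlier
closs = correntropy_loss(clean, noisy)       # each outlier adds at most 1/len(clean)
print(mse, closs)
```

The squared error is dominated by the single corrupted entry, while the correntropy-induced loss saturates, which is the usual motivation for using correntropy as a robust criterion.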
2. Chen B, Xu S, Xu H, Bian X, Guo N, Xu X, Hua X, Zhou T. Structural deep multi-view clustering with integrated abstraction and detail. Neural Netw 2024; 175:106287. [PMID: 38593558 DOI: 10.1016/j.neunet.2024.106287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 03/30/2024] [Indexed: 04/11/2024]
Abstract
Deep multi-view clustering, which can obtain complementary information from different views, has received considerable attention in recent years. Although some efforts have been made and have achieved decent performance, most of them overlook structural information and are susceptible to poor-quality views, which may seriously restrict clustering capacity. To this end, we propose Structural deep Multi-View Clustering with integrated abstraction and detail (SMVC). Specifically, multi-layer perceptrons are used to extract features from specific views, which are then concatenated to form the global features. In addition, a global target distribution is constructed to guide the soft cluster assignments of specific views. Beyond exploiting the top-level abstraction, we also mine the underlying details. We construct instance-level contrastive learning using high-order adjacency matrices, which has an effect equivalent to that of a graph attention network and reduces feature redundancy. By integrating the top-level abstraction and underlying detail into a unified framework, our model can jointly optimize the cluster assignments and feature embeddings. Extensive experiments on four benchmark datasets demonstrate that the proposed SMVC consistently outperforms state-of-the-art methods.
Affiliation(s)
- Bowei Chen
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Sen Xu
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Heyang Xu
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Xuesheng Bian
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Naixuan Guo
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Xiufang Xu
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Xiaopeng Hua
  - School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, China
- Tian Zhou
  - National Key Laboratory of Underwater Acoustic Technology, Key Laboratory of Marine Information Acquisition and Security, Ministry of Industry and Information Technology, College of Underwater Acoustic Engineering, Harbin Engineering University, Harbin, 150001, China
3. Liu M, Palade V, Zheng Z. Learning the consensus and complementary information for large-scale multi-view clustering. Neural Netw 2024; 172:106103. [PMID: 38219678 DOI: 10.1016/j.neunet.2024.106103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/25/2023] [Accepted: 01/04/2024] [Indexed: 01/16/2024]
Abstract
The multi-view data clustering has attracted much interest from researchers, and the large-scale multi-view clustering has many important applications and significant research value. In this article, we fully make use of the consensus and complementary information, and exploit a bipartite graph to depict the duality relationship between original points and anchor points. To be specific, representative anchor points are selected for each view to construct corresponding anchor representation matrices, and all views' anchor points are utilized to construct a common representation matrix. Using anchor points also reduces the computation complexity. Next, the bipartite graph is built by fusing these representation matrices, and a Laplacian rank constraint is enforced on the bipartite graph. This will make the bipartite graph have k connected components to obtain accurate clustering labels, where the bipartite graph is specifically designed for a large-scale dataset problem. In addition, the anchor points are also updated by dictionary learning. The experimental results on the four benchmark image processing datasets have demonstrated superior performance of the proposed large-scale multi-view clustering algorithm over other state-of-the-art multi-view clustering algorithms.
Affiliation(s)
- Maoshan Liu
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Vasile Palade
  - Centre for Computational Science and Mathematical Modelling, Coventry University, Coventry CV1 2TL, UK
- Zhonglong Zheng
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
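
The anchor-based bipartite graph described in the abstract above can be sketched as follows. This is a generic construction under stated assumptions, not the authors' algorithm: anchors are drawn by random sampling (the paper selects representative anchors and updates them by dictionary learning), the similarity is a Gaussian kernel over the `k` nearest anchors, and the Laplacian rank constraint is not reproduced. All names and parameter values are hypothetical.

```python
import numpy as np

def anchor_graph(X, anchors, k=3):
    """Row-stochastic n x m similarity matrix between points and their k nearest anchors."""
    # Squared Euclidean distances between every point and every anchor (n x m).
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    n, m = d2.shape
    idx = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest anchors per point
    rows = np.arange(n)[:, None]
    sigma2 = d2[rows, idx].mean() + 1e-12    # simple global bandwidth (an assumption)
    Z = np.zeros((n, m))
    Z[rows, idx] = np.exp(-d2[rows, idx] / sigma2)
    Z /= Z.sum(axis=1, keepdims=True)        # each row sums to 1
    return Z

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                               # toy single-view data
anchors = X[rng.choice(len(X), size=10, replace=False)]     # randomly sampled anchors
Z = anchor_graph(X, anchors, k=3)
print(Z.shape)
```

The bipartite graph between points and anchors then has adjacency `[[0, Z], [Z.T, 0]]`. Because `Z` is n x m with m much smaller than n, it is far cheaper to store and process than a full n x n similarity graph, which is the efficiency argument the abstract makes for anchors.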
4. Zhang D, Huang H, Zhao Q, Zhou G. Generalized latent multi-view clustering with tensorized bipartite graph. Neural Netw 2024; 175:106282. [PMID: 38599137 DOI: 10.1016/j.neunet.2024.106282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 03/22/2024] [Accepted: 03/27/2024] [Indexed: 04/12/2024]
Abstract
Tensor-based multi-view spectral clustering algorithms use tensors to model the structure of multi-dimensional data to take advantage of the complementary information and high-order correlations embedded in the graph, thus achieving impressive clustering performance. However, these algorithms use linear models to obtain consensus, which prevents the learned consensus from adequately representing the nonlinear structure of complex data. In order to address this issue, we propose a method called Generalized Latent Multi-View Clustering with Tensorized Bipartite Graph (GLMC-TBG). Specifically, in this paper we introduce neural networks to learn highly nonlinear mappings that encode nonlinear structures in graphs into latent representations. In addition, multiple views share the same latent consensus through nonlinear interactions. In this way, a more comprehensive common representation from multiple views can be achieved. An Augmented Lagrangian Multiplier with Alternating Direction Minimization (ALM-ADM) framework is designed to optimize the model. Experiments on seven real-world data sets verify that the proposed algorithm is superior to state-of-the-art algorithms.
Affiliation(s)
- Dongping Zhang
  - School of Automation, Guangdong University of Technology, Guangzhou 510006, China; Guangdong Key Laboratory of IoT Information Technology, Guangdong University of Technology, Guangzhou 510006, China
- Haonan Huang
  - School of Automation, Guangdong University of Technology, Guangzhou 510006, China; Key Laboratory of Intelligent Information Processing and System Integration of IoT, Ministry of Education, Guangzhou 510006, China; Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou 510006, China
- Qibin Zhao
  - School of Automation, Guangdong University of Technology, Guangzhou 510006, China; Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo 103-0027, Japan
- Guoxu Zhou
  - School of Automation, Guangdong University of Technology, Guangzhou 510006, China; Key Laboratory of Intelligent Detection and The Internet of Things in Manufacturing, Ministry of Education, Guangzhou 510006, China
5. Chen R, Tang Y, Zhang W, Feng W. Adaptive-weighted deep multi-view clustering with uniform scale representation. Neural Netw 2024; 171:114-126. [PMID: 38091755 DOI: 10.1016/j.neunet.2023.11.066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 10/07/2023] [Accepted: 11/29/2023] [Indexed: 01/29/2024]
Abstract
Multi-view clustering has attracted growing attention owing to its powerful capacity for multi-source information integration. Although numerous advanced methods have been proposed in past decades, most of them fail to distinguish the unequal importance of multiple views to the clustering task and overlook the scale uniformity of the learned latent representations among different views, resulting in blurry physical meaning and suboptimal model performance. To address these issues, we propose a joint learning framework, termed Adaptive-weighted deep Multi-view Clustering with Uniform scale representation (AMCU). Specifically, to achieve more reasonable multi-view fusion, we introduce an adaptive weighting strategy, which imposes simplex constraints on heterogeneous views to measure their varying degrees of contribution to the consensus prediction. This simple yet effective strategy has a clear physical meaning for the multi-view clustering task. Furthermore, a novel regularizer is incorporated to learn multiple latent representations sharing approximately the same scale, so that the objective used to calculate the clustering loss is not sensitive to individual views and the entire model training process is more stable. Through comprehensive experiments on eight popular real-world datasets, we demonstrate that our proposal performs better than several state-of-the-art single-view and multi-view competitors.
Affiliation(s)
- Rui Chen
  - College of Information Science and Technology, Hainan University, Haikou, 570208, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Yongqiang Tang
  - State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Wensheng Zhang
  - College of Information Science and Technology, Hainan University, Haikou, 570208, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Wenlong Feng
  - College of Information Science and Technology, Hainan University, Haikou, 570208, China; State Key Laboratory of Marine Resource Utilization in South China Sea, Hainan University, Haikou, 570208, China
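
The simplex constraint on view weights mentioned in the abstract above can be sketched as follows. This is not the AMCU training procedure; it only shows one common way to keep view weights non-negative and summing to one, using the standard sort-based Euclidean projection onto the probability simplex, and then fusing per-view soft assignments with those weights. The function names and toy inputs are assumptions.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                       # sorted in descending order
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / j > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fuse_predictions(view_probs, raw_weights):
    """Weighted average of per-view soft cluster assignments (each n x c)."""
    w = project_to_simplex(raw_weights)
    fused = np.tensordot(w, np.stack(view_probs), axes=1)   # (V,) with (V, n, c) -> (n, c)
    return fused, w

# Toy soft assignments from two hypothetical views, two samples, two clusters.
P1 = np.array([[0.9, 0.1], [0.2, 0.8]])
P2 = np.array([[0.6, 0.4], [0.5, 0.5]])
fused, w = fuse_predictions([P1, P2], raw_weights=[0.7, 0.5])
print(w, fused)
```

Because the weights lie on the simplex, the fused rows remain valid probability distributions and each weight can be read directly as a view's relative contribution.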
6. Zhao M, Yang W, Nie F. Deep graph reconstruction for multi-view clustering. Neural Netw 2023; 168:560-568. [PMID: 37837745 DOI: 10.1016/j.neunet.2023.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 07/01/2023] [Accepted: 10/01/2023] [Indexed: 10/16/2023]
Abstract
Graph-based multi-view clustering methods have achieved impressive success by exploring a complemental or independent graph embedding with low-dimension among multiple views. The majority of them, however, are shallow models with limited ability to learn the nonlinear information in multi-view data. To this end, we propose a novel deep graph reconstruction (DGR) framework for multi-view clustering, which contains three modules. Specifically, a Multi-graph Fusion Module (MFM) is employed to obtain the consensus graph. Then node representation is learned by the Graph Embedding Network (GEN). To assign clusters directly, the Clustering Assignment Module (CAM) is devised to obtain the final low-dimensional graph embedding, which can serve as the indicator matrix. In addition, a simple and powerful loss function is designed in the proposed DGR. Extensive experiments on seven real-world datasets have been conducted to verify the superior clustering performance and efficiency of DGR compared with the state-of-the-art methods.
Affiliation(s)
- Mingyu Zhao
  - School of Computer Science, Fudan University, Shanghai 200433, PR China
- Weidong Yang
  - School of Computer Science, Fudan University, Shanghai 200433, PR China
- Feiping Nie
  - School of Computer Science, School of Artificial Intelligence, Optics and Electronics (iOPEN), and the Key Laboratory of Intelligent Interaction and Applications (Ministry of Industry and Information Technology), Northwestern Polytechnical University, Xi'an 710072, Shaanxi, PR China
7. Xie D, Gao Q, Yang M. Enhanced tensor low-rank representation learning for multi-view clustering. Neural Netw 2023; 161:93-104. [PMID: 36738492 DOI: 10.1016/j.neunet.2023.01.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Revised: 09/27/2022] [Accepted: 01/24/2023] [Indexed: 01/30/2023]
Abstract
Multi-view subspace clustering (MSC), which assumes that multi-view data are generated from a latent subspace, has attracted considerable attention in multi-view clustering. To recover the underlying subspace structure, a successful approach adopted recently is subspace clustering based on the tensor nuclear norm (TNN). However, existing TNN-based methods usually fail to exploit the intrinsic cluster structure and high-order correlations well, which limits clustering performance. To address this problem, this paper proposes a novel tensor low-rank representation (TLRR) learning method for multi-view clustering. First, we construct a 3rd-order tensor by organizing the features from all views, and then use the t-product in the tensor space to obtain the self-representation tensor of the tensorial data. Second, we use the ℓ1,2 norm to constrain the self-representation tensor so that it captures the class-specificity distribution, which is important for depicting the intrinsic cluster structure. Simultaneously, we rotate the self-representation tensor and use the tensor singular value decomposition-based weighted TNN as a tighter tensor rank approximation to constrain the rotated tensor. For the resulting challenging optimization problem, we present an effective optimization algorithm with a theoretical convergence guarantee and relatively low computational complexity. The convergence of the constructed sequence to a Karush-Kuhn-Tucker (KKT) critical point is mathematically validated in detail. We perform extensive experiments on four datasets and demonstrate that TLRR outperforms state-of-the-art multi-view subspace clustering methods.
Affiliation(s)
- Deyan Xie
  - School of Science and Information Science, Qingdao Agricultural University, Qingdao, China
- Quanxue Gao
  - School of Telecommunications Engineering, Xidian University, Xi'an, China
- Ming Yang
  - Mathematics department of the University of Evansville, Evansville, IN 47722, United States of America
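
The t-SVD-based tensor nuclear norm (TNN) referred to in the abstract above can be sketched as follows. This is the plain, unweighted TNN under one common convention (Fourier transform along the third mode, then the sum of singular values of each frontal slice, scaled by 1/n3); the weighted variant and the tensor rotation used in the paper are not reproduced, and some papers omit the 1/n3 factor. The function name and the toy tensor are assumptions.

```python
import numpy as np

def tensor_nuclear_norm(T):
    """Unweighted t-SVD tensor nuclear norm of an n1 x n2 x n3 array."""
    n3 = T.shape[2]
    T_hat = np.fft.fft(T, axis=2)      # move to the Fourier domain along mode 3
    total = 0.0
    for k in range(n3):
        # Sum of singular values of the k-th frontal slice in the Fourier domain.
        total += np.linalg.svd(T_hat[:, :, k], compute_uv=False).sum()
    return total / n3

# Toy check (hypothetical data): a tensor whose frontal slices are all equal to M.
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 3))
T = np.repeat(M[:, :, None], 5, axis=2)
print(tensor_nuclear_norm(T), np.linalg.norm(M, "nuc"))
```

For a tensor with identical frontal slices, only the zero-frequency slice is non-zero after the transform, so under this convention the TNN reduces to the matrix nuclear norm of `M`.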
8. Pan B, Li C, Che H. Nonconvex low-rank tensor approximation with graph and consistent regularizations for multi-view subspace learning. Neural Netw 2023; 161:638-658. [PMID: 36827961 DOI: 10.1016/j.neunet.2023.02.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 10/27/2022] [Accepted: 02/09/2023] [Indexed: 02/16/2023]
Abstract
Multi-view clustering is widely used to improve clustering performance. Recently, the subspace clustering tensor learning method based on Markov chain is a crucial branch of multi-view clustering. Tensor learning is commonly used to apply tensor low-rank approximation to represent the relationships between data samples. However, most of the current tensor learning methods have the following shortcomings: the information of the local graph is not taken into account, the relationships between different views are not shown, and the existing tensor low-rank representation takes a biased tensor rank function for estimation. Therefore, a nonconvex low-rank tensor approximation with graph and consistent regularizations (NLRTGC) model is proposed for multi-view subspace learning. NLRTGC retains the local manifold information through graph regularization, and adopts a consistent regularization between multi-views to keep the diagonal block structure of representation matrices. Furthermore, a nonnegative nonconvex low-rank tensor kernel function is used to replace the existing classical tensor nuclear norm via tensor-singular value decomposition (t-SVD), so as to reduce the deviation from rank. Then, an alternating direction method of multipliers (ADMM) which makes the objective function monotonically non-increasing is proposed to solve NLRTGC. Finally, the effectiveness and superiority of the NLRTGC are shown through abundant comparative experiments with various state-of-the-art algorithms on noisy datasets and real world datasets.
Affiliation(s)
- Baicheng Pan
  - Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
- Chuandong Li
  - Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
- Hangjun Che
  - Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
9. Zheng T, Zhang Y, Wang Y. Dynamic guided metric representation learning for multi-view clustering. PeerJ Comput Sci 2022; 8:e922. [PMID: 35494795 PMCID: PMC9044235 DOI: 10.7717/peerj-cs.922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Accepted: 02/16/2022] [Indexed: 06/14/2023]
Abstract
Multi-view clustering (MVC) is a mainstream task that aims to divide objects into meaningful groups from different perspectives. The quality of the data representation is the key issue in MVC. A comprehensive, meaningful data representation should capture both the discriminant characteristics within a single view and the correlation across multiple views. Considering this, a novel framework called Dynamic Guided Metric Representation Learning for Multi-View Clustering (DGMRL-MVC) is proposed in this paper, which can cluster multi-view data in a learned latent discriminative embedding space. Specifically, the framework enhances the data representation in multiple steps. First, class separability is enforced with Fisher Discriminant Analysis (FDA) within each single view, while the consistency among different views is enhanced based on the Hilbert-Schmidt independence criterion (HSIC). This yields the first enhanced representation. In the second step, a dynamic routing mechanism is introduced, in which location or direction information is added to complete the representation. After that, a generalized canonical correlation analysis (GCCA) model is used to obtain the final common discriminative representation. The learned fusion representation can substantially improve multi-view clustering performance. Experiments validated the effectiveness of the proposed method for clustering tasks.
Affiliation(s)
- Tingyi Zheng
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
  - Department of Electrical and Control Engineering, Shanxi Institute of Energy, Jinzhong, Shanxi, China
- Yilin Zhang
  - Software College, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Yuhang Wang
  - College of Data Science, Taiyuan University of Technology, Taiyuan, Shanxi, China
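
The Hilbert-Schmidt independence criterion named in the abstract above can be sketched with its standard biased empirical estimator, tr(KHLH)/(n-1)^2, where K and L are Gram matrices of the two views and H is the centering matrix. This is a generic illustration, not the DGMRL-MVC pipeline; the Gaussian kernel, the bandwidth `sigma`, and the toy data are assumptions.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gaussian-kernel Gram matrix for the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC between two views with the same number of samples."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    K = gaussian_gram(X, sigma)
    L = gaussian_gram(Y, sigma)
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Toy views (hypothetical): one view is compared with itself and with a constant view.
X = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
Y_const = np.ones((20, 1))
print(hsic(X, X), hsic(X, Y_const))
```

A larger value indicates stronger statistical dependence between the two views; a view that carries no information, such as the constant one here, gives a value of zero up to floating-point error.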
10. Yang B, Zhang X, Chen B, Nie F, Lin Z, Nan Z. Efficient correntropy-based multi-view clustering with anchor graph embedding. Neural Netw 2021; 146:290-302. [PMID: 34915413 DOI: 10.1016/j.neunet.2021.11.027] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/27/2021] [Revised: 09/22/2021] [Accepted: 11/26/2021] [Indexed: 11/17/2022]
Abstract
Although multi-view clustering has received widespread attention due to its far superior performance to single-view clustering, it still faces the following issues: (1) the high computational cost that comes with introducing multi-view information greatly reduces clustering efficiency; (2) the complex noise and outliers present in real-world data pose a huge challenge to the robustness of clustering algorithms. Increasing efficiency and robustness have therefore become two important issues in multi-view clustering. To cope with these issues, an efficient correntropy-based multi-view clustering algorithm (ECMC) is proposed in this paper, which not only improves clustering efficiency by constructing an embedded anchor graph and utilizing nonnegative matrix factorization (NMF), but also enhances robustness by exploiting correntropy to suppress various kinds of noise and outliers. To further improve clustering efficiency, one of the factors of NMF is constrained to be an indicator matrix instead of a traditional non-negative matrix, so that the categories of samples can be obtained directly without any extra operation. Subsequently, a novel half-quadratic-based strategy is proposed to optimize the non-convex objective function of ECMC. Finally, extensive experiments on eight real-world datasets and eighteen noisy datasets show that ECMC achieves faster speed and better robustness than other state-of-the-art multi-view clustering algorithms.
Affiliation(s)
- Ben Yang
  - Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; National Engineering Laboratory for Visual Information Processing and Applications, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Xuetao Zhang
  - Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; National Engineering Laboratory for Visual Information Processing and Applications, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Badong Chen
  - Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; National Engineering Laboratory for Visual Information Processing and Applications, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Feiping Nie
  - School of Computer Science, Northwestern Polytechnical University, 710072, Shaanxi, China; School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, 710072, Shaanxi, China
- Zhiping Lin
  - School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore
- Zhixiong Nan
  - Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; National Engineering Laboratory for Visual Information Processing and Applications, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
11. Xia W, Wang S, Yang M, Gao Q, Han J, Gao X. Multi-view graph embedding clustering network: Joint self-supervision and block diagonal representation. Neural Netw 2021; 145:1-9. [PMID: 34710786 DOI: 10.1016/j.neunet.2021.10.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/08/2021] [Revised: 08/25/2021] [Accepted: 10/04/2021] [Indexed: 11/18/2022]
Abstract
Multi-view clustering has become an active topic in artificial intelligence. Yet, similar investigation for graph-structured data clustering has been absent so far. To fill this gap, we present a Multi-View Graph embedding Clustering network (MVGC). Specifically, unlike traditional multi-view construction methods, which are only suitable to describe Euclidean structure data, we leverage Euler transform to augment the node attribute, as a new view descriptor, for non-Euclidean structure data. Meanwhile, we impose block diagonal representation constraint, which is measured by the ℓ1,2-norm, on self-expression coefficient matrix to well explore the cluster structure. By doing so, the learned view-consensus coefficient matrix well encodes the discriminative information. Moreover, we make use of the learned clustering labels to guide the learnings of node representation and coefficient matrix, where the latter is used in turn to conduct the subsequent clustering. In this way, clustering and representation learning are seamlessly connected, with the aim to achieve better clustering performance. Extensive experimental results indicate that MVGC is superior to 11 state-of-the-art methods on four benchmark datasets. In particular, MVGC achieves an Accuracy of 96.17% (53.31%) on the ACM (IMDB) dataset, which is an up to 2.85% (1.97%) clustering performance improvement compared with the strongest baseline.
Affiliation(s)
- Wei Xia
  - State Key Laboratory of Integrated Services Networks, Xidian University, Shaanxi 710071, China
- Sen Wang
  - Beijing Aerospace Automatic Control Institute, Beijing 100854, China
- Ming Yang
  - Departments of Mathematics and Computer & Information Science, Westfield State University, Westfield, MA 01086, United States of America
- Quanxue Gao
  - State Key Laboratory of Integrated Services Networks, Xidian University, Shaanxi 710071, China
- Jungong Han
  - Computer Science Department, Aberystwyth University, SY23 3FL, United Kingdom
- Xinbo Gao
  - Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
12. Yin H, Hu W, Zhang Z, Lou J, Miao M. Incremental multi-view spectral clustering with sparse and connected graph learning. Neural Netw 2021; 144:260-270. [PMID: 34520936 DOI: 10.1016/j.neunet.2021.08.031] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/04/2021] [Revised: 06/01/2021] [Accepted: 08/26/2021] [Indexed: 10/20/2022]
Abstract
In recent years, many excellent multi-view clustering methods have been proposed. Because most of them need to fuse all views at one time, they become infeasible as the number of views increases over time. If existing multi-view clustering methods are employed directly to re-fuse all views each time, storing all historical views is too expensive. In this paper, we propose an efficient incremental multi-view spectral clustering method with sparse and connected graph learning (SCGL). In our method, only one consensus similarity matrix is stored to represent the structural information of all historical views. Once a newly collected view is available, the consensus similarity matrix is reconstructed by learning from its previous version and the new view. To further improve incremental multi-view clustering performance, sparse graph learning and connected graph learning are integrated into our model, which can not only reduce noise but also preserve the correct connections within clusters. Experiments on several multi-view datasets demonstrate that our method is superior to traditional methods in clustering accuracy and is better suited to multi-view clustering in which the number of views increases over time.
Affiliation(s)
- Hongwei Yin
- School of Information Engineering, Huzhou University, Hu'zhou 313000, China; Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, Huzhou University, Hu'zhou 313000, China.
| | - Wenjun Hu
- School of Information Engineering, Huzhou University, Hu'zhou 313000, China; Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, Huzhou University, Hu'zhou 313000, China.
| | - Zhao Zhang
- School of Computer Science and Information Engineering & Key Laboratory of Knowledge Engineering with Big Data (Ministry of Education), Hefei University of Technology, He'fei 230009, China
| | - Jungang Lou
- School of Information Engineering, Huzhou University, Hu'zhou 313000, China; Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, Huzhou University, Hu'zhou 313000, China
| | - Minmin Miao
- School of Information Engineering, Huzhou University, Hu'zhou 313000, China; Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, Huzhou University, Hu'zhou 313000, China
| |
|
13
|
Tian J, Zhao J, Zheng C. Clustering of cancer data based on Stiefel manifold for multiple views. BMC Bioinformatics 2021; 22:268. [PMID: 34034643 PMCID: PMC8152349 DOI: 10.1186/s12859-021-04195-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/24/2021] [Accepted: 05/12/2021] [Indexed: 12/23/2022] Open
Abstract
Background In recent years, various sequencing techniques have been used to collect biomedical omics datasets. It is usually possible to obtain multiple types of omics data from a single patient sample. Clustering of omics data plays an indispensable role in biological and medical research, and it helps to reveal data structures from multiple collections. Nevertheless, clustering of omics data poses many challenges. The primary challenges in omics data analysis come from the high dimension of the data and the small sample size. Therefore, it is difficult to find a suitable integration method for structural analysis of multiple datasets. Results In this paper, a multi-view clustering based on Stiefel manifold method (MCSM) is proposed. The MCSM method comprises three core steps. First, we establish a binary optimization model for the simultaneous clustering problem. Second, we solve the optimization problem with a linear search algorithm based on the Stiefel manifold. Finally, we integrate the clustering results obtained from three omics using the k-nearest neighbor method. We applied this approach to four cancer datasets from TCGA. The results show that our method is superior to several state-of-the-art methods, which depend on the hypothesis that the underlying omics cluster class is the same. Conclusion In particular, our approach performs better than the compared approaches when the underlying clusters are inconsistent. For patients with different subtypes, both consistent and differential clusters can be identified at the same time.
Affiliation(s)
- Jing Tian
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
| | - Jianping Zhao
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China.
| | - Chunhou Zheng
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China; School of Computer Science and Technology, Anhui University, Hefei, China
| |
|
14
|
Huang Z, Ren Y, Pu X, Pan L, Yao D, Yu G. Dual self-paced multi-view clustering. Neural Netw 2021; 140:184-192. [PMID: 33770727 DOI: 10.1016/j.neunet.2021.02.022] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/30/2020] [Revised: 01/17/2021] [Accepted: 02/18/2021] [Indexed: 10/22/2022]
Abstract
By utilizing the complementary information from multiple views, multi-view clustering (MVC) algorithms typically achieve much better clustering performance than conventional single-view methods. Although great progress has been made in this field in the past few years, most existing multi-view clustering methods still suffer from the following shortcomings: (1) most MVC methods are non-convex and thus easily get stuck in suboptimal local minima; (2) the effectiveness of these methods is sensitive to the presence of noise or outliers; and (3) the qualities of different features and views are usually ignored, which can also influence the clustering result. To address these issues, we propose dual self-paced multi-view clustering (DSMVC) in this paper. Specifically, DSMVC takes advantage of self-paced learning to tackle the non-convex issue. By applying a soft-weighting scheme of self-paced learning to instances, the negative impact caused by noise and outliers can be significantly reduced. Moreover, to alleviate the feature and view quality issues, we develop a novel feature selection approach in a self-paced manner and a weighting term for views. Experimental results on real-world data sets demonstrate the effectiveness of the proposed method.
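The soft-weighting idea mentioned in the abstract can be sketched in a few lines. The abstract does not say which weighting function DSMVC uses, so the linear form below (a common choice in the self-paced learning literature) is an assumption, as are the name `spl_soft_weights` and the age parameter `lam`.

```python
import numpy as np

def spl_soft_weights(losses, lam):
    """Linear soft weighting: weight 1 at zero loss, falling to 0 once loss >= lam.

    Assumed form for illustration; not necessarily the scheme used by DSMVC.
    """
    losses = np.asarray(losses, dtype=float)
    return np.clip(1.0 - losses / lam, 0.0, 1.0)

# As lam grows, harder (higher-loss) instances are gradually admitted.
losses = np.array([0.1, 0.5, 2.0, 10.0])
for lam in (1.0, 4.0, 20.0):
    print(lam, spl_soft_weights(losses, lam))
```

Instances with large loss, which are more likely to be noise or outliers, receive small or zero weight early on and only gain influence as `lam` is increased.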
Affiliation(s)
- Zongmo Huang
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Yazhou Ren
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
| | - Xiaorong Pu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Lili Pan
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| | - Dezhong Yao
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, University of Electronic Science and Technology of China, Chengdu 611731, China; Research Unit of NeuroInformation, Chinese Academy of Medical Sciences, 2019RU035, Chengdu, China; School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China
| | - Guoxian Yu
- School of Software, Shandong University, Jinan 250101, China
| |
|
15
|
Zhang X, Yang Y, Li T, Zhang Y, Wang H, Fujita H. CMC: A consensus multi-view clustering model for predicting Alzheimer's disease progression. Comput Methods Programs Biomed 2021; 199:105895. [PMID: 33341477 DOI: 10.1016/j.cmpb.2020.105895] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 09/04/2020] [Accepted: 11/29/2020] [Indexed: 06/12/2023]
Abstract
Machine learning has been used in the past for the auxiliary diagnosis of Alzheimer's Disease (AD). However, most existing technologies only explore single-view data, require manual parameter setting and focus on two-class (i.e., dementia or not) classification problems. Unlike single-view data, multi-view data provide more powerful feature representation capability. Learning with multi-view data is referred to as multi-view learning, which has received increasing attention in recent years. In this paper, we propose a new multi-view clustering model called Consensus Multi-view Clustering (CMC) based on nonnegative matrix factorization for predicting the multiple stages of AD progression. The proposed CMC applies the multi-view learning idea to fully capture data features from limited medical images, approaches similarity relations between different entities, addresses the shortcoming that multi-view fusion requires manually set parameters, and further acquires a consensus representation containing shared features and complementary knowledge of multiple view data. It can not only improve the prediction performance for AD, but also screen and classify the symptoms of different phases of AD. Experimental results using data with twelve views constructed from the brain Magnetic Resonance Imaging (MRI) database of the Alzheimer's Disease Neuroimaging Initiative demonstrate the effectiveness of the proposed model.
Affiliation(s)
- Xiaobo Zhang
- School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Institute of Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu 611756, China
| | - Yan Yang
- School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Institute of Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu 611756, China.
| | - Tianrui Li
- School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Institute of Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu 611756, China
| | - Yiling Zhang
- School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Institute of Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu 611756, China
| | - Hao Wang
- School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Institute of Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu 611756, China
| | - Hamido Fujita
- Faculty of Software and Information Science, Iwate Prefectural University, Iwate, Japan
| |
|
16
|
Pfeifer B, Schimek MG. A hierarchical clustering and data fusion approach for disease subtype discovery. J Biomed Inform 2020; 113:103636. [PMID: 33271342 DOI: 10.1016/j.jbi.2020.103636] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/26/2020] [Revised: 11/10/2020] [Accepted: 11/26/2020] [Indexed: 11/17/2022]
Abstract
Recent advances in multi-omics clustering methods enable a more fine-tuned separation of cancer patients into clinically relevant clusters. These advancements have the potential to provide a deeper understanding of cancer progression and may facilitate the treatment of cancer patients. Here, we present a simple hierarchical clustering and data fusion approach, named HC-fused, for the detection of disease subtypes. Unlike other methods, the proposed approach naturally reports on the individual contribution of each single-omic to the data fusion process. We perform multi-view simulations with disjoint and disjunct cluster elements across the views to highlight the fundamentally different data integration behavior of various state-of-the-art methods. HC-fused combines the strengths of some recently published methods and shows superior performance on real-world cancer data from the TCGA (The Cancer Genome Atlas) database. An R implementation of our method is available on GitHub (pievos101/HC-fused).
Affiliation(s)
- Bastian Pfeifer
- Research Unit of Statistical Bioinformatics, Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria.
| | - Michael G Schimek
- Research Unit of Statistical Bioinformatics, Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria
| |
|
17
|
Ma Y, Zhao J, Ma Y. MHSNMF: multi-view hessian regularization based symmetric nonnegative matrix factorization for microbiome data analysis. BMC Bioinformatics 2020; 21:234. [PMID: 33203357 PMCID: PMC7672850 DOI: 10.1186/s12859-020-03555-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2020] [Accepted: 05/25/2020] [Indexed: 12/05/2022] Open
Abstract
BACKGROUND With the rapid development of high-throughput techniques, vast amounts of heterogeneous omics data have accumulated (e.g., genomics, proteomics and metabolomics data). Integrating information from multiple sources or views to obtain a profound insight into the complicated relations among micro-organisms, nutrients and host environment is challenging. In this paper we propose a multi-view Hessian regularization based symmetric nonnegative matrix factorization algorithm (MHSNMF) for clustering heterogeneous microbiome data. Compared with many existing approaches, the advantages of MHSNMF are: (1) MHSNMF combines multiple Hessian regularization to leverage the high-order information from the same cohort of instances with multiple representations; (2) MHSNMF utilizes the advantages of SNMF and naturally handles the complex relationship among microbiome samples; (3) using the consensus matrix obtained by MHSNMF, we also design a novel approach to predict the classification of new microbiome samples. RESULTS We conduct extensive experiments on two real-world datasets (Three-source dataset and Human Microbiome Plan dataset). The experimental results show that the proposed MHSNMF algorithm outperforms other baseline and state-of-the-art methods. Compared with other methods, MHSNMF achieves the best performance (accuracy: 95.28%, normalized mutual information: 91.79%) on microbiome data. This suggests the potential application of MHSNMF in microbiome data analysis. CONCLUSIONS Results show that the proposed MHSNMF algorithm can effectively combine the phylogenetic, transporter, and metabolic profiles into a unified paradigm to analyze the relationships among different microbiome samples. Furthermore, the proposed prediction method based on MHSNMF has been shown to be effective in judging the types of new microbiome samples.
Affiliation(s)
- Yuanyuan Ma
- School of Computer & Information Engineering, Anyang Normal University, Anyang, China.
| | - Junmin Zhao
- School of Computer & Data Science, Henan University of Urban Construction, Pingdingshan, China
| | - Yingjun Ma
- School of Computer, Central China Normal, Wuhan, China
| |
|
18
|
Chaari N, Akdağ HC, Rekik I. Estimation of gender-specific connectional brain templates using joint multi-view cortical morphological network integration. Brain Imaging Behav 2020; 15:2081-2100. [PMID: 33089469 PMCID: PMC8413178 DOI: 10.1007/s11682-020-00404-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/21/2020] [Indexed: 12/02/2022]
Abstract
The estimation of a connectional brain template (CBT) integrating a population of brain networks while capturing shared and differential connectional patterns across individuals remains unexplored in gender fingerprinting. This paper presents the first study to estimate gender-specific CBTs using multi-view cortical morphological networks (CMNs) estimated from conventional T1-weighted magnetic resonance imaging (MRI). Specifically, each CMN view is derived from a specific cortical attribute (e.g. thickness), encoded in a network quantifying the dissimilarity in morphology between pairs of cortical brain regions. To this end, we propose Multi-View Clustering and Fusion Network (MVCF-Net), a novel multi-view network fusion method, which can jointly identify consistent and differential clusters of multi-view datasets in order to simultaneously capture similar and distinct connectional traits of samples. Our MVCF-Net method estimates representative and well-centered CBTs for male and female populations, independently, to eventually identify their fingerprinting regions of interest (ROIs) in four main steps. First, we apply a multi-view network clustering model based on manifold optimization, which groups CMNs into shared and differential clusters while preserving their alignment across views. Second, for each view, we linearly fuse CMNs belonging to each cluster, producing local CBTs. Third, for each cluster, we non-linearly integrate the local CBTs across views, producing a cluster-specific CBT. Finally, by linearly fusing the cluster-specific centers we estimate a final CBT of the input population. MVCF-Net produced the most centered and representative CBTs for male and female populations and identified the most discriminative ROIs marking gender differences. The two most gender-discriminative ROIs involved the lateral occipital cortex and pars opercularis in the left hemisphere and the middle temporal gyrus and lingual gyrus in the right hemisphere.
Affiliation(s)
- Nada Chaari
- BASIRA Lab, Faculty of Computer and Informatics, Istanbul Technical University, Istanbul, Turkey
| | | | - Islem Rekik
- BASIRA Lab, Faculty of Computer and Informatics, Istanbul Technical University, Istanbul, Turkey; Computing, School of Science and Engineering, University of Dundee, Dundee, UK.
| |
|
19
|
Xie D, Gao Q, Deng S, Yang X, Gao X. Multiple graphs learning with a new weighted tensor nuclear norm. Neural Netw 2020; 133:57-68. [PMID: 33125918 DOI: 10.1016/j.neunet.2020.10.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/05/2020] [Revised: 09/04/2020] [Accepted: 10/16/2020] [Indexed: 11/17/2022]
Abstract
As an effective convex relaxation of the rank minimization model, tensor nuclear norm minimization based multi-view clustering methods have been attracting more and more interest in recent years. However, most existing clustering methods regularize each singular value equally, restricting their capability and flexibility in tackling many practical problems, where the singular values should be treated differently. To address this problem, we propose a novel weighted tensor nuclear norm minimization (WTNNM) based method for multi-view spectral clustering. Specifically, we first calculate a set of transition probability matrices from different views, and construct a 3-order tensor whose lateral slices are composed of the probability matrices. Second, we learn a latent high-order transition probability matrix by using our proposed weighted tensor nuclear norm, which directly considers the prior knowledge of singular values. Finally, clustering is performed on the learned transition probability matrix, which characterizes well both the complementary information and the high-order information embedded in multi-view data. An efficient optimization algorithm is designed to solve the resulting problem. Extensive experiments on five benchmarks demonstrate that our method outperforms the state-of-the-art methods.
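Treating singular values differently, rather than shrinking them all by the same amount, can be illustrated on an ordinary matrix. The sketch below is a matrix analogue only, not the tensor (t-SVD based) operator or the WTNNM solver from the paper; the function name `weighted_svt`, the weight vector, and the threshold `tau` are assumptions for illustration.

```python
import numpy as np

def weighted_svt(M, weights, tau):
    """Shrink the i-th singular value of M by tau * weights[i], clipping at zero.

    weights must have one entry per singular value, i.e. min(M.shape) entries.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau * np.asarray(weights, dtype=float), 0.0)
    return (U * s_shrunk) @ Vt  # scales column i of U by s_shrunk[i]

# Small weights on the leading singular values preserve dominant structure;
# large weights on the trailing ones suppress what is more likely noise.
rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))
low_rank = weighted_svt(M, weights=[0.0, 0.0, 1.0, 1.0], tau=1e6)
```

With equal weights this reduces to the usual singular value thresholding step associated with the unweighted nuclear norm.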
Affiliation(s)
- Deyan Xie
- Qingdao Agricultural University, Qingdao, China; State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China.
| | - Quanxue Gao
- State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China.
| | - Siyang Deng
- State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China
| | - Xiaojun Yang
- Guangdong University of Technology, Guangzhou, China
| | - Xinbo Gao
- State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China
| |
|
20
|
Araújo AFR, Antonino VO, Ponce-Guevara KL. Self-organizing subspace clustering for high-dimensional and multi-view data. Neural Netw 2020; 130:253-268. [PMID: 32711348 DOI: 10.1016/j.neunet.2020.06.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/10/2019] [Revised: 04/30/2020] [Accepted: 06/28/2020] [Indexed: 12/14/2022]
Abstract
A surge in the availability of data from multiple sources and modalities is correlated with advances in how to obtain, compress, store, transfer, and process large amounts of complex high-dimensional data. The clustering challenge increases with the growth of data dimensionality, which decreases the discriminative power of distance metrics. Subspace clustering aims to group data drawn from a union of subspaces. There is a large number of state-of-the-art approaches, and we divide them into families according to the method used for clustering. We introduce a soft subspace clustering algorithm, a Self-organizing Map (SOM) with a time-varying structure, to cluster data without any prior knowledge of the number of categories or of the neural network topology, both of which are determined during the training process. The model also assigns proper relevancies (weights) to different dimensions, capturing from the learning process the influence of each dimension on uncovering clusters. We employ a number of real-world datasets to validate the model. The algorithm performs competitively in a diverse range of contexts, among them data mining, gene expression, multi-view, computer vision and text clustering problems that include high-dimensional data. Extensive experiments suggest that our method very often outperforms the state-of-the-art approaches in all types of problems considered.
Affiliation(s)
- Aluizio F R Araújo
- Centro de Informática, Universidade Federal de Pernambuco, 50740560, Recife, Brazil.
| | - Victor O Antonino
- Centro de Informática, Universidade Federal de Pernambuco, 50740560, Recife, Brazil
| | | |
|
21
|
Abstract
BACKGROUND In the field of computational biology, analyzing complex data helps to extract relevant biological information. Sample classification of gene expression data is one such popular bio-data analysis technique. However, the presence of a large number of irrelevant or redundant genes in expression data makes a sample classification algorithm work inefficiently. Feature selection is a dimensionality reduction technique that helps to maximize the effectiveness of any sample classification algorithm. Recent advances in biotechnology have enriched biological data to include multiple modalities or views. Different 'omics' resources capture various equally important biological properties of entities. However, most existing feature selection methodologies are biased towards considering only one out of multiple biological resources. Consequently, some crucial aspects of the available biological knowledge, which could further improve feature selection efficiency, may be ignored. RESULTS In the present work, we propose a Consensus Multi-View Multi-objective Clustering-based feature selection algorithm called CMVMC. Three controlled genomic and proteomic resources, namely gene expression, Gene Ontology (GO), and the protein-protein interaction network (PPIN), are utilized to build two independent views. The concept of multi-objective consensus clustering is applied within our proposed gene selection method to satisfy both incorporated views. Gene expression data sets of Multiple Tissues and Yeast from two different organisms (Homo sapiens and Saccharomyces cerevisiae, respectively) are chosen for experimental purposes. As the end-product of CMVMC, a reduced set of relevant and non-redundant genes is found for each chosen data set. These genes finally participate in an effective sample classification.
CONCLUSIONS The experimental study on the chosen data sets shows that our proposed feature-selection method improves the sample classification accuracy and reduces the gene space significantly. For the Multiple Tissues data set, CMVMC reduces the number of genes (features) from 5565 to 41, with 92.73% sample classification accuracy. For the Yeast data set, the number of genes is reduced from 2884 to 10, with 95.84% sample classification accuracy. Two internal cluster validity indices, Silhouette and Davies-Bouldin (DB), and one external validity index, Classification Accuracy (CA), are chosen for the comparative study. Reported results are further validated through a well-known biological significance test and visualization tool.
Affiliation(s)
- Sudipta Acharya
- Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, PR China
| | - Laizhong Cui
- Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, PR China
| | - Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, USA
| |
|
22
|
Guo Y, Li H, Cai M, Li L. Integrative subspace clustering by common and specific decomposition for applications on cancer subtype identification. BMC Med Genomics 2019; 12:191. [PMID: 31874642 PMCID: PMC6929329 DOI: 10.1186/s12920-019-0633-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/11/2019] [Accepted: 11/19/2019] [Indexed: 01/01/2023] Open
Abstract
Background Recent high-throughput technologies have been applied to collect heterogeneous biomedical omics datasets. Computational analysis of multi-omics datasets could potentially reveal deep insights into a given disease. Most existing clustering methods for multi-omics data assume strong consistency among different sources of datasets, and thus may lose efficacy when the consistency is relatively weak. Furthermore, they cannot identify the conflicting parts of each view, which might be important in applications such as cancer subtype identification. Methods In this work, we propose an integrative subspace clustering method (ISC) by common and specific decomposition to identify clustering structures in multi-omics datasets. The main idea of our ISC method is that the original representations of the samples in each view can be reconstructed by the concatenation of a common part and a view-specific part in orthogonal subspaces. The problem can be formulated as a matrix decomposition problem and solved efficiently by our proposed algorithm. Results The experiments on simulation and text datasets show that our method outperforms other state-of-the-art methods. Our method is further evaluated by identifying cancer types using a colorectal dataset. We finally apply our method to cancer subtype identification for five cancers using TCGA datasets, and the survival analysis shows that the subtypes we found are significantly better than those found by the compared methods. Conclusion We conclude that our ISC model can not only discover the weak common information across views but also identify the view-specific information.
Affiliation(s)
- Yin Guo
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, China
| | - Huiran Li
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, China
| | - Menglan Cai
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, China
| | - Limin Li
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, China.
| |
|
23
|
Zong L, Zhang X, Liu X. Multi-view clustering on unmapped data via constrained non-negative matrix factorization. Neural Netw 2018; 108:155-71. [PMID: 30199782 DOI: 10.1016/j.neunet.2018.08.011] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 09/24/2017] [Revised: 07/03/2018] [Accepted: 08/07/2018] [Indexed: 11/23/2022]
Abstract
Existing multi-view clustering algorithms require that the data be completely or partially mapped between each pair of views. However, this requirement cannot be satisfied in many practical settings. In this paper, we tackle the problem of multi-view clustering on unmapped data in the framework of NMF-based clustering. With the help of inter-view constraints, we define the disagreement between each pair of views by the fact that the indicator vectors of two samples from two different views should be similar if they belong to the same cluster and dissimilar otherwise. The overall objective of our algorithm is to minimize the loss function of NMF in each view as well as the disagreement between each pair of views. Furthermore, we provide an active inter-view constraint selection strategy, which tries to query the relationships between samples that are the most influential and samples that are the farthest from the existing constraint set. Experimental results show that, with a small number of (either randomly selected or actively selected) constraints, the proposed algorithm performs well on unmapped data, and outperforms the baseline algorithms on partially mapped data and completely mapped data.
|
24
|
Qian P, Zhou J, Jiang Y, Liang F, Zhao K, Wang S, Su KH, Muzic RF. Multi-View Maximum Entropy Clustering by Jointly Leveraging Inter-View Collaborations and Intra-View-Weighted Attributes. IEEE Access 2018; 6:28594-28610. [PMID: 31289704 PMCID: PMC6615759 DOI: 10.1109/access.2018.2825352] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 05/13/2023]
Abstract
As a dedicated countermeasure for heterogeneous multi-view data, multi-view clustering is currently a hot topic in machine learning. However, many existing methods either neglect the effective collaborations among views during clustering or do not distinguish the respective importance of attributes in views, instead treating them equivalently. Motivated by such challenges, based on maximum entropy clustering (MEC), two specialized criteria, inter-view collaborative learning (IEVCL) and intra-view-weighted attributes (IAVWA), are first devised as the bases. Then, by organically incorporating IEVCL and IAVWA into the formulation of classic MEC, a novel collaborative multi-view clustering model and the matching algorithm, referred to as the view-collaborative, attribute-weighted MEC (VC-AW-MEC), are proposed. The significance of our efforts is three-fold: 1) both IEVCL and IAVWA are specifically devised based on MEC so that the proposed VC-AW-MEC can effectively handle as many multi-view data scenarios as possible; 2) IEVCL seeks the consensus across all involved views throughout clustering, whereas IAVWA adaptively discriminates the individual impact of the attributes within each view; and 3) benefiting from jointly leveraging IEVCL and IAVWA, the proposed VC-AW-MEC algorithm generally exhibits better clustering effectiveness and stability on heterogeneous multi-view data than some existing state-of-the-art approaches. Our efforts have been verified in many synthetic and real-world multi-view data scenarios.
Affiliation(s)
- Pengjiang Qian
  - School of Digital Media, Jiangnan University, Wuxi 214122, China
  - Department of Radiology, University Hospitals, Case Western Reserve University, Cleveland, OH 44106, USA
  - Case Center for Imaging Research, University Hospitals, Case Western Reserve University, Cleveland, OH 44106, USA
- Jiaxu Zhou
  - School of Digital Media, Jiangnan University, Wuxi 214122, China
- Yizhang Jiang
  - School of Digital Media, Jiangnan University, Wuxi 214122, China
- Fan Liang
  - Department of Radiology, University Hospitals, Case Western Reserve University, Cleveland, OH 44106, USA
  - Case Center for Imaging Research, University Hospitals, Case Western Reserve University, Cleveland, OH 44106, USA
- Kaifa Zhao
  - School of Digital Media, Jiangnan University, Wuxi 214122, China
- Shitong Wang
  - School of Digital Media, Jiangnan University, Wuxi 214122, China
- Kuan-Hao Su
  - Department of Radiology, University Hospitals, Case Western Reserve University, Cleveland, OH 44106, USA
  - Case Center for Imaging Research, University Hospitals, Case Western Reserve University, Cleveland, OH 44106, USA
- Raymond F Muzic
  - Department of Radiology, University Hospitals, Case Western Reserve University, Cleveland, OH 44106, USA
  - Case Center for Imaging Research, University Hospitals, Case Western Reserve University, Cleveland, OH 44106, USA
|
25
|
Abstract
Background: The Cancer Genome Atlas (TCGA) has collected transcriptome, genome and epigenome information for over 20 cancers from thousands of patients. The availability of these diverse data types makes it necessary to combine them in order to capture the heterogeneity of biological processes and phenotypes and to identify homogeneous subtypes for cancers such as breast cancer. Many multi-view clustering approaches have been proposed to discover clusters across different data types. The problem is challenging when the clustering structures of different data types agree poorly.
Results: In this work, we first propose a multi-view clustering approach with consensus (CMC), which tries to find consensus kernels among views by using the Hilbert-Schmidt Independence Criterion. To handle cases of poor agreement among views, we further propose a multi-view clustering approach with enhanced consensus (ECMC), which decomposes the kernel information in each view into a consensus part and a disagreement part. The consensus parts of different views are supposed to be similar, and the disagreement parts should be independent of the consensus parts. Both the CMC and ECMC models can be solved by alternating updates with semi-definite programming. Our experiments on both simulated datasets and real-world benchmark datasets show that the ECMC model can achieve higher clustering accuracy than other state-of-the-art multi-view clustering approaches. We also apply the ECMC model to integrate mRNA expression, DNA methylation and microRNA (miRNA) expression data for five cancer data sets, and the survival analysis shows that our ECMC model outperforms other methods when identifying cancer subtypes. Using Fisher's combination test, we found that three computed subtypes roughly correspond to three known breast cancer subtypes: luminal B, HER2 and basal-like.
Conclusion: Integrating heterogeneous TCGA datasets with our proposed multi-view clustering approach ECMC can effectively identify cancer subtypes.
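The Hilbert-Schmidt Independence Criterion named in this abstract is commonly estimated from two kernel matrices with the biased empirical formula tr(KHLH) / (n - 1)^2, where H is the centering matrix. The sketch below illustrates only that estimator as a dependence measure between two views. It is not the CMC or ECMC model, whose semi-definite programming updates are not described in enough detail here to reproduce. The RBF kernel, the bandwidth `sigma`, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix of one view; kernel choice is an assumption."""
    X = np.asarray(X, dtype=float)
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic(K, L):
    """Biased empirical HSIC between two n x n kernel matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Two toy "views" of the same five samples (made-up numbers).
view_a = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
view_b = np.array([[1.0, 0.0], [1.1, 0.0], [0.9, 0.1], [4.0, 3.0], [4.2, 3.1]])
K_a = rbf_kernel(view_a)
K_b = rbf_kernel(view_b)
dependence = hsic(K_a, K_b)
```

A larger value indicates stronger statistical dependence between the two views' kernels; a constant kernel carries no information and gives a value of zero after centering.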
Affiliation(s)
- Menglan Cai
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, China
- Limin Li
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, China
|