1. Shi Z, Chen L, Ding W, Zhong X, Wu Z, Chen GY, Zhang C, Wang Y, Chen CLP. IFKMHC: Implicit Fuzzy K-Means Model for High-Dimensional Data Clustering. IEEE Transactions on Cybernetics 2024; 54:7955-7968. PMID: 38814762. DOI: 10.1109/tcyb.2024.3391274.
Abstract
Graph-information-based fuzzy clustering has shown promising results on various datasets. However, its performance is hindered on high-dimensional data by redundant information and by sensitivity to the design of the similarity matrix. To address these limitations, this article proposes an implicit fuzzy k-means (FKM) model that enhances graph-based fuzzy clustering for high-dimensional data. Instead of explicitly designing a similarity matrix, our approach leverages the fuzzy partition result obtained from the implicit FKM model to generate an effective similarity matrix. We employ a projection-based technique to handle redundant information, eliminating the need for specific feature extraction methods. By formulating the fuzzy clustering model solely on the similarity matrix derived from the membership matrix, we mitigate issues such as dependence on initial values and random fluctuations in clustering results. This approach significantly improves the competitiveness of graph-enhanced fuzzy clustering for high-dimensional data. We present an efficient iterative optimization algorithm for our model and demonstrate its effectiveness through theoretical analysis and experimental comparisons with other state-of-the-art methods.
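The membership-derived similarity idea can be illustrated with classical fuzzy k-means, which the paper builds on. The sketch below is plain standard FKM (not the paper's implicit model or its projection step); `membership_similarity` shows how a similarity matrix can be generated from the membership matrix U rather than designed explicitly:

```python
import numpy as np

def fuzzy_kmeans(X, k, m=2.0, n_iter=100, seed=0):
    """Classical fuzzy k-means. Returns the membership matrix U (n x k)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)           # rows sum to 1
    for _ in range(n_iter):
        W = U ** m                               # fuzzified memberships
        C = (W.T @ X) / W.sum(axis=0)[:, None]   # centroid update
        D = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1) + 1e-12
        U = 1.0 / (D ** (1.0 / (m - 1)))         # membership update
        U /= U.sum(axis=1, keepdims=True)
    return U

def membership_similarity(U):
    """Similarity derived from the fuzzy partition: samples with similar
    cluster profiles get high similarity, with no hand-designed graph."""
    return U @ U.T
```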

2. Zhang H, Li P, Zhang R, Li X. Embedding Graph Auto-Encoder for Graph Clustering. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:9352-9362. PMID: 35333721. DOI: 10.1109/tnnls.2022.3158654.
Abstract
Graph clustering, which aims to partition the nodes of a graph into groups in an unsupervised manner, has been an attractive topic in recent years. To improve representative ability, several graph auto-encoder (GAE) models based on semisupervised graph convolution networks (GCN) have been developed, achieving impressive results compared with traditional clustering methods. However, all existing methods either fail to utilize the orthogonal property of the representations generated by GAE or separate the clustering from the training of the neural network. We first prove that relaxed k-means obtains an optimal partition in the space where the inner-product distance is used. Driven by this theoretical analysis of relaxed k-means, we design a GAE-based model for graph clustering that is consistent with the theory, namely Embedding GAE (EGAE). The learned representations are well explainable, so they can also be used for other tasks. To induce the neural network to produce deep features appropriate for the specific clustering model, the relaxed k-means and the GAE are learned simultaneously. Meanwhile, relaxed k-means can be equivalently regarded as a decoder that attempts to learn representations that can be linearly constructed from a set of centroid vectors. Accordingly, EGAE consists of one encoder and dual decoders. Extensive experiments validate the superiority of EGAE and the corresponding theoretical analyses.
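The role of relaxed k-means here can be sketched concretely. Dropping the discrete indicator constraint and keeping only orthogonality, the optimum is spanned by the top-k eigenvectors of the inner-product (Gram) matrix of the embeddings; hard labels are then recovered by a rounding step. This is a generic illustration of the relaxation, not the EGAE training loop:

```python
import numpy as np

def relaxed_kmeans(Z, k):
    """Relaxed k-means: relax the 0/1 indicator matrix F to any matrix with
    F^T F = I. The optimum is the span of the top-k eigenvectors of Z Z^T."""
    G = Z @ Z.T                                  # inner-product affinities
    vals, vecs = np.linalg.eigh(G)               # ascending eigenvalues
    return vecs[:, -k:]                          # top-k eigenvectors

def discretize(F, k, n_iter=50):
    """Round the relaxed solution back to hard labels with k-means,
    using farthest-point initialization for determinism."""
    C = [F[0]]
    for _ in range(k - 1):
        d = np.min([((F - c) ** 2).sum(1) for c in C], axis=0)
        C.append(F[d.argmax()])
    C = np.array(C)
    labels = np.zeros(len(F), dtype=int)
    for _ in range(n_iter):
        labels = ((F[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                C[j] = F[labels == j].mean(0)
    return labels
```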

3. Zhao P, Zhang Y, Ma Y, Zhao X, Fan X. Discriminatively embedded fuzzy K-means clustering with feature selection strategy. Applied Intelligence 2023. DOI: 10.1007/s10489-022-04376-5.

4. Wang J, Xie F, Nie F, Li X. Unsupervised Adaptive Embedding for Dimensionality Reduction. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:6844-6855. PMID: 34101602. DOI: 10.1109/tnnls.2021.3083695.
Abstract
High-dimensional data are highly correlated and redundant, making them difficult to explore and analyze. A number of unsupervised dimensionality reduction (DR) methods have been proposed, in which constructing a neighborhood graph is the primary step. However, two problems exist: 1) the construction of the graph is usually separated from the selection of the projection direction and 2) the original data are inevitably noisy. In this article, we propose an unsupervised adaptive embedding (UAE) method for DR to solve these challenges, which is a linear graph-embedding method. First, an adaptive neighbor-allocation method is proposed to construct the affinity graph. Second, the construction of the affinity graph and the calculation of the projection matrix are integrated. The method considers both the local relationships between samples and the global characteristics of the high-dimensional data, and a cleaned data matrix is introduced to remove noise in the subspace. The relationship between our method and locality preserving projections (LPP) is also explored. Finally, an alternating iterative optimization algorithm is derived to solve the model, and its convergence and computational complexity are analyzed. Comprehensive experiments on synthetic and benchmark datasets illustrate the superiority of our method.
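For context, the LPP baseline that UAE is related to can be sketched in a few lines: given a fixed affinity W, it finds a linear projection minimizing tr(AᵀXᵀLXA) subject to AᵀXᵀDXA = I via a generalized eigenproblem. This is the standard formulation with a fixed graph, independent of UAE's adaptive graph and noise-cleaning steps:

```python
import numpy as np

def lpp(X, W, dim, reg=1e-9):
    """Locality preserving projections for a fixed affinity graph W.
    X: samples in rows (n x d). Returns the embedded data (n x dim)."""
    D = np.diag(W.sum(axis=1))
    L = D - W                                     # graph Laplacian
    M1 = X.T @ L @ X
    M2 = X.T @ D @ X + reg * np.eye(X.shape[1])   # regularize for invertibility
    # generalized eigenproblem M1 a = lam * M2 a; keep the smallest eigenvalues
    vals, vecs = np.linalg.eig(np.linalg.solve(M2, M1))
    order = np.argsort(vals.real)
    A = vecs[:, order[:dim]].real
    return X @ A
```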

5. Wang J, Ma Z, Nie F, Li X. Progressive Self-Supervised Clustering With Novel Category Discovery. IEEE Transactions on Cybernetics 2022; 52:10393-10406. PMID: 33878003. DOI: 10.1109/tcyb.2021.3069836.
Abstract
Clustering is one of the most classical themes for analyzing data structures in machine learning and pattern recognition. Recently, anchor-based graphs have been widely adopted to promote the clustering accuracy of many graph-based clustering techniques. To achieve more satisfying clustering performance, we propose a novel clustering approach, the progressive self-supervised clustering method with novel category discovery (PSSCNCD), which consists of three separate procedures. First, we propose a new semisupervised framework with novel category discovery to guide label propagation, reinforced by a parameter-insensitive anchor-based graph obtained from balanced K-means-based hierarchical K-means (BKHK). Second, we design a representative-point selection strategy based on our semisupervised framework to discover each representative point and assign pseudolabels progressively, where every pseudolabel hypothetically corresponds to a real category in each self-supervised label propagation. Third, when sufficient representative points have been found, the labels of all samples are predicted to obtain the final clustering result. Experimental results on several toy examples and benchmark datasets comprehensively demonstrate that our method outperforms other clustering approaches.
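The label-propagation step that the framework guides can be illustrated with the standard closed-form propagation on a normalized graph. This is a generic sketch; the paper's anchor-based BKHK graph and category-discovery logic are not reproduced here:

```python
import numpy as np

def label_propagation(S, Y, alpha=0.9):
    """Closed-form label propagation F = (1-a)(I - a*S_norm)^-1 Y, where
    S_norm is the symmetrically normalized affinity and Y holds one-hot seed
    labels (zero rows for unlabeled points). Returns predicted labels."""
    d = S.sum(axis=1)
    S_norm = S / np.sqrt(np.outer(d, d))          # D^{-1/2} S D^{-1/2}
    n = S.shape[0]
    F = (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S_norm, Y)
    return F.argmax(axis=1)
```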

6. Zhang R, Zhang H, Li X. Maximum Joint Probability With Multiple Representations for Clustering. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4300-4310. PMID: 33577461. DOI: 10.1109/tnnls.2021.3056420.
Abstract
Classical generative models in unsupervised learning aim to maximize p(X). In practice, samples may have multiple representations caused by various transformations, measurements, and so on. It is therefore crucial to integrate information from different representations, and many models have been developed for this purpose. However, most of them fail to incorporate prior information about the data distribution p(X) to distinguish between representations. In this article, we propose a novel clustering framework that attempts to maximize the joint probability of data and parameters. Under this framework, the prior distribution can be employed to measure the rationality of diverse representations. K-means is a special case of the proposed framework. Meanwhile, a specific clustering model considering both multiple kernels and multiple views is derived to verify the validity of the designed framework.

7. Wang J, Ma Z, Nie F, Li X. Fast Self-Supervised Clustering With Anchor Graph. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4199-4212. PMID: 33587715. DOI: 10.1109/tnnls.2021.3056080.
Abstract
By avoiding the use of labeled samples, which are usually insufficient in the real world, unsupervised learning is regarded as a fast and powerful strategy for clustering tasks. However, clustering directly on primal datasets leads to high computational cost, which limits its application to large-scale and high-dimensional problems. Recently, anchor-based methods have been proposed to partly mitigate this problem and yield naturally sparse affinity matrices, yet it remains a challenge to obtain excellent performance together with high efficiency. To address this issue, we first present a fast semisupervised framework (FSSF) that combines a balanced K-means-based hierarchical K-means (BKHK) method with bipartite graph theory. We then propose a fast self-supervised clustering method built on this semisupervised framework, in which all labels are inferred from a constructed bipartite graph with exactly k connected components. The proposed method remarkably accelerates general semisupervised learning through the use of anchors and consists of four significant parts: 1) obtaining the anchor set via the BKHK algorithm; 2) constructing the bipartite graph; 3) solving the self-supervised problem to construct a typical probability model with FSSF; and 4) selecting the most representative points with respect to the anchors from BKHK and conducting label propagation. Experimental results on toy examples and benchmark datasets demonstrate that the proposed method outperforms other approaches.
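The "naturally sparse" bipartite affinity that anchor methods rely on can be sketched as follows. Each sample connects only to its s nearest anchors; the parameter-free weighting below is the closed form used in many anchor-graph papers and is an assumption here, and the BKHK anchor selection itself is not reproduced (plain k-means centers or even random anchors suffice for illustration):

```python
import numpy as np

def anchor_bipartite_graph(X, anchors, s=3):
    """Bipartite affinity Z (n x m): each sample is linked to its s nearest
    anchors with weights z_ij proportional to (d_{i,s+1} - d_ij), so rows
    sum to 1 and most entries are exactly zero (natural sparsity)."""
    d = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    n, m = d.shape
    Z = np.zeros((n, m))
    for i in range(n):
        order = np.argsort(d[i])
        top = order[:s]
        w = d[i, order[s]] - d[i, top]           # gap to the (s+1)-th anchor
        Z[i, top] = w / w.sum() if w.sum() > 0 else 1.0 / s
    return Z

def full_similarity(Z):
    """Sample-sample similarity S = Z Lambda^-1 Z^T; rows of S sum to 1."""
    lam = np.maximum(Z.sum(axis=0), 1e-12)       # anchor degrees (clamped)
    return (Z / lam) @ Z.T
```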

8. Yang H, Liu Q, Zhang J, Ding X, Chen C, Wang L. Community Detection in Semantic Networks: A Multi-View Approach. Entropy 2022; 24:1141. PMID: 36010804. PMCID: PMC9407108. DOI: 10.3390/e24081141.
Abstract
Semantic social networks are complex systems composed of nodes, links, and documents. Traditional semantic social network community detection algorithms analyze network data from only a single view, with no effective representation of semantic features at diverse levels of granularity. This paper proposes a multi-view integration method for community detection in semantic social networks. We develop a data feature matrix based on node similarity and extract semantic features from the views of word frequency, keywords, and topics, respectively. To maximize the mutual information of each view, we use the robustness of the L21-norm and the F-norm to construct an adaptive loss function. On this foundation, we construct an optimization expression to generate a unified graph matrix and output the community structure from multiple views. Experiments on real social networks and benchmark datasets reveal that, for semantic information analysis, multi-view is considerably better than single-view, and the proposed multi-view community detection outperforms traditional methods and multi-view clustering algorithms.
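The unified-graph idea can be sketched with a generic self-weighted fusion of per-view affinity matrices. This is an illustration only; the paper's L21-norm/F-norm adaptive loss and the specific word-frequency/keyword/topic views are not reproduced:

```python
import numpy as np

def unified_graph(views, n_iter=20):
    """Fuse per-view affinity matrices into one unified graph
    S = sum_v w_v * A_v with weights w_v proportional to 1/(2*||S - A_v||_F),
    updated alternately (a common self-weighted multi-view scheme)."""
    S = sum(views) / len(views)                  # start from the plain average
    w = np.full(len(views), 1.0 / len(views))
    for _ in range(n_iter):
        w = np.array([1.0 / (2 * np.linalg.norm(S - A) + 1e-12) for A in views])
        w /= w.sum()                             # views close to S gain weight
        S = sum(wi * A for wi, A in zip(w, views))
    return S, w
```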
Affiliations:
- Hailu Yang, Qian Liu, Chen Chen, Lili Wang: School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150001, China
- Jin Zhang: School of Automatic Control Engineering, Harbin Institute of Petroleum, Harbin 150028, China
- Xiaoyu Ding: School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

9. Zhao X, Nie F, Wang R, Li X. Improving projected fuzzy K-means clustering via robust learning. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.03.043.

10. de Bodt C, Mulders D, Verleysen M, Lee JA. Fast Multiscale Neighbor Embedding. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1546-1560. PMID: 33361004. DOI: 10.1109/tnnls.2020.3042807.
Abstract
Dimension reduction (DR) computes faithful low-dimensional (LD) representations of high-dimensional (HD) data. Outstanding performances are achieved by recent neighbor embedding (NE) algorithms such as t-SNE, which mitigate the curse of dimensionality. The single-scale or multiscale nature of an NE scheme drives how HD neighborhoods are preserved in the LD space (LDS). While single-scale methods focus on single-sized neighborhoods through the concept of perplexity, multiscale ones preserve neighborhoods over a broader range of sizes and account for the global HD organization when defining the LDS. For both single-scale and multiscale methods, however, the time complexity in the number of samples is unaffordable for big datasets. Single-scale methods can be accelerated by relying on the inherent sparsity of the HD similarities they involve. In contrast, the dense structure of the multiscale HD similarities prevents developing fast multiscale schemes in a similar way. This article addresses this difficulty by designing randomized accelerations of the multiscale methods. To account for all levels of interaction, the HD data are first subsampled at different scales, making it possible to identify small, relevant neighbor sets for each data point thanks to vantage-point trees. Afterward, these sets are employed with a Barnes-Hut algorithm to cheaply evaluate the considered cost function and its gradient, enabling large-scale use of multiscale NE schemes. Extensive experiments demonstrate that the proposed accelerations are, with statistical significance, both faster than the original multiscale methods by orders of magnitude and better at preserving HD neighborhoods than state-of-the-art single-scale schemes, leading to high-quality LD embeddings. Public codes are freely available at https://github.com/cdebodt.
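The vantage-point trees used to identify small neighbor sets can be sketched as follows. This is a minimal generic implementation of the data structure itself, not the paper's full subsampling-plus-Barnes-Hut pipeline:

```python
import numpy as np

class VPTree:
    """Minimal vantage-point tree for exact nearest-neighbor queries."""

    def __init__(self, points):
        self.points = np.asarray(points)
        self.root = self._build(np.arange(len(self.points)))

    def _build(self, idx):
        if idx.size == 0:
            return None
        vp, rest = idx[0], idx[1:]
        if rest.size == 0:
            return (vp, 0.0, None, None)
        d = np.linalg.norm(self.points[rest] - self.points[vp], axis=1)
        mu = np.median(d)                      # split at the median radius
        return (vp, mu, self._build(rest[d <= mu]), self._build(rest[d > mu]))

    def nearest(self, q):
        best = [None, np.inf]                  # [index, distance]

        def search(node):
            if node is None:
                return
            vp, mu, inner, outer = node
            d = np.linalg.norm(self.points[vp] - q)
            if d < best[1]:
                best[0], best[1] = vp, d
            # search the side containing q first; prune the other side
            first, second = (inner, outer) if d <= mu else (outer, inner)
            search(first)
            if abs(d - mu) < best[1]:
                search(second)

        search(self.root)
        return best[0]
```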

11. Yuan Y, Wang C. Bipartite graph based spectral rotation with fuzzy anchors. Neurocomputing 2022. DOI: 10.1016/j.neucom.2021.11.055.

14. Jia Y, Wu W, Wang R, Hou J, Kwong S. Joint Optimization for Pairwise Constraint Propagation. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:3168-3180. PMID: 32745010. DOI: 10.1109/tnnls.2020.3009953.
Abstract
Constrained spectral clustering (SC) based on pairwise constraint propagation has attracted much attention due to its good performance. All existing methods can generally be cast as the following two steps: a small number of pairwise constraints are first propagated to the whole dataset under the guidance of a predefined affinity matrix, and the affinity matrix is then refined according to the resulting propagation and finally adopted for SC. Such a stepwise manner, however, overlooks the fact that the two steps depend on each other, i.e., they form a "chicken-and-egg" problem, leading to suboptimal performance. To this end, we propose a joint pairwise constraint propagation (PCP) model for constrained SC that simultaneously learns a propagation matrix and an affinity matrix. Specifically, it is formulated as a bounded symmetric graph-regularized low-rank matrix completion problem. We also show that the affinity matrix optimized by our model exhibits an ideal appearance under certain conditions. Extensive experimental results in terms of constrained SC, semisupervised classification, and propagation behavior validate the superior performance of our model compared with state-of-the-art methods.
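The stepwise baseline being improved on can be sketched with the classic two-sided exhaustive propagation. This is a standard PCP baseline under a fixed affinity, not the proposed joint matrix-completion model:

```python
import numpy as np

def propagate_constraints(S, Z, alpha=0.5):
    """Two-sided constraint propagation
    F = (1-a)^2 (I - a*S_norm)^-1 Z (I - a*S_norm)^-1, with S_norm the
    symmetrically normalized affinity and Z the seed constraint matrix
    (+1 must-link, -1 cannot-link, 0 unknown)."""
    d = S.sum(axis=1)
    S_norm = S / np.sqrt(np.outer(d, d))
    n = S.shape[0]
    P = np.linalg.inv(np.eye(n) - alpha * S_norm)
    return (1 - alpha) ** 2 * (P @ Z @ P)        # propagate along rows and columns
```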

15. Liu B, Zhang T, Li Y, Liu Z, Zhang Z. Kernel Probabilistic K-Means Clustering. Sensors 2021; 21:1892. PMID: 33800353. PMCID: PMC7962817. DOI: 10.3390/s21051892.
Abstract
Kernel fuzzy c-means (KFCM) is a significantly improved version of fuzzy c-means (FCM) for processing linearly inseparable datasets. However, for fuzzification parameter m=1, the KFCM problem cannot be solved by Lagrangian optimization. To solve this problem, an equivalent model, called kernel probabilistic k-means (KPKM), is proposed here. The novel model relates KFCM to kernel k-means (KKM) in a unified mathematical framework. Moreover, KPKM can be addressed by the active gradient projection (AGP) method, a nonlinear programming technique with constraints of linear equalities and inequalities. To accelerate the AGP method, a fast AGP (FAGP) algorithm was designed, which uses a maximum-step strategy to estimate the step length and an iterative method to update the projection matrix. Experiments demonstrated the effectiveness of the proposed method through a performance comparison of KPKM with KFCM, KKM, FCM, and k-means. KPKM is able to find nonlinearly separable structures in synthetic datasets. On ten real UCI datasets, KPKM had better clustering performance on at least six, and the fast AGP reduced running time by 76-95% relative to the original AGP.
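The kernel k-means (KKM) side of the equivalence can be illustrated directly: distances to cluster centroids in feature space are computed from the Gram matrix alone. This is generic KKM, not the KPKM model or the AGP solver:

```python
import numpy as np

def kernel_kmeans(K, k, init, n_iter=50):
    """Kernel k-means on a precomputed Gram matrix K, starting from the
    label vector `init`. Uses the identity
    ||phi(x_i) - m_j||^2 = K_ii - 2*mean_l K_il + mean_{l,l'} K_ll'
    with means taken over the current members of cluster j."""
    labels = np.asarray(init).copy()
    n = K.shape[0]
    for _ in range(n_iter):
        D = np.full((n, k), np.inf)              # empty clusters stay at inf
        for j in range(k):
            idx = np.flatnonzero(labels == j)
            if idx.size:
                D[:, j] = (np.diag(K)
                           - 2 * K[:, idx].mean(axis=1)
                           + K[np.ix_(idx, idx)].mean())
        labels = D.argmin(axis=1)
    return labels
```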
Affiliations:
- Bowen Liu, Ting Zhang, Zhaoying Liu, Zhilin Zhang: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Yujian Li: School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China

16. Peng Y, Zhang Y, Qin F, Kong W. Joint non-negative and fuzzy coding with graph regularization for efficient data clustering. Egyptian Informatics Journal 2021. DOI: 10.1016/j.eij.2020.05.001.

17. Liu Y, Zhang R, Nie F, Li X, Ding C. Supervised Dimensionality Reduction Methods via Recursive Regression. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:3269-3279. PMID: 31603803. DOI: 10.1109/tnnls.2019.2940088.
Abstract
In this article, the recursive problems of both orthogonal linear discriminant analysis (OLDA) and orthogonal least squares regression (OLSR) are investigated. Different from other works, the associated recursive problems are addressed via a novel recursive regression method, which achieves dimensionality reduction in the orthogonal complement space heuristically. For OLDA, an efficient method is developed to obtain the associated optimal subspace, which is closely related to the orthonormal basis of the optimal solution of ridge regression. For OLSR, a scalable subspace is introduced to build an original OLSR with optimal scaling (OS). By further relaxing the proposed problem into a convex parameterized orthogonal quadratic problem, an effective approach is derived such that not only can the optimal subspace be achieved, but the OS can also be obtained automatically. Accordingly, two supervised dimensionality reduction methods are proposed by obtaining heuristic solutions to the recursive problems of OLDA and OLSR.
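The connection between OLDA's optimal subspace and ridge regression mentioned above can be sketched as follows: fit a ridge regression against one-hot class targets and orthonormalize the column space of the solution. This is a generic sketch of the relationship, not the paper's recursive algorithm:

```python
import numpy as np

def ridge_subspace(X, Y, lam=1e-2):
    """Ridge solution W = (X^T X + lam*I)^-1 X^T Y for one-hot targets Y,
    followed by QR to obtain an orthonormal basis of its column space."""
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    Q, _ = np.linalg.qr(W)                       # orthonormal basis (d x c)
    return Q
```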

18. Wu T, Zhou Y, Xiao Y, Needell D, Nie F. Modified fuzzy clustering with segregated cluster centroids. Neurocomputing 2019. DOI: 10.1016/j.neucom.2019.07.005.