1. Chen B, Xu S, Xu H, Bian X, Guo N, Xu X, Hua X. Structure-aware deep clustering network based on contrastive learning. Neural Netw 2023;167:118-128. [PMID: 37657251] [DOI: 10.1016/j.neunet.2023.08.020]
Abstract
Recently, deep clustering has been extensively employed for various data mining tasks, and existing methods can be divided into auto-encoder (AE)-based and graph neural network (GNN)-based approaches. However, AE-based methods fall short in extracting structural information, while GNN-based methods suffer from over-smoothing and heterophily. Although methods that combine AE and GNN achieve impressive performance, they still strike an inadequate balance between preserving the raw structure and exploring the underlying structure. Accordingly, we propose a novel network named the Structure-Aware Deep Clustering network (SADC). First, we compute the cumulative influence of non-adjacent nodes at multiple depths and thus enhance the adjacency matrix. Second, we design an enhanced graph auto-encoder. Third, the latent space of the AE is endowed with the ability to perceive the raw structure during learning. In addition, we design self-supervised mechanisms to jointly optimize node representation learning and topology learning, and a new loss function that preserves the inherent structure while allowing exploration of the latent data structure. Extensive experiments on six benchmark datasets validate that our method outperforms state-of-the-art methods.
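The multi-depth adjacency enhancement this abstract describes can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the decayed sum of row-normalized adjacency powers, the depth, and the decay factor are all assumptions made for illustration.

```python
import numpy as np

def enhance_adjacency(A, depth=3, alpha=0.5):
    """Accumulate decayed multi-hop influence so that non-adjacent nodes
    up to `depth` hops away contribute to the enhanced adjacency."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    A_norm = A / np.maximum(deg, 1e-12)   # row-normalized transition matrix
    enhanced = np.zeros_like(A_norm)
    power = np.eye(A.shape[0])
    for t in range(1, depth + 1):
        power = power @ A_norm            # t-hop influence
        enhanced += (alpha ** t) * power  # deeper hops decay geometrically
    return enhanced

# Path graph 0-1-2-3: nodes 0 and 2 are non-adjacent in A.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
E = enhance_adjacency(A)
```

In the enhanced matrix, non-adjacent pairs such as (0, 2) receive nonzero influence, which is the kind of structural signal the paper feeds to its graph auto-encoder.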
Affiliations: Bowei Chen, Sen Xu, Heyang Xu, Xuesheng Bian, Naixuan Guo, Xiufang Xu, Xiaopeng Hua (all: School of Information Engineering, Yancheng Institute of Technology, Yancheng 224051, China).

2. Zhu S, Xu L, Goodman ED. Hierarchical topology-based cluster representation for scalable evolutionary multiobjective clustering. IEEE Trans Cybern 2022;52:9846-9860. [PMID: 34106873] [DOI: 10.1109/tcyb.2021.3081988]
Abstract
Evolutionary multiobjective clustering (MOC) algorithms have shown promising potential to outperform conventional single-objective clustering algorithms, especially when the number of clusters k is not set before clustering. However, the computational burden becomes a serious problem as the search space and the fitness-evaluation time of the evolving population grow, especially when the data size is large. This article proposes a new hierarchical, topology-based cluster representation for scalable MOC, which simplifies the search procedure and decreases computational overhead. A coarse-to-fine-trained topological structure that fits the spatial distribution of the data is used to identify a set of seed points/nodes, and a tree-based graph is then built to represent clusters. During optimization, a bipartite graph partitioning strategy applied to the graph nodes performs a cluster ensemble operation to generate offspring solutions more effectively. To determine the final result, an issue underexplored in existing methods, a cluster ensemble strategy is also presented for both cases, whether or not k is provided. Comparison experiments on a series of different data distributions reveal the superiority of the proposed algorithm in terms of both clustering performance and computational efficiency.
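The tree-based cluster representation over seed points can be illustrated generically: connect seed nodes with a minimum spanning tree and cut the longest edges to form clusters. This is a hedged sketch of the general idea only; the paper's coarse-to-fine topological training and bipartite-graph offspring generation are not reproduced here, and the seed data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)

def mst_edges(points):
    """Prim's algorithm; returns the (n-1) MST edges as (i, j, dist)."""
    n = len(points)
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    parent = np.zeros(n, dtype=int)
    dist = np.linalg.norm(points - points[0], axis=1)
    edges = []
    for _ in range(n - 1):
        dist[visited] = np.inf
        j = int(np.argmin(dist))
        edges.append((int(parent[j]), j, float(dist[j])))
        visited[j] = True
        d_new = np.linalg.norm(points - points[j], axis=1)
        closer = (d_new < dist) & ~visited
        parent[closer] = j
        dist = np.where(closer, d_new, dist)
    return edges

def tree_cut_clusters(points, k):
    """Drop the k-1 longest MST edges; connected components are clusters."""
    kept = sorted(mst_edges(points), key=lambda e: e[2])
    if k > 1:
        kept = kept[:-(k - 1)]
    parent = list(range(len(points)))
    def find(i):                      # union-find over the kept edges
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j, _ in kept:
        parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(len(points))})
    return [roots.index(find(i)) for i in range(len(points))]

# Two well-separated groups of seed points.
seeds = np.vstack([rng.normal(0, 0.3, (10, 2)),
                   rng.normal(5, 0.3, (10, 2))])
labels = tree_cut_clusters(seeds, k=2)
```

Working on seed points rather than all data points is what keeps representations of this kind cheap enough for an evolving population.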

3. Lu Y, Cheung YM, Tang YY. Self-adaptive multiprototype-based competitive learning approach: a k-means-type algorithm for imbalanced data clustering. IEEE Trans Cybern 2021;51:1598-1612. [PMID: 31150353] [DOI: 10.1109/tcyb.2019.2916196]
Abstract
The class imbalance problem has been extensively studied in recent years, but imbalanced data clustering in an unsupervised environment, that is, when the number of samples differs substantially among clusters, has yet to be well studied. This paper therefore studies the imbalanced data clustering problem within the framework of k-means-type competitive learning. We introduce a new method called self-adaptive multiprototype-based competitive learning (SMCL) for imbalanced clusters. It uses multiple subclusters to represent each cluster, with automatic adjustment of the number of subclusters, and then merges the subclusters into the final clusters based on a novel separation measure. We also propose a new internal clustering validation measure to determine the number of final clusters during the merging process. The advantages of SMCL are threefold: 1) it inherits the advantages of competitive learning while being applicable to imbalanced data clustering; 2) the self-adaptive multiprototype mechanism uses a proper number of subclusters to represent each cluster of arbitrary shape; and 3) it automatically determines the number of clusters for imbalanced data. SMCL is compared with existing counterparts for imbalanced clustering on synthetic and real datasets, and the experimental results show its efficacy.
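The over-cluster-then-merge idea can be sketched as follows: represent the data with more prototypes than true clusters, then merge subclusters whose prototypes lie close together. This is a simplified illustration, not SMCL itself: plain Lloyd's k-means stands in for competitive learning, and a fixed distance threshold stands in for the paper's separation measure and validity-based stopping rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def multiprototype_cluster(X, n_prototypes=8, merge_dist=2.0):
    """Over-cluster with many prototypes, then merge nearby subclusters."""
    centroids, labels = kmeans(X, n_prototypes)
    parent = list(range(n_prototypes))
    def find(i):                      # union-find over prototype pairs
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n_prototypes):
        for j in range(i + 1, n_prototypes):
            if np.linalg.norm(centroids[i] - centroids[j]) < merge_dist:
                parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(n_prototypes)})
    return np.array([roots.index(find(l)) for l in labels])

# Imbalanced toy data: a 200-point cluster and a 20-point cluster.
X = np.vstack([rng.normal(0, 0.5, (200, 2)),
               rng.normal(8, 0.5, (20, 2))])
y = multiprototype_cluster(X)
```

Because several prototypes share the large cluster, the small cluster is not swallowed by a single far-away centroid, which is the failure mode of vanilla k-means on imbalanced data.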

4. Liu Y, Hou T, Miao Y, Liu M, Liu F. IM-c-means: a new clustering algorithm for clusters with skewed distributions. Pattern Anal Appl 2020. [DOI: 10.1007/s10044-020-00932-2]

5. Essa E, Jones JL, Xie X. Coupled s-excess HMM for vessel border tracking and segmentation. Int J Numer Method Biomed Eng 2019;35:e3206. [PMID: 30968570] [DOI: 10.1002/cnm.3206]
Abstract
In this paper, we present a novel image segmentation technique based on a hidden Markov model (HMM), which we apply to simultaneously segment the interior and exterior walls in fluorescent confocal images of lymphatic vessels. Our method achieves this by tracking hidden states that indicate the locations of both the inner and outer wall borders throughout the sequence of images. We parameterize these vessel borders using radial basis functions (RBFs), enabling us to minimize the number of points we need to track as we progress through multiple layers and thereby reduce computational complexity. Information about each border is detected using patch-wise convolutional neural networks (CNNs). We use the softmax function to infer the emission probabilities and propose a new training algorithm based on s-excess optimization to learn the transition probabilities, along with a new optimization method to determine the optimal sequence of hidden states. Thus, we transform the segmentation problem into one that minimizes an s-excess graph cut, where each hidden state is represented as a graph node and the weights of these nodes are defined by their emission probabilities; the transition probabilities define relationships between neighboring nodes in the constructed graph. We compare our proposed method to the Viterbi and Baum-Welch algorithms; both qualitative and quantitative analyses show the superior performance of the proposed method.
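As a point of reference for the Viterbi comparison mentioned in the abstract, a minimal log-space Viterbi decoder for a discrete HMM looks like this. The toy initial, transition, and emission matrices below are made up for illustration; the paper's s-excess graph-cut formulation replaces this decoding step.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable hidden-state sequence (log-space Viterbi)."""
    T, N = len(obs), len(pi)
    logd = np.log(pi) + np.log(B[:, obs[0]])
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)   # scores[i, j]: best path into j via i
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):            # backtrack through stored pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 2-state HMM: state 0 tends to emit symbol 0, state 1 emits symbol 1.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
states = viterbi([0, 0, 1, 1, 1], pi, A, B)
```

On this observation sequence the decoder recovers the intuitive state sequence, switching states when the emitted symbol changes.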
Affiliations: Ehab Essa (Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt; Department of Computer Science, Swansea University, Swansea, UK); Xianghua Xie (Department of Computer Science, Swansea University, Swansea, UK).

6.

7. Huang D, Wang CD, Lai JH. Locally weighted ensemble clustering. IEEE Trans Cybern 2018;48:1460-1473. [PMID: 28541232] [DOI: 10.1109/tcyb.2017.2702343]
Abstract
Due to its ability to combine multiple base clusterings into a probably better and more robust clustering, the ensemble clustering technique has attracted increasing attention in recent years. Despite this significant success, one limitation of most existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, these methods tend to view each base clustering as a whole and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance consensus performance, especially when there is no access to data features or specific assumptions on the data distribution. To address this, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and a local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary of the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
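The entropy-based cluster uncertainty and locally weighted co-association matrix described above can be sketched as follows. This is a simplified reading of the abstract, not the paper's exact formulation: the weight `exp(-entropy)` is an assumed stand-in for the ensemble-driven validity measure.

```python
import numpy as np
from collections import Counter

def cluster_entropy(members, other_partition):
    """Entropy of a cluster's member labels under one other base partition."""
    counts = np.array(list(Counter(other_partition[members].tolist()).values()),
                      dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def locally_weighted_coassociation(partitions):
    """Co-association matrix in which each cluster votes with a reliability weight."""
    partitions = [np.asarray(p) for p in partitions]
    n, M = len(partitions[0]), len(partitions)
    S = np.zeros((n, n))
    for p in partitions:
        for c in np.unique(p):
            members = np.where(p == c)[0]
            # Uncertainty: mean entropy of this cluster across the other partitions.
            ent = np.mean([cluster_entropy(members, q)
                           for q in partitions if q is not p] or [0.0])
            w = np.exp(-ent)  # stable (low-entropy) clusters get higher weight
            S[np.ix_(members, members)] += w
    return S / M

# Three base clusterings of six points; the third is low quality.
parts = [[0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 1, 0, 1, 0, 1]]
S = locally_weighted_coassociation(parts)
```

Pairs that only co-occur inside unstable clusters accumulate less weight than pairs confirmed by stable clusters, so a consensus function run on `S` is less swayed by the noisy base clustering.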

8. Geng YA, Li Q, Zheng R, Zhuang F, He R, Xiong N. RECOME: a new density-based clustering algorithm using relative KNN kernel density. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.01.013]

9. Chen C, Lin KY, Wang CD, Liu JB, Huang D. CCMS: a nonlinear clustering method based on crowd movement and selection. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.12.101]

10. Wu JS, Zheng WS, Lai JH. Approximate kernel competitive learning. Neural Netw 2014;63:117-132. [PMID: 25528318] [DOI: 10.1016/j.neunet.2014.11.003]
Abstract
Kernel competitive learning (KCL) has been successfully used to achieve robust clustering. However, KCL is not scalable to large-scale data processing, because (1) it has to calculate and store the full kernel matrix, which is too large to compute and keep in memory, and (2) it cannot be computed in parallel. In this paper we develop a framework of approximate kernel competitive learning for processing large-scale datasets. The proposed framework consists of two parts. First, it derives approximate kernel competitive learning (AKCL), which performs kernel competitive learning in a subspace obtained via sampling. We provide solid theoretical analysis of why the proposed approximation model works for kernel competitive learning, and we show that the computational complexity of AKCL is largely reduced. Second, we propose pseudo-parallel approximate kernel competitive learning (PAKCL) based on a set-based kernel competitive learning strategy, which overcomes the obstacle to parallel programming in kernel competitive learning and significantly accelerates approximate kernel competitive learning for large-scale clustering. Empirical evaluation on publicly available datasets shows that the proposed AKCL and PAKCL perform comparably to KCL, with a large reduction in computational cost, and achieve more effective clustering performance in terms of clustering precision than related approximate clustering approaches.
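The idea of working in a sampled subspace instead of forming the full kernel matrix can be illustrated with a standard Nyström approximation. This is a generic technique shown for intuition, not the paper's AKCL derivation; the RBF kernel, landmark count, and ridge term are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(X, Y, gamma=0.5):
    """RBF kernel matrix between row sets X and Y."""
    d2 = ((X[:, None] - Y[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, m=50, gamma=0.5):
    """Rank-m feature map F such that F @ F.T approximates the full kernel."""
    idx = rng.choice(len(X), m, replace=False)
    landmarks = X[idx]
    W = rbf(landmarks, landmarks, gamma)   # m x m landmark kernel
    C = rbf(X, landmarks, gamma)           # n x m cross kernel
    # Symmetric inverse square root of W (small ridge for numerical stability).
    vals, vecs = np.linalg.eigh(W + 1e-8 * np.eye(m))
    W_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return C @ W_inv_sqrt                  # n x m features

X = rng.normal(size=(200, 3))
F = nystrom_features(X, m=50)
K_approx = F @ F.T
K_full = rbf(X, X)
err = np.abs(K_approx - K_full).mean()
```

Any k-means-style or competitive-learning update can then run on the n x m feature matrix `F`, never materializing the n x n kernel, which is the memory saving the abstract is pointing at.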
Affiliations: Jian-Sheng Wu (School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China; SYSU-CMU Shunde International Joint Research Institute, Shunde, China); Wei-Shi Zheng (School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China; Guangdong Province Key Laboratory of Computational Science, Guangzhou 510275, China); Jian-Huang Lai (School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China; Guangdong Province Key Laboratory of Information Security, China).

11. Chen DW, Sheng JQ, Chen JJ, Wang CD. Stability-based preference selection in affinity propagation. Neural Comput Appl 2014. [DOI: 10.1007/s00521-014-1671-4]

12. Tan J, Lai JH, Wang CD, Wang WX, Zuo XX. A new handwritten character segmentation method based on nonlinear clustering. Neurocomputing 2012. [DOI: 10.1016/j.neucom.2012.02.026]