1
|
Zhou J, Huang C, Gao C, Wang Y, Pedrycz W, Yuan G. Reweighted Subspace Clustering Guided by Local and Global Structure Preservation. IEEE TRANSACTIONS ON CYBERNETICS 2025; 55:1436-1449. [PMID: 40031165 DOI: 10.1109/tcyb.2025.3526176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Subspace clustering has attracted significant interest for its capacity to partition high-dimensional data into multiple subspaces. Current approaches to subspace clustering predominantly revolve around two key aspects: 1) the construction of an effective similarity matrix and 2) the pursuit of sparsity within the projection matrix. However, assessing whether the dimensionality of the projected subspace is the true dimensionality of the data is challenging. Clustering performance may therefore degrade in scenarios such as subspace overlap, insufficient projected dimensions, or data noise, since the predefined dimensionality of the projected lower-dimensional space may deviate significantly from its true value. In this research, we introduce a novel reweighting strategy, applied to projected coordinates for the first time, and propose a reweighted subspace clustering model guided by the preservation of both local and global structural characteristics (RWSC). The projected subspaces are reweighted to augment or suppress the importance of different coordinates, so that data with overlapping subspaces can be better distinguished and the redundant coordinates produced by the predefined number of projected dimensions can be further removed. The reweighting strategy thus alleviates the bias caused by imprecise dimensionalities in subspace clustering. Moreover, global scatter structure preservation and adaptive local structure learning are integrated into the proposed model, which helps RWSC capture more intrinsic structure and improves its robustness and applicability. Experiments on both synthetic and real-world datasets empirically verify the effectiveness and superiority of RWSC.
|
2
|
Xue J, Nie F, Liu C, Wang R, Li X. Co-Clustering by Directly Solving Bipartite Spectral Graph Partitioning. IEEE TRANSACTIONS ON CYBERNETICS 2024; 54:7590-7601. [PMID: 39255088 DOI: 10.1109/tcyb.2024.3451292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
The bipartite spectral graph partitioning (BSGP) method, a co-clustering method, has been widely used in document clustering; it clusters documents and words simultaneously by making full use of the duality between them. It consists of two steps: 1) graph construction and 2) singular value decomposition on the bipartite graph to compute a continuous cluster assignment matrix, followed by post-processing to get the discrete solution. However, the generated graph is unstructured and fixed, so the result relies heavily on the quality of the graph construction. Moreover, the two-stage process may deviate from directly solving the primal problem. To tackle these defects, a novel bipartite graph partitioning method is proposed that learns a bipartite graph with exactly c connected components (c is the number of clusters), from which clustering results can be obtained directly. Furthermore, it is shown experimentally and proved theoretically that, in a special case, the solution of the proposed model is the discrete solution of the primal BSGP problem. Experimental results on synthetic and real-world datasets exhibit the superiority of the proposed method.
|
3
|
Wang B, Chen M, Li X. Robust Subcluster Search and Mergence Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2024; 54:7616-7628. [PMID: 39231063 DOI: 10.1109/tcyb.2024.3446764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2024]
Abstract
In recent years, graph-based clustering has shown outstanding performance and has been widely investigated. It segments the data similarity graph into multiple subgraphs as final clusters. Many methods integrate graph learning and segmentation into a unified optimization problem to explore the graph structure. However, existing research 1) attempts to derive the final clusters from the learned graph directly, which relies on a very tight internal distribution within each cluster and is too strict for real-world data; 2) generally constructs a holistic full-sample graph, which means the outliers are involved in graph learning explicitly and may corrupt the graph quality. To overcome these limitations, a new clustering model called robust subcluster search and mergence (RSSM) is established in this article. Inspired by positive-incentive noise (Pi-Noise), RSSM assumes that the outliers are useful for learning the data structure. Considering a few samples with large errors as outliers, RSSM finds the subcentroids by searching an imbalanced residue distribution. In this way, the subcentroids pull the normal samples together and push the outliers far away. Compared with traditional clusters, the subclusters indicated by the subcentroids are more explicit, and the normal samples within them are tightly connected. After that, a subcluster similarity graph is constructed to guide the mergence of subclusters. To sum up, RSSM performs the search and mergence of subclusters simultaneously with the help of outliers, and generates a graph that is more suitable for clustering. Experiments on several datasets demonstrate the rationality and superiority of RSSM.
|
4
|
Lu J, Nie F, Wang R, Li X. Fast Multiview Clustering by Optimal Graph Mining. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:13071-13077. [PMID: 37030843 DOI: 10.1109/tnnls.2023.3256066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Multiview clustering (MVC) aims to exploit heterogeneous information from different sources and has been extensively investigated in the past decade. However, far less attention has been paid to handling large-scale multiview data. In this brief, we fill this gap and propose a fast multiview clustering model based on optimal graph mining to handle large-scale data. We mine a consistent clustering structure from landmark-based graphs of different views, from which the optimal graph based on the one-hot encoding of cluster labels is recovered. Our model is parameter-free, so intractable hyperparameter tuning is avoided. An efficient algorithm whose complexity is linear in the number of samples is developed to solve the optimization problems. Extensive experiments on real-world datasets of various scales demonstrate the superiority of our proposal.
|
5
|
Chen Y, Zhou S. Revisiting Possibilistic Fuzzy C-Means Clustering Using the Majorization-Minimization Method. ENTROPY (BASEL, SWITZERLAND) 2024; 26:670. [PMID: 39202140 PMCID: PMC11353294 DOI: 10.3390/e26080670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 08/03/2024] [Accepted: 08/04/2024] [Indexed: 09/03/2024]
Abstract
Possibilistic fuzzy c-means (PFCM) clustering is a kind of hybrid clustering method based on fuzzy c-means (FCM) and possibilistic c-means (PCM), which not only has the stability of FCM but also partly inherits the robustness of PCM. However, as an extension of FCM on the objective function, PFCM tends to find a suboptimal local minimum, which affects its performance. In this paper, we rederive PFCM using the majorization-minimization (MM) method, which is a new derivation approach not seen in other studies. In addition, we propose an effective optimization method to solve the above problem, called MMPFCM. Firstly, by eliminating the variable $V \in \mathbb{R}^{p \times c}$, the original optimization problem is transformed into a simplified model with fewer variables but a proportional term. Therefore, we introduce a new intermediate variable $s \in \mathbb{R}^{c}$ to convert the model with the proportional term into an easily solvable equivalent form. Subsequently, we design an iterative sub-problem using the MM method. The complexity analysis indicates that MMPFCM and PFCM share the same computational complexity. However, MMPFCM requires less memory per iteration. Extensive experiments, including objective function value comparison and clustering performance comparison, demonstrate that MMPFCM converges to a better local minimum compared to PFCM.
Affiliation(s)
- Shuisheng Zhou
  - School of Mathematics and Statistics, Xidian University, Xi’an 710071, China
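The abstract above builds on fuzzy c-means (FCM). As background only, the following is a minimal Python sketch of one alternating update of standard FCM; it is not PFCM and not the MMPFCM method the paper proposes. The function name `fcm_step`, the fuzziness value `m = 2`, and the toy data are illustrative assumptions.

```python
import math


def fcm_step(points, centers, m=2.0):
    """One alternating update of standard fuzzy c-means (hypothetical helper).

    points  : list of equal-length tuples (samples)
    centers : list of equal-length tuples (current cluster centers)
    m       : fuzziness parameter, must be > 1 (m = 2 is an assumed default)
    Returns (memberships, new_centers).
    """
    eps = 1e-12  # guards against division by zero when a point sits on a center
    c = len(centers)
    n = len(points)

    # Membership update: u[i][j] = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
    u = []
    for x in points:
        d = [max(math.dist(x, v), eps) for v in centers]
        row = [
            1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
            for j in range(c)
        ]
        u.append(row)

    # Center update: mean of the points weighted by u[i][j]^m.
    dim = len(points[0])
    new_centers = []
    for j in range(c):
        w = [u[i][j] ** m for i in range(n)]
        total = sum(w)
        new_centers.append(
            tuple(sum(w[i] * points[i][t] for i in range(n)) / total
                  for t in range(dim))
        )
    return u, new_centers


if __name__ == "__main__":
    # Toy data (assumed): two well-separated pairs of points.
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    memberships, updated = fcm_step(pts, [(0.0, 0.0), (10.0, 10.0)])
    print(updated)
```

Each row of the returned membership matrix sums to 1, which is the constraint that distinguishes FCM memberships from the typicality values used in possibilistic variants.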
|
6
|
Nie F, Xie F, Yu W, Li X. Parameter-Insensitive Min Cut Clustering With Flexible Size Constrains. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:5479-5492. [PMID: 38376965 DOI: 10.1109/tpami.2024.3367912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
Clustering is a fundamental topic in machine learning and various methods have been proposed, among which K-Means (KM) and min cut clustering are typical ones. However, they may produce empty or skewed clustering results, which are not as expected. For KM, constrained clustering methods have been fully studied, while for min cut clustering they still need to be developed. In this paper, we propose a parameter-insensitive min cut clustering with flexible size constraints. Specifically, we add lower limits on the number of samples in each cluster, which avoids the trivial solution in min cut clustering. To the best of our knowledge, this is the first attempt at directly incorporating size constraints into min cut. However, the resulting problem is NP-hard and difficult to solve. Thus, upper limits are also added, but the problem remains difficult to solve. Therefore, an additional variable that is equivalent to the label matrix is introduced, and the augmented Lagrangian multiplier (ALM) method is used to decouple the constraints. In the experiments, we find that our algorithm is less sensitive to the lower bound and is practical in image segmentation. A large number of experiments demonstrate the effectiveness of our proposed algorithm.
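To illustrate why a lower limit on cluster size rules out the trivial min cut solution mentioned in the abstract above, here is a small Python sketch. It only evaluates the cut value of a given partition and checks size bounds; it does not implement the paper's ALM-based solver. The function names, the toy weight matrix, and the bound values are assumptions made for illustration.

```python
def cut_value(weights, labels):
    """Total weight of edges whose endpoints fall in different clusters.

    weights : symmetric n x n list of lists (similarity graph)
    labels  : list of n cluster indices
    """
    n = len(labels)
    return sum(
        weights[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] != labels[j]
    )


def satisfies_size_bounds(labels, k, lower, upper):
    """Check that each of the k clusters holds between lower and upper samples."""
    sizes = [0] * k
    for label in labels:
        sizes[label] += 1
    return all(lower <= s <= upper for s in sizes)


if __name__ == "__main__":
    # Toy graph (assumed): two tightly linked pairs joined by weak edges.
    W = [
        [0.0, 1.0, 0.1, 0.0],
        [1.0, 0.0, 0.0, 0.1],
        [0.1, 0.0, 0.0, 1.0],
        [0.0, 0.1, 1.0, 0.0],
    ]
    trivial = [0, 0, 0, 0]    # everything in one cluster: cut is 0
    balanced = [0, 0, 1, 1]   # cuts only the two weak edges
    for name, labels in (("trivial", trivial), ("balanced", balanced)):
        print(name, cut_value(W, labels),
              satisfies_size_bounds(labels, k=2, lower=1, upper=3))
```

Without size bounds, the all-in-one-cluster assignment minimizes the cut at zero; with a lower bound of one sample per cluster it becomes infeasible, so the balanced partition is preferred.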
|
7
|
Wang Z, Wu D, Wang R, Nie F, Wang F. Joint Anchor Graph Embedding and Discrete Feature Scoring for Unsupervised Feature Selection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:7974-7987. [PMID: 36417731 DOI: 10.1109/tnnls.2022.3222466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The success of existing unsupervised feature selection (UFS) methods heavily relies on the assumption that the intrinsic relationships among original high-dimensional (HD) data samples exist in the discriminative low-dimensional (LD) subspace. However, previous UFS methods commonly construct pairwise graphs and employ $\ell_{2,1}$-norm regularization to separately preserve the local structure and calculate the score of features, which is computationally complex and prone to getting stuck in local optima, so those approaches cannot be applied to large-scale datasets in practice. To overcome this challenge, we propose a novel UFS method, in which a novel anchor graph embedding paradigm is designed to extract the local neighborhood relationships among data samples while reducing the computational complexity of graph construction to be linear in the number of data samples. Moreover, to improve the optimality of selected features as well as the performance of downstream tasks, we propose a discrete feature scoring mechanism, which imposes orthogonal $\ell_{2,0}$-norm constraints on learned projections in order to enhance the distinction of feature scores and reduce the probability of falling into local optima. In addition, solving the proposed nonconvex and nonsmooth NP-hard problem is challenging, and we present an efficient optimization algorithm to address it and acquire a closed-form solution of the transformation matrix. Extensive experiments demonstrate the effectiveness and efficiency of the proposed UFS method by comparison with several state-of-the-art approaches on clustering and image segmentation tasks.
|
8
|
Zhang MZ, Sun Y, Chen YM, Guo F, Gao PY, Tan L, Tan MS. Associations of Multimorbidity with Cerebrospinal Fluid Biomarkers for Neurodegenerative Disorders in Early Parkinson's Disease: A Crosssectional and Longitudinal Study. Curr Alzheimer Res 2024; 21:201-213. [PMID: 39041277 DOI: 10.2174/0115672050314397240708060314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 05/06/2024] [Accepted: 05/09/2024] [Indexed: 07/24/2024]
Abstract
OBJECTIVE The study aims to determine whether multimorbidity status is associated with cerebrospinal fluid (CSF) biomarkers for neurodegenerative disorders. METHODS A total of 827 participants were enrolled from the Parkinson's Progression Markers Initiative (PPMI) database, including 638 patients with early-stage Parkinson's disease (PD) and 189 healthy controls (HCs). Multimorbidity status was evaluated based on the count of long-term conditions (LTCs) and the multimorbidity pattern. Using linear regression models, cross-sectional and longitudinal analyses were conducted to assess the associations of multimorbidity status with CSF biomarkers for neurodegenerative disorders, including α-synuclein (αSyn), amyloid-β42 (Aβ42), total tau (t-tau), phosphorylated tau (p-tau), glial fibrillary acidic protein (GFAP), and neurofilament light chain protein (NfL). RESULTS At baseline, the CSF t-tau (p = 0.010), p-tau (p = 0.034), and NfL (p = 0.049) levels showed significant differences across the three categories of LTC counts. In the longitudinal analysis, the presence of LTCs was associated with lower Aβ42 (β < -0.001, p = 0.020), and higher t-tau (β = 0.007, p = 0.026), GFAP (β = 0.013, p = 0.022) and NfL (β = 0.020, p = 0.012). Participants with tumor/musculoskeletal/mental disorders showed higher CSF levels of t-tau (β = 0.016, p = 0.011) and p-tau (β = 0.032, p = 0.044) than those without multimorbidity. CONCLUSION Multimorbidity, especially severe multimorbidity and the pattern of mental/musculoskeletal/tumor disorders, was associated with CSF biomarkers for neurodegenerative disorders in early-stage PD patients, suggesting that multimorbidity might play a crucial role in aggravating neuronal damage in neurodegenerative diseases.
Affiliation(s)
- Ming-Zhan Zhang
  - School of Clinical Medicine, Shandong Second Medical University (formerly Weifang Medical University), Weifang 261000, Shandong, China
- Yan Sun
  - Department of Neurology, Qingdao Municipal Hospital, Qingdao University, Qingdao, China
- Yan-Ming Chen
  - Department of Neurology, Qingdao Municipal Hospital, Qingdao University, Qingdao, China
- Fan Guo
  - Department of Neurology, Qingdao Municipal Hospital, Qingdao University, Qingdao, China
- Pei-Yang Gao
  - Department of Neurology, Qingdao Municipal Hospital, Qingdao University, Qingdao, China
- Lan Tan
  - Department of Neurology, Qingdao Municipal Hospital, University of Health and Rehabilitation Sciences, Qingdao, China
- Meng-Shan Tan
  - School of Clinical Medicine, Shandong Second Medical University (formerly Weifang Medical University), Weifang 261000, Shandong, China
  - Department of Neurology, Qingdao Municipal Hospital, University of Health and Rehabilitation Sciences, Qingdao, China
|
9
|
Wang C, Li Y, Wang J, Dong K, Li C, Wang G, Lin X, Zhao H. Unsupervised cluster analysis of clinical and metabolite characteristics in patients with chronic complications of T2DM: an observational study of real data. Front Endocrinol (Lausanne) 2023; 14:1230921. [PMID: 37929026 PMCID: PMC10623421 DOI: 10.3389/fendo.2023.1230921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 09/26/2023] [Indexed: 11/07/2023]
Abstract
Introduction The aim of this study was to cluster patients with chronic complications of type 2 diabetes mellitus (T2DM) by cluster analysis in Dalian, China, and examine the variance in risk of different chronic complications and metabolic levels among the various subclusters. Methods A total of 2267 hospitalized patients were included in the K-means cluster analysis based on 11 variables [Body Mass Index (BMI), Systolic Blood Pressure (SBP), Diastolic Blood Pressure (DBP), Glucose, Triglycerides (TG), Total Cholesterol (TC), Uric Acid (UA), microalbuminuria (mAlb), Insulin, Insulin Sensitivity Index (ISI) and Homa Insulin-Resistance (Homa-IR)]. The risk of various chronic complications of T2DM in different subclusters was analyzed by multivariate logistic regression, and the Kruskal-Wallis H test and the Nemenyi test examined the differences in metabolites among different subclusters. Results Four subclusters were identified by clustering analysis, and each subcluster had significant features and was labeled with a different level of risk. Cluster 1 contained 1112 inpatients (49.05%), labeled "Low-Risk"; cluster 2 included 859 (37.89%) inpatients, labeled "Medium-Low-Risk"; cluster 3 included 134 (5.91%) inpatients, labeled "Medium-Risk"; cluster 4 included 162 (7.15%) inpatients, labeled "High-Risk". Additionally, the proportion of patients with multiple chronic complications differed across subclusters, and the risk of the same chronic complication also differed significantly. Compared to the "Low-Risk" cluster, the other three clusters exhibited a higher risk of microangiopathy. After additional adjustment for 20 covariates, the odds ratios (ORs) and 95% confidence intervals (95%CI) of the "Medium-Low-Risk" cluster, the "Medium-Risk" cluster, and the "High-Risk" cluster were 1.369 (1.042, 1.799), 2.188 (1.496, 3.201), and 9.644 (5.851, 15.896) (all p<0.05). Notably, the "High-Risk" cluster had the highest risk of DN [OR (95%CI): 11.510 (7.139, 18.557), (p<0.05)] and DR [OR (95%CI): 3.917 (2.526, 6.075), (p<0.05)] after adjustment for 20 variables. Four metabolites showed statistically significant distribution differences when compared with other subclusters [Threonine (Thr), Tyrosine (Tyr), Glutaryl carnitine (C5DC), and Butyryl carnitine (C4)]. Conclusion Patients with chronic complications of T2DM had significant clustering characteristics, and the risk of target organ damage in different subclusters was significantly different, as were the levels of metabolites, which may offer a new approach to the prevention and treatment of chronic complications of T2DM.
Affiliation(s)
- Cuicui Wang
  - Department of Health Examination Center, The Second Affiliated Hospital of Dalian Medical University, Dalian, China
  - Department of Gastroenterology, The 986th Hospital of Xijing Hospital, Air Force Military Medical University, Xi’an, China
- Yan Li
  - State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Science, Dalian, China
- Jun Wang
  - Department of Gastroenterology, The 986th Hospital of Xijing Hospital, Air Force Military Medical University, Xi’an, China
- Kunjie Dong
  - School of Computer Science & Technology, Dalian University of Technology, Dalian, China
- Chenxiang Li
  - School of Computer Science & Technology, Dalian University of Technology, Dalian, China
- Guiyan Wang
  - School of Information Engineering, Dalian Ocean University, Dalian, China
- Xiaohui Lin
  - School of Computer Science & Technology, Dalian University of Technology, Dalian, China
- Hui Zhao
  - Department of Health Examination Center, The Second Affiliated Hospital of Dalian Medical University, Dalian, China
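The T2DM study above groups patients with K-means. As a rough illustration of that clustering step only, the following Python sketch runs plain Lloyd-style K-means on toy two-dimensional points. The function name `kmeans`, the deterministic first-k initialization, and the data are assumptions for illustration; the study's actual preprocessing, software, and settings for its 11 clinical variables are not given in the abstract.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd-style K-means (illustrative sketch).

    points : list of equal-length tuples
    k      : number of clusters
    Initialization uses the first k points, a simplifying assumption;
    practical tools usually use random or k-means++ seeding.
    Returns (labels, centers).
    """
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centers


if __name__ == "__main__":
    # Toy data (assumed): two separated groups in two dimensions.
    data = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
    labels, centers = kmeans(data, k=2)
    print(labels, centers)
```

With real clinical variables measured on different scales (for example BMI and blood pressure), features are commonly standardized before clustering so that no single variable dominates the Euclidean distance.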
|
10
|
Pei S, Chen H, Nie F, Wang R, Li X. Centerless Clustering. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:167-181. [PMID: 35157578 DOI: 10.1109/tpami.2022.3150981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Although many clustering models have been proposed recently, k-means and the family of spectral clustering methods are both still drawing a lot of attention due to their simplicity and efficacy. We first review the unified framework of k-means and graph cut models, and then propose a clustering method called k-sums in which a k-nearest neighbor (k-NN) graph is adopted. The main idea of k-sums is to directly minimize the sum of the distances between points in the same cluster. To deal with the situation where the graph is unavailable, we propose k-sums-x, which takes features as input. The computational and memory overhead of k-sums are both O(nk), indicating that it scales linearly w.r.t. the number of objects to group. Moreover, the computational and memory costs are independent of the product of the number of points and clusters. The computational and memory complexity of k-sums-x are both linear w.r.t. the number of points. To validate the advantage of k-sums and k-sums-x on facial datasets, extensive experiments have been conducted on 10 synthetic datasets and 17 benchmark datasets. While having a low time complexity, k-sums achieves performance comparable with several state-of-the-art clustering methods.
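The abstract above describes the k-sums idea as minimizing the sum of distances between points in the same cluster, without cluster centers. The Python sketch below only evaluates such an objective for a given labeling; it is not the k-sums or k-sums-x optimization algorithm, and it uses all pairs rather than a k-NN graph. The function name and the use of Euclidean distance over unordered pairs are assumptions.

```python
import math


def within_cluster_distance_sum(points, labels):
    """Sum of Euclidean distances over all unordered same-cluster pairs.

    This is a brute-force O(n^2) evaluation for illustration only;
    the paper restricts attention to a k-NN graph to stay linear in n.
    """
    n = len(points)
    return sum(
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] == labels[j]
    )


if __name__ == "__main__":
    # Toy data (assumed): two nearby points and one distant point.
    pts = [(0, 0), (3, 4), (10, 10)]
    good = [0, 0, 1]  # groups the two nearby points
    bad = [0, 1, 1]   # groups a nearby point with the distant one
    print(within_cluster_distance_sum(pts, good))
    print(within_cluster_distance_sum(pts, bad))
```

A centerless objective of this kind compares points to each other rather than to a centroid, so a labeling that keeps close points together yields a smaller value.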
|
11
|
Chen Y, Zhou S, Zhang X, Li D, Fu C. Improved fuzzy c-means clustering by varying the fuzziness parameter. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.03.017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
|
12
|
|
13
|
Huang J, Huang B, Kong Y, Yang Y, Tian C, Chen L, Liao Y, Ma L. Polycystic ovary syndrome: Identification of novel and hub biomarkers in the autophagy-associated mRNA-miRNA-lncRNA network. Front Endocrinol (Lausanne) 2022; 13:1032064. [PMID: 36523600 PMCID: PMC9745174 DOI: 10.3389/fendo.2022.1032064] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/30/2022] [Accepted: 11/14/2022] [Indexed: 12/03/2022]
Abstract
INTRODUCTION Polycystic ovary syndrome (PCOS) is a common metabolic and endocrine disorder prevalent among women of reproductive age. Recent studies show that autophagy participates in the pathogenesis of PCOS, including anovulation, hyperandrogenism, and metabolic disturbances. This study was designed to screen autophagy-related genes (ATGs) that may play a pivotal role in PCOS, providing potential biomarkers and identifying new molecular subgroups for therapeutic intervention. METHODS Gene expression profiles of the PCOS and control samples were obtained from the publicly available Gene Expression Omnibus database. The gene lists of ATGs from databases were integrated. Then, weighted gene co-expression network analysis (WGCNA) was conducted to obtain functional modules and construct a multifactorial co-expression network. Gene Ontology and KEGG pathway enrichment analyses were performed for further exploration of the ATGs' function in the key modules. Differentially expressed ATGs were identified and validated in external datasets with the Limma R package. To provide guidance on PCOS phenotyping, the dysfunction module consists of a co-expression network mapped to PCOS patients. A PCOS-Autophagy-related co-expression network was established using Cytoscape, followed by identifying molecular subgroups using the Limma R package. RNA-sequencing analysis was used to confirm the differential expression of hub ATGs, and the diagnostic value of hub ATGs was assessed by receiver operating characteristic curve analysis. RESULTS Three modules (Brown, Turquoise, and Green) in GSE8157, three modules (Blue, Red, and Green) in GSE43264, and four modules (Blue, Green, Black, and Yellow) in GSE106724 were identified as PCOS-related by WGCNA. Twenty-nine ATGs were found to be hub genes that strongly correlated with PCOS. These hub ATGs were mainly enriched in autophagy-related functions and pathways such as autophagy, endocytosis, apoptosis, and mTOR signaling pathways. The mRNA-miRNA-lncRNA multifactorial network was successfully constructed, and three new molecular subgroups were identified via the K-means algorithm. DISCUSSION We provide a novel insight into the mechanisms behind autophagy in PCOS. BRCA1, LDLR, MAP1B, hsa-miR-92b-3p, hsa-miR-20b-5p, and NEAT1 might play a considerably important role in PCOS dysfunction. As a result, new potential biomarkers can be evaluated for use in PCOS diagnosis and treatment in the future.
|