1. Jing W, Lu L, Ou W. Semi-supervised non-negative matrix factorization with structure preserving for image clustering. Neural Netw 2025;187:107340. PMID: 40101552. DOI: 10.1016/j.neunet.2025.107340.
Abstract
Semi-supervised learning methods are widely used because they make effective use of the partial label information available in data. In recent years, non-negative matrix factorization (NMF) has received considerable attention for its interpretability and practicality. Building on the advantages of semi-supervised learning and NMF, many semi-supervised NMF methods have been proposed. However, existing semi-supervised NMF methods encode the labeled data in a label matrix containing only the elements 1 and 0 and build a label regularizer from it, which neglects the intrinsic structure of NMF. To address this deficiency, this paper proposes a novel semi-supervised NMF method with structure preserving. Specifically, a new weighted label matrix is first constructed, and from it a label constraint regularizer that both exploits the label information and maintains the intrinsic structure of NMF. Then, based on the label constraint regularizer, the basis images of the labeled data are extracted and used to monitor and correct the learning of the basis images of all data through a basis regularizer. Finally, incorporating the label constraint regularizer and the basis regularizer into NMF yields the proposed semi-supervised NMF method, and a multiplicative updating algorithm is developed to solve the resulting optimization problem. The method is applied to image clustering, and experimental results on eight data sets demonstrate its effectiveness in comparison with state-of-the-art unsupervised and semi-supervised algorithms.
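As context for the multiplicative updating algorithm mentioned above, the sketch below shows the standard Lee-Seung multiplicative updates for plain NMF in Python/NumPy. It is a minimal illustration only: the weighted label matrix, label constraint regularizer and basis regularizer of the paper are not reproduced, and the variable names (X, W, H) are generic assumptions rather than the authors' notation.

    import numpy as np

    def nmf_multiplicative(X, r, n_iter=200, eps=1e-10, seed=0):
        """Basic NMF: X (m x n, non-negative) ~ W (m x r) @ H (r x n)."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        W = rng.random((m, r)) + eps
        H = rng.random((r, n)) + eps
        for _ in range(n_iter):
            # Lee-Seung updates for the Frobenius objective ||X - WH||_F^2
            H *= (W.T @ X) / (W.T @ W @ H + eps)
            W *= (X @ H.T) / (W @ H @ H.T + eps)
        return W, H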
Affiliations
- Wenjing Jing: School of Mathematical Sciences, Guizhou Normal University, Guiyang, 550025, People's Republic of China.
- Linzhang Lu: School of Mathematical Sciences, Guizhou Normal University, Guiyang, 550025, People's Republic of China; School of Mathematical Sciences, Xiamen University, Xiamen, 361005, People's Republic of China.
- Weihua Ou: School of Big Data and Computer Science, Guizhou Normal University, Guiyang, 550025, People's Republic of China.
2. Ma C, Zhang Y, Su CY. Graph-based multicentroid nonnegative matrix factorization. IEEE Trans Neural Netw Learn Syst 2025;36:1133-1144. PMID: 38015683. DOI: 10.1109/tnnls.2023.3332360.
Abstract
Nonnegative matrix factorization (NMF) is a widely recognized approach for data representation. For clustering, however, NMF struggles with data points lying on complex geometries, because each sample cluster is represented by a single centroid. In this article, a novel multicentroid-based clustering method called graph-based multicentroid NMF (MCNMF) is proposed. First, the method constructs a neighborhood connection graph between data points and centroids, so that each data point is represented by its adjacent centroids, which preserves the local geometric structure. Second, it constructs an undirected connected graph with the centroids as nodes, in which the centroids are divided into centroid clusters, yielding a novel data clustering method based on MCNMF. In addition, the membership index matrix is reconstructed from the obtained centroid clusters, which resolves the membership assignment of each sample. Extensive experiments on synthetic and real benchmark datasets illustrate the effectiveness of the proposed MCNMF method, which obtains the best results compared with single-centroid-based methods.
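One ingredient described above, connecting each data point to its nearest centroids, can be sketched as follows. This is a generic k-nearest-centroid graph under assumed variable names and an assumed choice of k, not the authors' construction.

    import numpy as np

    def point_centroid_graph(X, C, k=3):
        """Binary adjacency between n data points (rows of X) and p centroids (rows of C):
        each point is linked to its k nearest centroids (Euclidean distance)."""
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # (n, p) squared distances
        A = np.zeros_like(d2)
        nearest = np.argsort(d2, axis=1)[:, :k]                   # indices of the k closest centroids
        np.put_along_axis(A, nearest, 1.0, axis=1)
        return A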
3. Xu X, He P. Manifold peaks nonnegative matrix factorization. IEEE Trans Neural Netw Learn Syst 2024;35:6850-6862. PMID: 36279340. DOI: 10.1109/tnnls.2022.3212922.
Abstract
Nonnegative matrix factorization (NMF) has attracted increasing interest for its high interpretability in recent years. It is shown that NMF is closely related to fuzzy k-means clustering, where the basis matrix represents the cluster centroids. However, most of the existing NMF-based clustering algorithms often have their decomposed centroids deviate away from the data manifold, which potentially undermines the clustering results, especially when the datasets lie on complicated geometric structures. In this article, we present a manifold peaks NMF (MPNMF) for data clustering. The proposed approach has the following advantages: 1) it selects a number of manifold peaks (MPs) to characterize the backbone of the data manifold; 2) it enforces the centroids to lie on the original data manifold, by restricting each centroid to be a conic combination of a small number of nearby MPs; 3) it generalizes the graph smoothness regularization to guide the local graph construction; and 4) it solves a general problem of quadratic regularized nonnegative least squares (NNLS) with a group l0-norm constraint and further develops an efficient optimization algorithm to solve the objective function of the MPNMF. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed approach.
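The conic-combination constraint described above can be illustrated with a small sketch: a centroid is expressed as a non-negative combination of a few nearby peak points via non-negative least squares. The peak selection and the group l0-norm handling of the paper are not reproduced; SciPy's nnls solver is used purely as a stand-in.

    import numpy as np
    from scipy.optimize import nnls

    def conic_fit(peaks, centroid):
        """Express `centroid` (d,) as a non-negative (conic) combination of the columns
        of `peaks` (d x s), i.e. minimize ||peaks @ a - centroid|| subject to a >= 0."""
        a, residual = nnls(peaks, centroid)
        return a, residual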
4. Lu G, Leng C, Li B, Jiao L, Basu A. Robust dual-graph discriminative NMF for data classification. Knowl Based Syst 2023. DOI: 10.1016/j.knosys.2023.110465.
5. Shu Z, Long Q, Zhang L, Yu Z, Wu XJ. Robust graph regularized NMF with dissimilarity and similarity constraints for ScRNA-seq data clustering. J Chem Inf Model 2022;62:6271-6286. PMID: 36459053. DOI: 10.1021/acs.jcim.2c01305.
Abstract
Notable progress in single-cell RNA sequencing (ScRNA-seq) technology makes it possible to accurately uncover the heterogeneity and diversity of cells. Clustering is an extremely important step in ScRNA-seq data analysis, yet directly clustering ScRNA-seq data rarely achieves satisfactory performance because of its high dimensionality and noise. To address these issues, we propose a novel ScRNA-seq data representation model, termed Robust Graph regularized Non-Negative Matrix Factorization with Dissimilarity and Similarity constraints (RGNMF-DS), for ScRNA-seq data clustering. To accurately characterize the structural information of the labeled and unlabeled samples, respectively, the RGNMF-DS model adopts a pair of complementary regularizers (similarity and dissimilarity regularizers) to guide the matrix decomposition. In addition, we construct a graph regularizer to discover the local geometric structure hidden in the ScRNA-seq data. Moreover, we adopt the l2,1-norm to measure the reconstruction error, which effectively improves the robustness of the model to noise. Experimental results on several ScRNA-seq datasets demonstrate that the proposed RGNMF-DS model outperforms state-of-the-art competitors in clustering.
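Two of the building blocks mentioned above, the l2,1-norm of the reconstruction error and a graph regularizer based on a Laplacian, can be sketched as follows. The graph construction and all variable names are illustrative assumptions, not the authors' code.

    import numpy as np

    def l21_norm(E):
        """l2,1-norm: sum of the Euclidean norms of the columns of the error matrix E."""
        return np.sqrt((E ** 2).sum(axis=0)).sum()

    def graph_regularizer(H, W):
        """tr(H L H^T) with L = D - W, the unnormalized Laplacian of a sample graph W;
        H holds the low-dimensional representations column-wise (one column per sample)."""
        L = np.diag(W.sum(axis=1)) - W
        return np.trace(H @ L @ H.T)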
Affiliations
- Zhenqiu Shu: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China
- Qinghan Long: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China
- Luping Zhang: Library of Kunming Medical University, Kunming 650031, China
- Zhengtao Yu: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China
- Xiao-Jun Wu: Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
6. Wu W, Chen Y, Wang R, Ou-Yang L. Self-representative kernel concept factorization. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.110051.
7. Jia Y, Liu H, Hou J, Kwong S, Zhang Q. Semisupervised affinity matrix learning via dual-channel information recovery. IEEE Trans Cybern 2022;52:7919-7930. PMID: 33417578. DOI: 10.1109/tcyb.2020.3041493.
Abstract
This article explores the problem of semisupervised affinity matrix learning, that is, learning an affinity matrix of data samples under the supervision of a small number of pairwise constraints (PCs). Observing that both the matrix encoding the PCs, called the pairwise constraint matrix (PCM), and the empirically constructed affinity matrix (EAM) express the similarity between samples, we assume that both are generated from a latent affinity matrix (LAM) that can depict the ideal pairwise relation between samples. Specifically, the PCM can be thought of as a partial observation of the LAM, while the EAM is a fully observed one but corrupted with noise/outliers. To this end, we innovatively cast semisupervised affinity matrix learning as the recovery of the LAM guided by the PCM and EAM, which is technically formulated as a convex optimization problem. We also provide an efficient algorithm for solving the resulting model numerically. Extensive experiments on benchmark datasets demonstrate the significant superiority of our method over state-of-the-art ones when used for constrained clustering and dimensionality reduction. The code is publicly available at https://github.com/jyh-learning/LAM.
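A minimal sketch of how the pairwise constraint matrix described above could be assembled from must-link and cannot-link pairs is given below, using the common +1/-1/0 encoding; this encoding is an assumption here and not necessarily the exact one used in the paper.

    import numpy as np

    def pairwise_constraint_matrix(n, must_links, cannot_links):
        """Partial observation of the latent affinity: +1 for must-link pairs,
        -1 for cannot-link pairs, 0 where no supervision is available."""
        Z = np.zeros((n, n))
        for i, j in must_links:
            Z[i, j] = Z[j, i] = 1.0
        for i, j in cannot_links:
            Z[i, j] = Z[j, i] = -1.0
        return Z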
8. Li N, Leng C, Cheng I, Basu A, Jiao L. Dual-graph global and local concept factorization for data clustering. IEEE Trans Neural Netw Learn Syst 2022;PP:803-816. PMID: 35653444. DOI: 10.1109/tnnls.2022.3177433.
Abstract
Given the wide range of applications of nonnegative matrix factorization (NMF), many NMF variants have been developed. Since previous NMF methods cannot fully describe the complex global and local manifold structures of the data space or extract such structural information, we propose a novel method called dual-graph global and local concept factorization (DGLCF). To properly describe the inner manifold structure, DGLCF introduces the global and local structures of the data manifold and the geometric structure of the feature manifold into concept factorization (CF). The global manifold structure makes the model more discriminative, while the two local regularization terms simultaneously preserve the inherent geometry of the data and the features. Finally, we derive the iterative update rules of DGLCF and analyze their convergence. We illustrate its clustering performance by comparing it with recent algorithms on four real-world datasets.
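For orientation, concept factorization (CF), on which the method above builds, approximates X by X W V^T so that the basis vectors are combinations of the data points themselves. The sketch below shows the standard unregularized CF multiplicative updates; the dual-graph and global/local terms of DGLCF are not included, and the variable names are assumptions.

    import numpy as np

    def concept_factorization(X, r, n_iter=200, eps=1e-10, seed=0):
        """Plain CF: X (m x n) ~ X @ W @ V.T with W (n x r), V (n x r) non-negative."""
        rng = np.random.default_rng(seed)
        n = X.shape[1]
        W = rng.random((n, r)) + eps
        V = rng.random((n, r)) + eps
        K = X.T @ X                                   # only the Gram matrix is needed
        for _ in range(n_iter):
            W *= (K @ V) / (K @ W @ V.T @ V + eps)
            V *= (K @ W) / (V @ W.T @ K @ W + eps)
        return W, V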
9. Guided semi-supervised non-negative matrix factorization. Algorithms 2022. DOI: 10.3390/a15050136.
Abstract
Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a novel method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method on legal documents provided by the California Innocence Project and the 20 Newsgroups dataset. Our results show that the proposed method improves both classification accuracy and topic coherence in comparison to past methods such as Semi-Supervised Non-negative Matrix Factorization (SSNMF), Guided Non-negative Matrix Factorization (Guided NMF), and Topic Supervised NMF.
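A small sketch of the kind of seed-word supervision mentioned above, mapping user-chosen seed words to topics as a non-negative guidance matrix over the vocabulary, is given below. How GSSNMF actually couples such guidance with the factorization is not reproduced here, and the function and variable names are illustrative assumptions.

    import numpy as np

    def seed_word_mask(vocab, seed_words_per_topic):
        """Build a |vocab| x k guidance matrix: entry (w, t) = 1 if word w is a seed word
        for topic t, else 0. `seed_words_per_topic` is a list of k lists of words."""
        index = {w: i for i, w in enumerate(vocab)}
        k = len(seed_words_per_topic)
        G = np.zeros((len(vocab), k))
        for t, words in enumerate(seed_words_per_topic):
            for w in words:
                if w in index:
                    G[index[w], t] = 1.0
        return G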
10. Ran R, Feng J, Zhang S, Fang B. A general matrix function dimensionality reduction framework and extension for manifold learning. IEEE Trans Cybern 2022;52:2137-2148. PMID: 32697725. DOI: 10.1109/tcyb.2020.3003620.
Abstract
Many dimensionality reduction methods in the manifold learning field suffer from the so-called small-sample-size (SSS) problem. Starting from solving the SSS problem, we first summarize the existing dimensionality reduction methods and construct a unified criterion function for them. Then, combining the unified criterion with the matrix function, we propose a general matrix function dimensionality reduction framework. This framework is configurable, that is, one can select suitable functions to construct such a matrix transformation framework, and a series of new dimensionality reduction methods can then be derived from it. In this article, we discuss how to choose suitable functions from two aspects: 1) solving the SSS problem and 2) improving pattern classification ability. As an extension, with the inverse hyperbolic tangent function and the linear function, we propose a new matrix function dimensionality reduction framework. Compared with the existing methods for solving the SSS problem, these new methods obtain better pattern classification ability and have lower computational complexity. Experimental results on handwritten digit and letter databases and on two face databases show the superiority of the new methods.
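The core operation implied above, applying a scalar function such as the inverse hyperbolic tangent to a symmetric scatter matrix through its eigendecomposition, can be sketched as follows. The particular scatter matrices and the criterion they enter are left out, and the names are assumptions.

    import numpy as np

    def matrix_function(S, f):
        """Apply a scalar function f to a symmetric matrix S via its eigendecomposition:
        f(S) = U diag(f(lambda)) U^T."""
        lam, U = np.linalg.eigh(S)
        return U @ np.diag(f(lam)) @ U.T

    # Example: inverse hyperbolic tangent of a (scaled) scatter matrix. The eigenvalues are
    # squashed into (-1, 1) first so that arctanh is defined:
    # S_f = matrix_function(S / (np.abs(np.linalg.eigvalsh(S)).max() + 1.0), np.arctanh)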
12. Jia Y, Liu H, Hou J, Kwong S. Semisupervised adaptive symmetric non-negative matrix factorization. IEEE Trans Cybern 2021;51:2550-2562. PMID: 32112689. DOI: 10.1109/tcyb.2020.2969684.
Abstract
As a variant of non-negative matrix factorization (NMF), symmetric NMF (SymNMF) can generate the clustering result without additional post-processing, by decomposing a similarity matrix into the product of a clustering indicator matrix and its transpose. However, the similarity matrix in the traditional SymNMF methods is usually predefined, resulting in limited clustering performance. Considering that the quality of the similarity graph is crucial to the final clustering performance, we propose a new semisupervised model, which is able to simultaneously learn the similarity matrix with supervisory information and generate the clustering results, such that the mutual enhancement effect of the two tasks can produce better clustering performance. Our model fully utilizes the supervisory information in the form of pairwise constraints to propagate it for obtaining an informative similarity matrix. The proposed model is finally formulated as a non-negativity-constrained optimization problem. Also, we propose an iterative method to solve it with the convergence theoretically proven. Extensive experiments validate the superiority of the proposed model when compared with nine state-of-the-art NMF models.
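As background for the SymNMF formulation described above, the sketch below shows the classic symmetric factorization of a similarity matrix S into H H^T with a damped multiplicative update. The semisupervised similarity-matrix learning of the paper is not included, and the damping constant beta = 0.5 is a common heuristic rather than the authors' choice.

    import numpy as np

    def symnmf(S, r, n_iter=300, beta=0.5, eps=1e-10, seed=0):
        """SymNMF: factor a non-negative similarity matrix S (n x n) as S ~ H @ H.T,
        H (n x r) >= 0; the argmax over each row of H gives a cluster assignment."""
        rng = np.random.default_rng(seed)
        H = rng.random((S.shape[0], r)) + eps
        for _ in range(n_iter):
            # damped multiplicative update for ||S - H H^T||_F^2
            H *= (1.0 - beta) + beta * (S @ H) / (H @ (H.T @ H) + eps)
        return H

    # labels = symnmf(S, r=3).argmax(axis=1)   # S: precomputed similarity matrix (assumed)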
13. Devarajan K. A statistical framework for non-negative matrix factorization based on generalized dual divergence. Neural Netw 2021;140:309-324. PMID: 33892302. DOI: 10.1016/j.neunet.2021.03.020.
Abstract
A statistical framework for non-negative matrix factorization based on generalized dual Kullback-Leibler divergence, which includes members of the exponential family of models, is proposed. A family of algorithms, including versions under sparsity constraints, is developed using this framework, and their convergence is proven using the Expectation-Maximization algorithm. The framework generalizes some existing methods for different noise structures and contrasts with the recently developed quasi-likelihood approach, thus providing a useful alternative for non-negative matrix factorization. A measure to evaluate the goodness-of-fit of the resulting factorization is described. The performance of the proposed methods is evaluated extensively using real-life and simulated data, and their utility in unsupervised and semi-supervised learning is illustrated with an application in cancer genomics. The framework can be viewed from the perspective of reinforcement learning and can be adapted to incorporate discriminant functions and multi-layered neural networks within a deep learning paradigm.
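For reference, the classical multiplicative updates for NMF under the standard (not dual) Kullback-Leibler divergence are sketched below; the generalized dual-divergence family and its EM-based derivation are not reproduced here, and the variable names are assumptions.

    import numpy as np

    def nmf_kl(X, r, n_iter=300, eps=1e-10, seed=0):
        """Lee-Seung multiplicative updates minimizing the KL divergence D(X || WH)."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        W = rng.random((m, r)) + eps
        H = rng.random((r, n)) + eps
        for _ in range(n_iter):
            R = X / (W @ H + eps)                       # element-wise ratio X / (WH)
            H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
            W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
        return W, H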
Affiliations
- Karthik Devarajan: Department of Biostatistics & Bioinformatics, Fox Chase Cancer Center, Temple University Health System, Philadelphia, PA 19111, United States of America.
14. Jia Y, Liu H, Hou J, Kwong S. Pairwise constraint propagation with dual adversarial manifold regularization. IEEE Trans Neural Netw Learn Syst 2020;31:5575-5587. PMID: 32092017. DOI: 10.1109/tnnls.2020.2970195.
Abstract
Pairwise constraints (PCs) composed of must-links (MLs) and cannot-links (CLs) are widely used in many semisupervised tasks. Due to the limited number of PCs, pairwise constraint propagation (PCP) has been proposed to augment them. However, the existing PCP algorithms only adopt a single matrix to contain all the information, which overlooks the differences between the two types of links such that the discriminability of the propagated PCs is compromised. To this end, this article proposes a novel PCP model via dual adversarial manifold regularization to fully explore the potential of the limited initial PCs. Specifically, we propagate MLs and CLs with two separated variables, called similarity and dissimilarity matrices, under the guidance of the graph structure constructed from data samples. At the same time, the adversarial relationship between the two matrices is taken into consideration. The proposed model is formulated as a nonnegative constrained minimization problem, which can be efficiently solved with convergence theoretically guaranteed. We conduct extensive experiments to evaluate the proposed model, including propagation effectiveness and applications on constrained clustering and metric learning, all of which validate the superior performance of our model to state-of-the-art PCP models.
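A rough sketch of the general idea of propagating must-links and cannot-links in two separate matrices over a sample graph is given below, using a plain diffusion step. This is a generic illustration under assumed names and parameters, not the adversarial manifold-regularized model of the paper.

    import numpy as np

    def propagate(Z0, A, alpha=0.5, n_iter=50):
        """Diffuse an initial constraint matrix Z0 over a row-normalized sample graph A:
        Z <- alpha * P Z P^T + (1 - alpha) * Z0, a common label/constraint propagation form."""
        P = A / (A.sum(axis=1, keepdims=True) + 1e-10)
        Z = Z0.copy()
        for _ in range(n_iter):
            Z = alpha * (P @ Z @ P.T) + (1.0 - alpha) * Z0
        return Z

    # Usage (assumed): S = propagate(ML, A); D = propagate(CL, A)
    # gives separate similarity / dissimilarity matrices from a must-link matrix ML
    # and a cannot-link matrix CL.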
16. Jia Y, Kwong S, Hou J, Wu W. Semi-supervised non-negative matrix factorization with dissimilarity and similarity regularization. IEEE Trans Neural Netw Learn Syst 2020;31:2510-2521. PMID: 31484134. DOI: 10.1109/tnnls.2019.2933223.
Abstract
In this article, we propose a semi-supervised non-negative matrix factorization (NMF) model by means of elegantly modeling the label information. The proposed model is capable of generating discriminable low-dimensional representations to improve clustering performance. Specifically, a pair of complementary regularizers, i.e., similarity and dissimilarity regularizers, is incorporated into the conventional NMF to guide the factorization. They impose restrictions on both the similarity and dissimilarity of the low-dimensional representations of data samples with labels as well as a small number of unlabeled ones. The proposed model is formulated as a well-posed constrained optimization problem and further solved with an efficient alternating iterative algorithm. Moreover, we theoretically prove that the proposed algorithm converges to a limiting point that meets the Karush-Kuhn-Tucker conditions. Extensive experiments as well as comprehensive analysis demonstrate that the proposed model outperforms the state-of-the-art NMF methods to a large extent over five benchmark data sets, i.e., the clustering accuracy increases to 82.2% from 57.0%.
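A small sketch of the kind of complementary supervision described above, turning partial labels into a similarity matrix (same label) and a dissimilarity matrix (different labels) and scoring a low-dimensional representation against both, is given below. The actual regularizer forms and weights of the paper are not reproduced, and the names are assumptions.

    import numpy as np

    def label_pair_matrices(labels):
        """labels: array of length n, with -1 marking unlabeled samples.
        Returns (S, D): S[i, j] = 1 if i and j share a label, D[i, j] = 1 if their labels differ."""
        labels = np.asarray(labels)
        known = labels >= 0
        same = (labels[:, None] == labels[None, :]) & known[:, None] & known[None, :]
        diff = (labels[:, None] != labels[None, :]) & known[:, None] & known[None, :]
        np.fill_diagonal(same, False)
        return same.astype(float), diff.astype(float)

    def pair_penalties(V, S, D):
        """Sum of squared distances between representations (rows of V) over similar pairs
        (to be kept small) and over dissimilar pairs (to be kept large)."""
        d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        return (S * d2).sum(), (D * d2).sum()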