1
|
Xu J, Wu J, Li T, Nan Y. Divergence-Based Locally Weighted Ensemble Clustering with Dictionary Learning and L2,1-Norm. Entropy (Basel) 2022; 24:1324. [PMID: 37420344 DOI: 10.3390/e24101324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/17/2022] [Revised: 09/11/2022] [Accepted: 09/19/2022] [Indexed: 07/09/2023]
Abstract
Accurate clustering is a challenging task with unlabeled data. Ensemble clustering aims to combine sets of base clusterings to obtain a better and more stable clustering and has shown its ability to improve clustering accuracy. Dense representation ensemble clustering (DREC) and entropy-based locally weighted ensemble clustering (ELWEC) are two typical methods for ensemble clustering. However, DREC treats each microcluster equally and hence, ignores the differences between each microcluster, while ELWEC conducts clustering on clusters rather than microclusters and ignores the sample-cluster relationship. To address these issues, a divergence-based locally weighted ensemble clustering with dictionary learning (DLWECDL) is proposed in this paper. Specifically, the DLWECDL consists of four phases. First, the clusters from the base clustering are used to generate microclusters. Second, a Kullback-Leibler divergence-based ensemble-driven cluster index is used to measure the weight of each microcluster. With these weights, an ensemble clustering algorithm with dictionary learning and the L2,1-norm is employed in the third phase. Meanwhile, the objective function is resolved by optimizing four subproblems and a similarity matrix is learned. Finally, a normalized cut (Ncut) is used to partition the similarity matrix and the ensemble clustering results are obtained. In this study, the proposed DLWECDL was validated on 20 widely used datasets and compared to some other state-of-the-art ensemble clustering methods. The experimental results demonstrated that the proposed DLWECDL is a very promising method for ensemble clustering.
Collapse
Affiliation(s)
- Jiaxuan Xu
- School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics, Chengdu 611130, China
| | - Jiang Wu
- School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics, Chengdu 611130, China
| | - Taiyong Li
- School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics, Chengdu 611130, China
| | - Yang Nan
- Department of Computer Science, Harbin Finance University, Harbin 150030, China
| |
Collapse
|
2
|
Zhang J, Liu L, Zhen L, Jing L. A unified robust framework for multi-view feature extraction with L2,1-norm constraint. Neural Netw 2020; 128:126-141. [PMID: 32446190 DOI: 10.1016/j.neunet.2020.04.024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2019] [Revised: 03/28/2020] [Accepted: 04/27/2020] [Indexed: 11/19/2022]
Abstract
Multi-view feature extraction methods mainly focus on exploiting the consistency and complementary information between multi-view samples, and most of the current methods apply the F-norm or L2-norm as the metric, which are sensitive to the outliers or noises. In this paper, based on L2,1-norm, we propose a unified robust feature extraction framework, which includes four special multi-view feature extraction methods, and extends the state-of-art methods to a more generalized form. The proposed methods are less sensitive to outliers or noises. An efficient iterative algorithm is designed to solve L2,1-norm based methods. Comprehensive analyses, such as convergence analysis, rotational invariance analysis and relationship between our methods and previous F-norm based methods illustrate the effectiveness of our proposed methods. Experiments on two artificial datasets and six real datasets demonstrate that the proposed L2,1-norm based methods have better performance than the related methods.
Collapse
Affiliation(s)
- Jinxin Zhang
- College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China.
| | - Liming Liu
- School of Statistics, Capital University of Economics and Business, Beijing, 100070, China.
| | - Ling Zhen
- College of Science, China Agricultural University, Beijing, 100083, China.
| | - Ling Jing
- College of Science, China Agricultural University, Beijing, 100083, China.
| |
Collapse
|
3
|
Abstract
Background Predicting miRNA-disease associations (MDAs) is time-consuming and expensive. It is imminent to improve the accuracy of prediction results. So it is crucial to develop a novel computing technology to predict new MDAs. Although some existing methods can effectively predict novel MDAs, there are still some shortcomings. Especially when the disease matrix is processed, its sparsity is an important factor affecting the final results. Results A robust collaborative matrix factorization (RCMF) is proposed to predict novel MDAs. The L2,1-norm are introduced to our method to achieve the highest AUC value than other advanced methods. Conclusions 5-fold cross validation is used to evaluate our method, and simulation experiments are used to predict novel associations on Gold Standard Dataset. Finally, our prediction accuracy is better than other existing advanced methods. Therefore, our approach is effective and feasible in predicting novel MDAs.
Collapse
Affiliation(s)
- Zhen Cui
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
| | - Jin-Xing Liu
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China. .,Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei, 230601, China.
| | - Ying-Lian Gao
- Qufu Normal University Library, Qufu Normal University, Rizhao, 276826, China
| | - Chun-Hou Zheng
- Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei, 230601, China
| | - Juan Wang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China.
| |
Collapse
|
4
|
Yu N, Gao YL, Liu JX, Wang J, Shang J. Robust hypergraph regularized non-negative matrix factorization for sample clustering and feature selection in multi-view gene expression data. Hum Genomics 2019; 13:46. [PMID: 31639067 PMCID: PMC6805321 DOI: 10.1186/s40246-019-0222-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
Background As one of the most popular data representation methods, non-negative matrix decomposition (NMF) has been widely concerned in the tasks of clustering and feature selection. However, most of the previously proposed NMF-based methods do not adequately explore the hidden geometrical structure in the data. At the same time, noise and outliers are inevitably present in the data. Results To alleviate these problems, we present a novel NMF framework named robust hypergraph regularized non-negative matrix factorization (RHNMF). In particular, the hypergraph Laplacian regularization is imposed to capture the geometric information of original data. Unlike graph Laplacian regularization which captures the relationship between pairwise sample points, it captures the high-order relationship among more sample points. Moreover, the robustness of the RHNMF is enhanced by using the L2,1-norm constraint when estimating the residual. This is because the L2,1-norm is insensitive to noise and outliers. Conclusions Clustering and common abnormal expression gene (com-abnormal expression gene) selection are conducted to test the validity of the RHNMF model. Extensive experimental results on multi-view datasets reveal that our proposed model outperforms other state-of-the-art methods.
Collapse
Affiliation(s)
- Na Yu
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
| | - Ying-Lian Gao
- Library of Qufu Normal University, Qufu Normal University, Rizhao, 276826, China
| | - Jin-Xing Liu
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China.
| | - Juan Wang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China.
| | - Junliang Shang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
| |
Collapse
|
5
|
Abstract
Background Predicting drug-target interactions is time-consuming and expensive. It is important to present the accuracy of the calculation method. There are many algorithms to predict global interactions, some of which use drug-target networks for prediction (ie, a bipartite graph of bound drug pairs and targets known to interact). Although these algorithms can predict some drug-target interactions to some extent, there is little effect for some new drugs or targets that have no known interaction. Results Since the datasets are usually located at or near low-dimensional nonlinear manifolds, we propose an improved GRMF (graph regularized matrix factorization) method to learn these flow patterns in combination with the previous matrix-decomposition method. In addition, we use one of the pre-processing steps previously proposed to improve the accuracy of the prediction. Conclusions Cross-validation is used to evaluate our method, and simulation experiments are used to predict new interactions. In most cases, our method is superior to other methods. Finally, some examples of new drugs and new targets are predicted by performing simulation experiments. And the improved GRMF method can better predict the remaining drug-target interactions.
Collapse
Affiliation(s)
- Zhen Cui
- School of Information Science and Engineering, Qufu Normal University, Rizhao, China
| | - Ying-Lian Gao
- Library of Qufu Normal University, Qufu Normal University, Rizhao, China
| | - Jin-Xing Liu
- School of Information Science and Engineering, Qufu Normal University, Rizhao, China. .,Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei, China.
| | - Ling-Yun Dai
- School of Information Science and Engineering, Qufu Normal University, Rizhao, China
| | - Sha-Sha Yuan
- School of Information Science and Engineering, Qufu Normal University, Rizhao, China
| |
Collapse
|
6
|
Cui Z, Gao YL, Liu JX, Wang J, Shang J, Dai LY. The computational prediction of drug-disease interactions using the dual-network L 2,1-CMF method. BMC Bioinformatics 2019; 20:5. [PMID: 30611214 PMCID: PMC6320570 DOI: 10.1186/s12859-018-2575-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Accepted: 12/10/2018] [Indexed: 12/22/2022] Open
Abstract
Background Predicting drug-disease interactions (DDIs) is time-consuming and expensive. Improving the accuracy of prediction results is necessary, and it is crucial to develop a novel computing technology to predict new DDIs. The existing methods mostly use the construction of heterogeneous networks to predict new DDIs. However, the number of known interacting drug-disease pairs is small, so there will be many errors in this heterogeneous network that will interfere with the final results. Results A novel method, known as the dual-network L2,1-collaborative matrix factorization, is proposed to predict novel DDIs. The Gaussian interaction profile kernels and L2,1-norm are introduced in our method to achieve better results than other advanced methods. The network similarities of drugs and diseases with their chemical and semantic similarities are combined in this method. Conclusions Cross validation is used to evaluate our method, and simulation experiments are used to predict new interactions using two different datasets. Finally, our prediction accuracy is better than other existing methods. This proves that our method is feasible and effective.
Collapse
Affiliation(s)
- Zhen Cui
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
| | - Ying-Lian Gao
- Library of Qufu Normal University, Qufu Normal University, Rizhao, China
| | - Jin-Xing Liu
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China.
| | - Juan Wang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
| | - Junliang Shang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
| | - Ling-Yun Dai
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
| |
Collapse
|
7
|
Wang DQ, Gao YL, Liu JX, Zheng CH, Kong XZ. Identifying drug-pathway association pairs based on L1L2,1-integrative penalized matrix decomposition. Oncotarget 2017; 8:48075-48085. [PMID: 28624800 PMCID: PMC5564627 DOI: 10.18632/oncotarget.18254] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2017] [Accepted: 05/01/2017] [Indexed: 01/27/2023] Open
Abstract
The traditional methods of drug discovery follow the "one drug-one target" approach, which ignores the cellular and physiological environment of the action mechanism of drugs. However, pathway-based drug discovery methods can overcome this limitation. This kind of method, such as the Integrative Penalized Matrix Decomposition (iPaD) method, identifies the drug-pathway associations by taking the lasso-type penalty on the regularization term. Moreover, instead of imposing the L1-norm regularization, the L2,1-Integrative Penalized Matrix Decomposition (L2,1-iPaD) method imposes the L2,1-norm penalty on the regularization term. In this paper, based on the iPaD and L2,1-iPaD methods, we propose a novel method named L1L2,1-iPaD (L1L2,1-Integrative Penalized Matrix Decomposition), which takes the sum of the L1-norm and L2,1-norm penalties on the regularization term. Besides, we perform permutation test to assess the significance of the identified drug-pathway association pairs and compute the P-values. Compared with the existing methods, our method can identify more drug-pathway association pairs which have been validated in the CancerResource database. In order to identify drug-pathway associations which are not validated in the CancerResource database, we retrieve published papers to prove these associations. The results on two real datasets prove that our method can achieve better enrichment for identified association pairs than the iPaD and L2,1-iPaD methods.
Collapse
Affiliation(s)
- Dong-Qin Wang
- 1 School of Information Science and Engineering, Qufu Normal University, Rizhao, China
| | - Ying-Lian Gao
- 2 Library of Qufu Normal University, Qufu Normal University, Rizhao, China
| | - Jin-Xing Liu
- 1 School of Information Science and Engineering, Qufu Normal University, Rizhao, China
| | - Chun-Hou Zheng
- 1 School of Information Science and Engineering, Qufu Normal University, Rizhao, China
| | - Xiang-Zhen Kong
- 1 School of Information Science and Engineering, Qufu Normal University, Rizhao, China
| |
Collapse
|