1
Ma Y, Shen X, Wu D, Cao J, Nie F. Cross-View Approximation on Grassmann Manifold for Multiview Clustering. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:7772-7777. [PMID: 38700968 DOI: 10.1109/tnnls.2024.3388192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
In existing multiview clustering research, learning jointly from multiview graph and feature spaces remains insufficient for achieving a consistent clustering structure. In addition, a postprocessing step is often required. In light of these considerations, a cross-view approximation on Grassmann manifold (CAGM) model is proposed to address inconsistencies within multiview adjacency matrices, feature matrices, and cross-view combinations from the two sources. The model uses a ratio-formed objective function, enabling parameter-free bidirectional fusion. Furthermore, the CAGM model incorporates a paired encoding mechanism to generate low-dimensional and orthogonal cross-view embeddings. Through the approximation of two measurable subspaces on the Grassmann manifold, the indicator matrix is obtained directly. In addition, an effective optimization algorithm for the CAGM model is derived. Comprehensive experiments on four real-world datasets are conducted to substantiate the effectiveness of the proposed method.
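To make "approximation of two subspaces on the Grassmann manifold" concrete, the sketch below computes the squared projection distance between two subspaces, a standard measure of how far apart two points on the Grassmann manifold are. This is only an illustration and not the CAGM algorithm: the function names `gram_schmidt` and `projection_distance_sq` are hypothetical, and the abstract does not state which metric the authors use.

```python
import math


def gram_schmidt(columns):
    """Orthonormalize a list of vectors (modified Gram-Schmidt)."""
    basis = []
    for v in columns:
        w = list(v)
        for b in basis:
            coeff = sum(x * y for x, y in zip(w, b))
            w = [x - coeff * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        basis.append([x / norm for x in w])
    return basis


def projection_distance_sq(cols_u, cols_v):
    """Squared projection distance k - ||U^T V||_F^2 between two k-dim subspaces.

    Each argument is a list of k spanning vectors; the result depends only on
    the spans, not on the particular bases chosen (assumption: same k for both).
    """
    u = gram_schmidt(cols_u)
    v = gram_schmidt(cols_v)
    if len(u) != len(v):
        raise ValueError("subspaces must have the same dimension")
    overlap = sum(
        sum(a * b for a, b in zip(ui, vj)) ** 2 for ui in u for vj in v
    )
    return len(u) - overlap


# Two different bases of the x-y plane describe the same point on the manifold.
same = projection_distance_sq([[1, 0, 0], [0, 1, 0]], [[1, 1, 0], [1, -1, 0]])
# Two orthogonal lines are as far apart as one-dimensional subspaces can be.
far = projection_distance_sq([[1, 0, 0]], [[0, 1, 0]])
```

The distance is zero exactly when the two bases span the same subspace, which is why driving it down aligns two embeddings up to a rotation rather than entry by entry.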
2
Wan X, Liu J, Gan X, Liu X, Wang S, Wen Y, Wan T, Zhu E. One-Step Multi-View Clustering With Diverse Representation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:5774-5786. [PMID: 38557633 DOI: 10.1109/tnnls.2024.3378194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Multi-view clustering has attracted broad attention due to its capacity to utilize consistent and complementary information among views. Although tremendous progress has been made recently, most existing methods suffer from high complexity, preventing them from being applied to large-scale tasks. Multi-view clustering via matrix factorization is a representative approach to addressing this issue. However, most such methods map the data matrices into a fixed dimension, limiting the model's expressiveness. Moreover, many methods rely on a two-step process, i.e., multimodal learning followed by k-means, which inevitably leads to a suboptimal clustering result. In light of this, we propose a one-step multi-view clustering with diverse representation (OMVCDR) method, which incorporates multi-view learning and k-means into a unified framework. Specifically, we first project the original data matrices into various latent spaces to attain comprehensive information and auto-weight them in a self-supervised manner. Then, we directly use the information matrices under diverse dimensions to obtain consensus discrete clustering labels. Unifying representation learning and clustering boosts the quality of the final results. Furthermore, we develop an efficient optimization algorithm with proven convergence to solve the resultant problem. Comprehensive experiments on various datasets demonstrate the promising clustering performance of our proposed method. The code is publicly available at https://github.com/wanxinhang/OMVCDR.
3
Lu Q, Ding J, Li L, Chang Y. Graph contrastive learning of subcellular-resolution spatial transcriptomics improves cell type annotation and reveals critical molecular pathways. Brief Bioinform 2024; 26:bbaf020. [PMID: 39883515 PMCID: PMC11781232 DOI: 10.1093/bib/bbaf020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Revised: 12/12/2024] [Accepted: 01/10/2025] [Indexed: 01/31/2025]
Abstract
Imaging-based spatial transcriptomics (iST) technologies, such as MERFISH, CosMx SMI, and Xenium, quantify gene expression levels across cells in space, but more importantly, they directly reveal the subcellular distribution of RNA transcripts at single-molecule resolution. The subcellular localization of RNA molecules plays a crucial role in the compartmentalization-dependent regulation of genes within individual cells. Understanding the intracellular spatial distribution of RNA for a particular cell type thus not only improves the characterization of cell identity but is also of paramount importance in elucidating unique subcellular regulatory mechanisms specific to the cell type. However, current cell type annotation approaches for iST primarily utilize gene expression information while neglecting the spatial distribution of RNAs within cells. In this work, we introduce a semi-supervised graph contrastive learning method called Focus, the first method, to the best of our knowledge, that explicitly models RNA's subcellular distribution and community to improve cell type annotation. Focus demonstrates significant improvements over state-of-the-art algorithms across a range of spatial transcriptomics platforms, achieving improvements of up to 27.8% in accuracy and 51.9% in F1-score for cell type annotation. Furthermore, Focus enjoys the advantages of revealing intricate cell type-specific subcellular spatial gene patterns and providing interpretable subcellular gene analysis, such as defining the gene importance score. Importantly, with the importance score, Focus identifies genes harboring strong relevance to cell type-specific pathways, indicating its potential in uncovering novel regulatory programs across numerous biological systems.
Affiliation(s)
- Qiaolin Lu
- School of Artificial Intelligence, Jilin University, Qianjin Street 2699, 130010 Changchun, China
- Jiayuan Ding
- Department of Computer Science and Engineering, Michigan State University, 220 Trowbridge Rd, East Lansing, MI 48824, United States
- Lingxiao Li
- Department, Boston University, Commonwealth Ave, Boston, MA 02215, United States
- Yi Chang
- School of Artificial Intelligence, Jilin University, Qianjin Street 2699, 130010 Changchun, China
- International Center of Future Science, Jilin University, Qianjin Street 2699, 130010 Changchun, China
- Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, Jilin University, Qianjin Street 2699, 130010 Changchun, China
4
Ning Q, Zhao Y, Gao J, Chen C, Yin M. Hierarchical Hypergraph Learning in Association-Weighted Heterogeneous Network for miRNA-Disease Association Identification. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:2531-2542. [PMID: 39475747 DOI: 10.1109/tcbb.2024.3485788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
MicroRNAs (miRNAs) play a significant role in cell differentiation and biological development, as well as in the occurrence and growth of diseases. Although many computational methods contribute to predicting the association between miRNAs and diseases, they do not fully explore the attribute information contained in associated edges between miRNAs and diseases. In this study, we propose a new method, Hierarchical Hypergraph learning in Association-Weighted heterogeneous network for MiRNA-Disease association identification (HHAWMD). HHAWMD first adaptively fuses multi-view similarities based on channel attention and distinguishes the relevance of different associated relationships according to changes in expression levels of disease-related miRNAs, miRNA similarity information, and disease similarity information. Then, HHAWMD assigns edge weights and attribute features according to the association level to construct an association-weighted heterogeneous graph. Next, HHAWMD extracts the subgraph of the miRNA-disease node pair from the heterogeneous graph and builds the hyperedge (a kind of virtual edge) between the node pair to generate the hypergraph. Finally, HHAWMD proposes a hierarchical hypergraph learning approach, including node-aware attention and hyperedge-aware attention, which aggregates the abundant semantic information contained in deep and shallow neighborhoods to the hyperedge in the hypergraph. Our experimental results suggest that HHAWMD has better performance and can be used as a powerful tool for miRNA-disease association identification.
5
Wang R, Guo W, Wang Y, Zhou X, Leung JC, Yan S, Cui L. Hybrid multimodal fusion for graph learning in disease prediction. Methods 2024; 229:41-48. [PMID: 38880433 DOI: 10.1016/j.ymeth.2024.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Revised: 06/06/2024] [Accepted: 06/12/2024] [Indexed: 06/18/2024]
Abstract
Graph neural networks (GNNs) have gained significant attention in disease prediction, where the latent embeddings of patients are modeled as nodes and the similarities among patients are represented through edges. The graph structure, which determines how information is aggregated and propagated, plays a crucial role in graph learning. Recent approaches typically create graphs based on patients' latent embeddings, which may not accurately reflect their real-world closeness. Our analysis reveals that raw data, such as demographic attributes and laboratory results, offer a wealth of information for assessing patient similarities and can serve as a compensatory measure for graphs constructed exclusively from latent embeddings. In this study, we first construct adaptive graphs from latent representations and from raw data, respectively, and then merge these graphs via weighted summation. Given that the graphs may contain extraneous and noisy connections, we apply degree-sensitive edge pruning and kNN sparsification to remove such edges selectively. We conducted intensive experiments on two diagnostic prediction datasets, and the results demonstrate that our proposed method surpasses current state-of-the-art techniques.
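The two graph operations named in this abstract, merging by weighted summation and kNN sparsification, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `fuse_graphs`, `knn_sparsify`, the fixed mixing weight `alpha`, and the max-based symmetrization are all hypothetical choices, and the degree-sensitive pruning step is not shown.

```python
def knn_sparsify(adj, k):
    """Keep each node's k strongest edges, then symmetrize by taking the union."""
    n = len(adj)
    kept = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank candidate neighbours of node i by edge weight, strongest first.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: adj[i][j],
            reverse=True,
        )[:k]
        for j in neighbours:
            kept[i][j] = adj[i][j]
    # An edge survives if either endpoint selected it.
    return [[max(kept[i][j], kept[j][i]) for j in range(n)] for i in range(n)]


def fuse_graphs(adj_latent, adj_raw, alpha, k):
    """Weighted sum of two adjacency matrices followed by kNN sparsification.

    alpha is an assumed scalar in [0, 1]; the paper may learn its weights.
    """
    n = len(adj_latent)
    merged = [
        [alpha * adj_latent[i][j] + (1.0 - alpha) * adj_raw[i][j] for j in range(n)]
        for i in range(n)
    ]
    return knn_sparsify(merged, k)


# Toy example with three patients (values are made up for illustration).
adj_latent = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.2], [0.1, 0.2, 0.0]]
adj_raw = [[0.0, 0.5, 0.3], [0.5, 0.0, 0.8], [0.3, 0.8, 0.0]]
fused = fuse_graphs(adj_latent, adj_raw, alpha=0.5, k=1)
```

In the toy example the weak edge between patients 0 and 2 is dropped, while the edge between patients 1 and 2 survives because the raw-data graph supports it strongly even though the latent graph does not.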
Affiliation(s)
- Wei Guo
- Shandong University, Jinan, 250210, China.
- Xin Zhou
- Nanyang Technological University, Singapore.
- Shuo Yan
- Shandong University, Jinan, 250210, China.
- Lizhen Cui
- Shandong University, Jinan, 250210, China.
6
Lan W, Li C, Chen Q, Yu N, Pan Y, Zheng Y, Chen YPP. LGCDA: Predicting CircRNA-Disease Association Based on Fusion of Local and Global Features. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1413-1422. [PMID: 38607720 DOI: 10.1109/tcbb.2024.3387913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/14/2024]
Abstract
CircRNA has been shown to be involved in the occurrence of many diseases. Several computational frameworks have been proposed to identify circRNA-disease associations. Although existing computational methods have achieved considerable success, they still need improvement, as their performance may degrade due to the sparsity of the data and the problem of memory overflow. We develop a novel computational framework called LGCDA to predict circRNA-disease associations by fusing local and global features to solve the above-mentioned problems. First, we construct closed local subgraphs by using k-hop closed subgraphs and label the subgraphs to obtain rich graph pattern information. Then, the local features are extracted by using a graph neural network (GNN). In addition, we fuse the Gaussian interaction profile (GIP) kernel and cosine similarity to obtain global features. Finally, the score of circRNA-disease associations is predicted by using a multilayer perceptron (MLP) based on local and global features. We perform five-fold cross-validation on five datasets for model evaluation, and our model surpasses other advanced methods.
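The global-feature step combines two similarity measures computed from association profiles (the rows of a circRNA-disease association matrix). The sketch below uses the commonly cited form of the GIP kernel, exp(-gamma * ||p_i - p_j||^2) with gamma set to the inverse of the mean squared profile norm, together with plain cosine similarity. How LGCDA actually fuses the two is not stated in the abstract, so the equal-weight average in `fuse_similarities` is an assumption, and all function names are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over rows of an association matrix."""
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm  # bandwidth normalized by profile size
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def cosine_similarity(profiles):
    """Pairwise cosine similarity; rows with zero norm get similarity 0."""
    n = len(profiles)
    norms = [math.sqrt(sum(x * x for x in p)) for p in profiles]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(profiles[i], profiles[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def fuse_similarities(profiles):
    """Equal-weight average of the two similarity matrices (an assumed fusion rule)."""
    gip = gip_kernel(profiles)
    cos = cosine_similarity(profiles)
    n = len(profiles)
    return [[0.5 * (gip[i][j] + cos[i][j]) for j in range(n)] for i in range(n)]


# Toy association matrix: 3 circRNAs x 3 diseases (made-up values).
profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
fused_sim = fuse_similarities(profiles)
```

Cosine similarity is zero for profiles that share no associations, whereas the GIP kernel still assigns a small positive value, so averaging the two keeps some signal for sparse rows.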
7
Wan X, Xiao B, Liu X, Liu J, Liang W, Zhu E. Fast Continual Multi-View Clustering With Incomplete Views. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2024; 33:2995-3008. [PMID: 38640047 DOI: 10.1109/tip.2024.3388974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/21/2024]
Abstract
Multi-view clustering (MVC) has attracted broad attention due to its capacity to exploit consistent and complementary information across views. This paper focuses on a challenging issue in MVC called the incomplete continual data problem (ICDP). Specifically, most existing algorithms assume that views are available in advance and overlook the scenarios where data observations of views are accumulated over time. Due to privacy considerations or memory limitations, previous views cannot be stored in these situations. Some works have proposed ways to handle this problem, but all of them fail to address incomplete views. Such an incomplete continual data problem (ICDP) in MVC is difficult to solve since incomplete information with continual data increases the difficulty of extracting consistent and complementary knowledge among views. We propose Fast Continual Multi-View Clustering with Incomplete Views (FCMVC-IV) to address this issue. Specifically, the method maintains a scalable consensus coefficient matrix and updates its knowledge with the incoming incomplete view rather than storing and recomputing all the data matrices. Considering that the given views are incomplete, the newly collected view might contain samples that have yet to appear; two indicator matrices and a rotation matrix are developed to match matrices with different dimensions. In addition, we design a three-step iterative algorithm to solve the resultant problem with linear complexity and proven convergence. Comprehensive experiments conducted on various datasets demonstrate the superiority of FCMVC-IV over the competing approaches. The code is publicly available at https://github.com/wanxinhang/FCMVC-IV.
8
Zheng T, Zheng Z, Zhou H, Guo Y, Li S. The multifaceted roles of COL4A4 in lung adenocarcinoma: An integrated bioinformatics and experimental study. Comput Biol Med 2024; 170:107896. [PMID: 38217972 DOI: 10.1016/j.compbiomed.2023.107896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 12/03/2023] [Accepted: 12/23/2023] [Indexed: 01/15/2024]
Abstract
BACKGROUND Abnormal expression of collagen IV subunits has been reported in cancers, but its significance is not clear. No study has reported the significance of COL4A4 in lung adenocarcinoma (LUAD).
METHODS COL4A4 expression data, single-cell sequencing data and clinical data were downloaded from public databases. A range of bioinformatics and experimental methods were adopted to analyze the association of COL4A4 expression with clinical parameters, tumor microenvironment (TME), drug resistance and immunotherapy response, and to investigate the roles and underlying mechanism of COL4A4 in LUAD.
RESULTS COL4A4 is differentially expressed in most of the cancers analyzed, being associated with prognosis, tumor stemness, immune checkpoint gene expression and TME parameters. In LUAD, COL4A4 expression is down-regulated and associated with various TME parameters, response to immunotherapy and drug resistance. LUAD patients with lower COL4A4 expression have a worse prognosis. Knockdown of COL4A4 significantly inhibited the expression of cell-cycle associated genes, and the expression and activation of signaling pathways including the JAK/STAT3, p38, and ERK pathways, and induced quiescence in LUAD cells. In addition, it significantly induced the expression of a range of bioactive molecule genes that have been shown to have critical roles in TME remodeling and immune regulation.
CONCLUSIONS COL4A4 is implicated in the pathogenesis of cancers including LUAD. Its function may be multifaceted. It can modulate the activity of LUAD cells, TME remodeling and tumor stemness, thus affecting the pathological process of LUAD. COL4A4 may be a prognostic molecular marker and a potential therapeutic target.
Affiliation(s)
- Tiaozhan Zheng
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, Zhuang Autonomous Region, 530021, PR China
- Zhiwen Zheng
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, Zhuang Autonomous Region, 530021, PR China
- Hanxi Zhou
- Department of Pathology, Taizhou Hospital, Wenzhou Medical University, Linhai, Zhejiang Province, PR China
- Yiqing Guo
- Department of Pathology, Taizhou Hospital, Wenzhou Medical University, Linhai, Zhejiang Province, PR China
- Shikang Li
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, Zhuang Autonomous Region, 530021, PR China.
9
Lan W, Liu M, Chen J, Ye J, Zheng R, Zhu X, Peng W. JLONMFSC: Clustering scRNA-seq data based on joint learning of non-negative matrix factorization and subspace clustering. Methods 2024; 222:1-9. [PMID: 38128706 DOI: 10.1016/j.ymeth.2023.11.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 11/07/2023] [Accepted: 11/29/2023] [Indexed: 12/23/2023]
Abstract
The development of single-cell RNA sequencing (scRNA-seq) has provided new perspectives for studying biological problems at the single-cell level. One of the key issues in scRNA-seq data analysis is to divide cells into several clusters for discovering the heterogeneity and diversity of cells. However, existing scRNA-seq data are high-dimensional, sparse, and noisy, which challenges existing single-cell clustering methods. In this study, we propose a joint learning framework (JLONMFSC) for clustering scRNA-seq data. In our method, the dimension of the original data is reduced to minimize the effect of noise. In addition, graph-regularized matrix factorization is used to learn the local features. Further, Low-Rank Representation (LRR) subspace clustering is utilized to learn the global features. Finally, joint learning of local and global features is performed to obtain the clustering results. We compare the proposed algorithm with eight state-of-the-art algorithms for clustering performance on six datasets, and the experimental results demonstrate that JLONMFSC achieves better performance on all datasets. The code is available at https://github.com/lanbiolab/JLONMFSC.
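As background for the matrix factorization component, the sketch below implements plain non-negative matrix factorization with the classic Lee-Seung multiplicative updates for the Frobenius objective. It is a deliberate simplification: the graph regularization term, the LRR subspace clustering, and the joint objective described in the abstract are omitted, and the helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`) are hypothetical. The authors' actual code is at the repository linked above.

```python
import random


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((p - q) ** 2 for xr, whr in zip(x, wh) for p, q in zip(xr, whr))


def nmf(x, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        numer = matmul(wt, x)
        denom = matmul(matmul(wt, w), h)
        h = [
            [h[a][j] * numer[a][j] / (denom[a][j] + eps) for j in range(n_cols)]
            for a in range(rank)
        ]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        numer = matmul(x, ht)
        denom = matmul(w, matmul(h, ht))
        w = [
            [w[i][a] * numer[i][a] / (denom[i][a] + eps) for a in range(rank)]
            for i in range(n_rows)
        ]
    return w, h


# Toy "cells x genes" matrix (made-up values) factored at rank 2.
x = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 1.0], [1.0, 3.0, 4.0]]
w0, h0 = nmf(x, rank=2, iterations=0)
w, h = nmf(x, rank=2, iterations=200)
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what makes the rows of W interpretable as soft cluster memberships.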
Affiliation(s)
- Wei Lan
- School of Computer, Electronic and Information, Guangxi University, Nanning, China; Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, China.
- Mingyang Liu
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Jianwei Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Jin Ye
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Ruiqing Zheng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Xiaoshu Zhu
- School of Computer Science and Information Security, Guilin University of Science and Technology, Guilin, China
- Wei Peng
- School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
10
Wang J, Liao N, Du X, Chen Q, Wei B. A semi-supervised approach for the integration of multi-omics data based on transformer multi-head self-attention mechanism and graph convolutional networks. BMC Genomics 2024; 25:86. [PMID: 38254021 PMCID: PMC10802018 DOI: 10.1186/s12864-024-09985-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 01/07/2024] [Indexed: 01/24/2024]
Abstract
BACKGROUND AND OBJECTIVES Comprehensive analysis of multi-omics data is crucial for accurately formulating effective treatment plans for complex diseases. Supervised ensemble methods have gained popularity in recent years for multi-omics data analysis. However, existing research based on supervised learning algorithms often fails to fully harness the information from unlabeled nodes and overlooks the latent features within and among different omics, as well as the various associations among features. Here, we present a novel multi-omics integrative method MOSEGCN, based on the Transformer multi-head self-attention mechanism and Graph Convolutional Networks (GCN), with the aim of enhancing the accuracy of complex disease classification. MOSEGCN first employs the Transformer multi-head self-attention mechanism and Similarity Network Fusion (SNF) to separately learn the inherent correlations of latent features within and among different omics, constructing a comprehensive view of diseases. Subsequently, it feeds the learned crucial information into a self-ensembling Graph Convolutional Network (SEGCN) built upon semi-supervised learning methods for training and testing, facilitating a better analysis and utilization of information from multi-omics data to achieve precise classification of disease subtypes.
RESULTS The experimental results show that MOSEGCN outperforms several state-of-the-art multi-omics integrative analysis approaches on three types of omics data: mRNA expression data, microRNA expression data, and DNA methylation data, with accuracy rates of 83.0% for Alzheimer's disease and 86.7% for breast cancer subtyping. Furthermore, MOSEGCN exhibits strong generalizability on the GBM dataset, enabling the identification of important biomarkers for related diseases.
CONCLUSION MOSEGCN explores the significant relationship information among different omics and within each omics' latent features, effectively leveraging labeled and unlabeled information to further enhance the accuracy of complex disease classification. It also provides a promising approach for identifying reliable biomarkers, paving the way for personalized medicine.
Affiliation(s)
- Jiahui Wang
- School of Computer and Information Security, Guilin University of Electronic Technology, No. 1 Jinji Road, Guilin City, 541004, Guangxi Zhuang Autonomous Region, China
- Nanqing Liao
- School of Medical, Guangxi University, No. 100 East University Road, Nanning, 530004, Guangxi, China
- Xiaofei Du
- School of Computer and Information Security, Guilin University of Electronic Technology, No. 1 Jinji Road, Guilin City, 541004, Guangxi Zhuang Autonomous Region, China
- Qingfeng Chen
- School of Computer, Electronics and Information, Guangxi University, No. 100 East University Road, Nanning, 530004, Guangxi, China.
- Bizhong Wei
- School of Computer and Information Security, Guilin University of Electronic Technology, No. 1 Jinji Road, Guilin City, 541004, Guangxi Zhuang Autonomous Region, China.