1. Yuan R, Tang Y, Wu Y, Zhang W. Clustering Enhanced Multiplex Graph Contrastive Representation Learning. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:1341-1355. [PMID: 38015684 DOI: 10.1109/tnnls.2023.3334751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
Multiplex graph representation learning has attracted considerable attention due to its powerful capacity to depict multiple relation types between nodes. Previous methods generally learn representations of each relation-based subgraph and then aggregate them into final representations. Despite their considerable success, they commonly encounter two challenges: 1) the latent community structure is overlooked and 2) consistent and complementary information across relation types remains largely unexplored. To address these issues, we propose a clustering-enhanced multiplex graph contrastive representation learning model (CEMR). In CEMR, by formulating each relation type as a view, we propose a multiview graph clustering framework to discover the potential community structure, which encourages the representations to incorporate global semantic correlations. Moreover, under the proposed multiview clustering framework, we develop cross-view contrastive learning and cross-view cosupervision modules to explore consistent and complementary information in different views, respectively. Specifically, the cross-view contrastive learning module, equipped with a novel negative-pair selection mechanism, enables the view-specific representations to extract common knowledge across views. The cross-view cosupervision module exploits the high-confidence complementary information in one view to guide low-confidence clustering in other views by contrastive learning. Comprehensive experiments on four datasets confirm the superiority of CEMR over state-of-the-art rivals.
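The cross-view contrastive idea in this abstract can be illustrated with a generic InfoNCE-style loss. The sketch below is not the CEMR implementation: it assumes the same node in two views forms the positive pair and treats every other node in the second view as a negative, whereas the abstract describes a novel negative-pair selection mechanism that is not reproduced here. The function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_view_info_nce(view_a, view_b, temperature=0.5):
    """Average InfoNCE loss where node i in view A should match node i in view B.

    Assumption: all other nodes in view B act as negatives (no selection step).
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over all candidates in view B.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the true counterpart.
        total += log_denominator - logits[i]
    return total / len(view_a)


# Hypothetical two-node embeddings for two views (relation types).
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9]]   # node i in B resembles node i in A
shuffled_b = [[0.1, 0.9], [0.9, 0.1]]  # counterparts swapped

loss_aligned = cross_view_info_nce(view_a, aligned_b)
loss_shuffled = cross_view_info_nce(view_a, shuffled_b)
print(loss_aligned < loss_shuffled)
```

The loss is lower when each node's two view-specific embeddings agree, which is the sense in which such an objective pushes views toward common knowledge.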
2. Liu T, Fang ZY, Zhang Z, Yu Y, Li M, Yin MZ. A comprehensive overview of graph neural network-based approaches to clustering for spatial transcriptomics. Comput Struct Biotechnol J 2024; 23:106-128. [PMID: 38089467 PMCID: PMC10714345 DOI: 10.1016/j.csbj.2023.11.055] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/13/2023] [Revised: 11/24/2023] [Accepted: 11/27/2023] [Indexed: 10/16/2024]
Abstract
Spatial transcriptomics technologies enable researchers to accurately quantify and localize messenger ribonucleic acid (mRNA) transcripts at high resolution while preserving their spatial context. The identification of spatial domains, or spatial clustering, plays a crucial role in the analysis of spatial transcriptomics data. One promising approach to identifying spatial domains uses graph neural networks (GNNs) that leverage gene expression, spatial locations, and histological images. This study provides a comprehensive overview of the most recent GNN-based spatial clustering methods for spatial transcriptomics data. We extensively evaluated the performance of current methods on prevalent spatial transcriptomics datasets by considering their clustering accuracy, robustness, data stabilization, relevant requirements, computational efficiency, and memory use. To this end, we explored 60 clustering scenarios by extending the essential frameworks of spatial clustering with respect to the choice of GNN, downstream clustering algorithm, principal component analysis (PCA)-based reduction, and refined correction methods. We comparatively analyzed the spatial clustering performance of the methods to identify their limitations and outline future directions of research in the field. Our survey yielded novel insights and provides motivation for further investigation of spatial transcriptomics.
Affiliation(s)
- Teng Liu
  - Clinical Research Center (CRC), Clinical Pathology Center (CPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, China
  - Chongqing Technical Innovation Center for Quality Evaluation and Identification of Authentic Medicinal Herbs, Chongqing, China
- Zhao-Yu Fang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Zongbo Zhang
  - Clinical Research Center (CRC), Clinical Pathology Center (CPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, China
- Yongxiang Yu
  - Clinical Research Center (CRC), Clinical Pathology Center (CPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Hunan Provincial Engineering Research Center of Intelligent Computing in Biology and Medicine, Central South University, Changsha 410083, China
- Ming-Zhu Yin
  - Clinical Research Center (CRC), Clinical Pathology Center (CPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, China
  - Chongqing Technical Innovation Center for Quality Evaluation and Identification of Authentic Medicinal Herbs, Chongqing, China
3. He D, Liang C, Huo C, Feng Z, Jin D, Yang L, Zhang W. Analyzing Heterogeneous Networks With Missing Attributes by Unsupervised Contrastive Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:4438-4450. [PMID: 35235523 DOI: 10.1109/tnnls.2022.3149997] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 06/14/2023]
Abstract
Heterogeneous information networks (HINs) are potent models of complex systems. In practice, many nodes in an HIN have unspecified attributes, resulting in significant performance degradation for supervised and unsupervised representation learning. We developed an unsupervised heterogeneous graph contrastive learning approach for analyzing HINs with missing attributes (HGCA). HGCA adopts a contrastive learning strategy to unify attribute completion and representation learning in an unsupervised heterogeneous framework. To deal with a large number of missing attributes and the absence of labels in unsupervised scenarios, we proposed an augmented network that captures the semantic relations between nodes and attributes to achieve fine-grained attribute completion. Extensive experiments on three large real-world HINs demonstrated the superiority of HGCA over several state-of-the-art methods. The results also showed that the attributes completed by HGCA can improve the performance of existing HIN models.
4. Yang C, Wen H, Hooi B, Zhou L. CapMax: A Framework for Dynamic Network Representation Learning From the View of Multiuser Communication. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:4554-4566. [PMID: 36417735 DOI: 10.1109/tnnls.2022.3222165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
In this article, a modified mutual information maximization (InfoMax) framework, named channel capacity maximization (CapMax), is proposed and applied to learn informative representations for dynamic networks with time-varying topology and/or time-evolving node attributes. CapMax is based on network information theory for multiuser communication, where the representation model is treated as a multiaccess communication channel with memory and feedback. Without imposing requirements on the backbone structure, the learning objective of CapMax is to maximize the channel capacity, which is measured by directed information (DI) rather than mutual information. For efficient implementation, we design an estimator of the channel capacity through the combination of graph neural networks (GNNs) and recurrent neural networks (RNNs). Under some mild conditions, we theoretically prove that DI is a better measure than mutual information in capturing useful information. Experiments are conducted on multiple real-world dynamic network datasets, and the gains of CapMax over different backbone models on link detection and prediction validate the effectiveness of modeling the representation model as a communication channel.
5. Pan S, Xia L, Xu L, Li Z. SubMDTA: drug target affinity prediction based on substructure extraction and multi-scale features. BMC Bioinformatics 2023; 24:334. [PMID: 37679724 PMCID: PMC10485962 DOI: 10.1186/s12859-023-05460-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/11/2023] [Accepted: 08/31/2023] [Indexed: 09/09/2023]
Abstract
BACKGROUND: Drug-target affinity (DTA) prediction is a critical step in drug discovery. In recent years, deep learning-based methods have emerged for DTA prediction. To fuse the substructure information of drug molecular graphs and to utilize the multi-scale information of proteins, this paper proposes a self-supervised pre-training model based on substructure extraction and multi-scale features. RESULTS: For drug molecules, the model obtains substructure information through a probability matrix, and contrastive learning is applied to the graph-level and subgraph-level representations to pre-train the graph encoder for downstream tasks. For targets, a BiLSTM method that integrates multi-scale features is used to capture long-distance relationships in the amino acid sequence. The experimental results showed that our model achieved better performance for DTA prediction. CONCLUSIONS: The proposed model improves the performance of DTA prediction and provides a novel strategy based on substructure extraction and multi-scale features.
Affiliation(s)
- Shourun Pan
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Leiming Xia
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Lei Xu
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Zhen Li
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
7. Yang C, Xiao Y, Zhang Y, Sun Y, Han J. Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark. IEEE Transactions on Knowledge and Data Engineering 2022; 34:4854-4873. [PMID: 37915376 PMCID: PMC10619966 DOI: 10.1109/tkde.2020.3045924] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Indexed: 11/03/2023]
Abstract
Since real-world objects and their interactions are often multi-modal and multi-typed, heterogeneous networks have been widely used as a more powerful, realistic, and generic superclass of traditional homogeneous networks (graphs). Meanwhile, representation learning (a.k.a. embedding) has recently been intensively studied and shown effective for various network mining and analytical tasks. In this work, we aim to provide a unified framework to deeply summarize and evaluate existing research on heterogeneous network embedding (HNE), which includes but goes beyond a normal survey. Since there is already a broad body of HNE algorithms, as the first contribution of this work, we provide a generic paradigm for the systematic categorization and analysis of the merits of various existing HNE algorithms. Moreover, existing HNE algorithms, though mostly claimed to be generic, are often evaluated on different datasets. Although understandable given the application-driven nature of HNE, such indirect comparisons largely hinder the proper attribution of improved task performance to effective data preprocessing versus novel technical design, especially considering the various possible ways to construct a heterogeneous network from real-world application data. Therefore, as the second contribution, we create four benchmark datasets from different sources, with various properties regarding scale, structure, attribute/label availability, etc., towards handy and fair evaluations of HNE algorithms. As the third contribution, we carefully refactor and amend the implementations and create friendly interfaces for 13 popular HNE algorithms, and provide all-around comparisons among them over multiple tasks and experimental settings. By putting all existing HNE algorithms under a unified framework, we aim to provide a universal reference and guideline for the understanding and development of HNE algorithms. Meanwhile, by open-sourcing all data and code, we aim to serve the community with a ready-to-use benchmark platform to test and compare the performance of existing and future HNE algorithms (https://github.com/yangji9181/HNE).
Affiliation(s)
- Carl Yang
  - Emory University
- Yuxin Xiao
  - Carnegie Mellon University
- Yu Zhang
  - University of Illinois, Urbana Champaign
- Yizhou Sun
  - University of California, Los Angeles
- Jiawei Han
  - University of Illinois, Urbana Champaign
8. Wang Y, Peng Q, Wang W, Guo X, Shao M, Liu H, Liang W, Pan L. Network Alignment enhanced via modeling heterogeneity of anchor nodes. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
10. Cross-network representation learning for anchor users on multiplex heterogeneous social network. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108461] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
11. Che F, Tao J, Yang G, Liu T, Zhang D. Multi-aspect self-supervised learning for heterogeneous information network. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107474] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
12. Wu L, Wang D, Song K, Feng S, Zhang Y, Yu G. Dual-view hypergraph neural networks for attributed graph learning. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107185] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/21/2022]
13. Yi HC, You ZH, Huang DS, Kwoh CK. Graph representation learning in bioinformatics: trends, methods and applications. Brief Bioinform 2021; 23:6361044. [PMID: 34471921 DOI: 10.1093/bib/bbab340] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 06/24/2021] [Revised: 07/18/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022]
Abstract
A graph is a natural data structure for describing complex systems, consisting of a set of objects and the relationships among them. Many real-life biomedical problems can be modeled as graph analytics tasks. Machine learning, especially deep learning, has succeeded in a wide range of bioinformatics scenarios with data represented in the Euclidean domain. However, rich relational information between biological elements is retained in non-Euclidean biomedical graphs, which are not learning-friendly for classic machine learning methods. Graph representation learning aims to embed a graph into a low-dimensional space while preserving graph topology and node properties. It bridges biomedical graphs and modern machine learning methods and has recently attracted widespread interest in both the machine learning and bioinformatics communities. In this work, we summarize the advances of graph representation learning and its representative applications in bioinformatics. To provide a comprehensive and structured analysis and perspective, we first categorize and analyze both graph embedding methods (homogeneous graph embedding, heterogeneous graph embedding, attribute graph embedding) and graph neural networks. Furthermore, we summarize their representative applications from the molecular level to the genomics, pharmaceutical and healthcare systems levels. Moreover, we provide open resource platforms and libraries for implementing these graph representation learning methods and discuss the challenges and opportunities of graph representation learning in bioinformatics. This work provides a comprehensive survey of emerging graph representation learning algorithms and their applications in bioinformatics. It is anticipated that it could bring valuable insights for researchers to contribute their knowledge to graph representation learning and future-oriented bioinformatics studies.
Affiliation(s)
- Hai-Cheng Yi
  - Chinese Academy of Sciences, Xinjiang Technical Institute of Physics and Chemistry, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- De-Shuang Huang
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Chee Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore
14. Jafari SH, Abdolhosseini-Qomi AM, Asadpour M, Rahgozar M, Yazdani N. An information theoretic approach to link prediction in multiplex networks. Sci Rep 2021; 11:13242. [PMID: 34168194 PMCID: PMC8225891 DOI: 10.1038/s41598-021-92427-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2020] [Accepted: 06/10/2021] [Indexed: 11/09/2022]
Abstract
The entities of real-world networks are connected via different types of connections (i.e., layers). The task of link prediction in multiplex networks is to find missing connections based on both intra-layer and inter-layer correlations. Our observations confirm that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Accordingly, a similarity-based, automatic, general-purpose multiplex link prediction method, SimBins, is devised that quantifies the amount of connection uncertainty based on observed inter-layer correlations in a multiplex network. Moreover, SimBins enhances the prediction quality in the target layer by incorporating the effect of link overlap across layers. Applying SimBins to various datasets from diverse domains, we find that it outperforms the compared methods (both baseline and state-of-the-art) in most instances when predicting links. Furthermore, SimBins adds only minor computational overhead to the base similarity measures, making it a potentially fast method suitable for large-scale multiplex networks.
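The inter-layer correlation described in this abstract can be made concrete with a small sketch. It is not SimBins itself: it only groups node pairs by a simple similarity (common-neighbor count) in one layer and reports the fraction of pairs in each group that are linked in another layer. The function names and the four-node toy network are hypothetical.

```python
from itertools import combinations


def common_neighbors(adjacency, u, v):
    """Number of neighbors shared by u and v in one layer."""
    return len(adjacency.get(u, set()) & adjacency.get(v, set()))


def connection_rate_by_similarity(nodes, target_edges, auxiliary_adjacency):
    """Map each auxiliary-layer similarity value to the fraction of node
    pairs with that value that are connected in the target layer."""
    linked = {}
    total = {}
    for u, v in combinations(sorted(nodes), 2):
        score = common_neighbors(auxiliary_adjacency, u, v)
        total[score] = total.get(score, 0) + 1
        if frozenset((u, v)) in target_edges:
            linked[score] = linked.get(score, 0) + 1
    return {score: linked.get(score, 0) / total[score] for score in total}


# Hypothetical two-layer network on four nodes.
nodes = {"a", "b", "c", "d"}
auxiliary_adjacency = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
}
target_edges = {frozenset(("a", "b")), frozenset(("c", "d")), frozenset(("a", "c"))}

rates = connection_rate_by_similarity(nodes, target_edges, auxiliary_adjacency)
print(rates)
```

In this toy network, pairs that share two neighbors in the auxiliary layer are always linked in the target layer, while pairs that share none are linked a quarter of the time, which is the kind of positive correlation the abstract reports for real multiplex networks.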
Affiliation(s)
- Seyed Hossein Jafari
  - School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
- Masoud Asadpour
  - School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
- Maseud Rahgozar
  - School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
- Naser Yazdani
  - School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
15. Detection of Sociolinguistic Features in Digital Social Networks for the Detection of Communities. Cognit Comput 2021. [DOI: 10.1007/s12559-021-09818-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]