1
|
Li S, Hua H, Chen S. Graph neural networks for single-cell omics data: a review of approaches and applications. Brief Bioinform 2025; 26:bbaf109. [PMID: 40091193 PMCID: PMC11911123 DOI: 10.1093/bib/bbaf109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2024] [Revised: 02/09/2025] [Accepted: 02/25/2025] [Indexed: 03/19/2025] Open
Abstract
Rapid advancement of sequencing technologies now allows for the utilization of precise signals at single-cell resolution in various omics studies. However, the massive volume, ultra-high dimensionality, and high sparsity nature of single-cell data have introduced substantial difficulties to traditional computational methods. The intricate non-Euclidean networks of intracellular and intercellular signaling molecules within single-cell datasets, coupled with the complex, multimodal structures arising from multi-omics joint analysis, pose significant challenges to conventional deep learning operations reliant on Euclidean geometries. Graph neural networks (GNNs) have extended deep learning to non-Euclidean data, allowing cells and their features in single-cell datasets to be modeled as nodes within a graph structure. GNNs have been successfully applied across a broad range of tasks in single-cell data analysis. In this survey, we systematically review 107 successful applications of GNNs and their six variants in various single-cell omics tasks. We begin by outlining the fundamental principles of GNNs and their six variants, followed by a systematic review of GNN-based models applied in single-cell epigenomics, transcriptomics, spatial transcriptomics, proteomics, and multi-omics. In each section dedicated to a specific omics type, we have summarized the publicly available single-cell datasets commonly utilized in the articles reviewed in that section, totaling 77 datasets. Finally, we summarize the potential shortcomings of current research and explore directions for future studies. We anticipate that this review will serve as a guiding resource for researchers to deepen the application of GNNs in single-cell omics.
Collapse
Affiliation(s)
- Sijie Li
- School of Mathematical Sciences and The Key Laboratory of Pure Mathematics and Combinatorics, Ministry of Education (LPMC), Nankai University, No. 94 Weijin Road, Nankai District, Tianjin 300071, China
| | - Heyang Hua
- School of Mathematical Sciences and The Key Laboratory of Pure Mathematics and Combinatorics, Ministry of Education (LPMC), Nankai University, No. 94 Weijin Road, Nankai District, Tianjin 300071, China
| | - Shengquan Chen
- School of Mathematical Sciences and The Key Laboratory of Pure Mathematics and Combinatorics, Ministry of Education (LPMC), Nankai University, No. 94 Weijin Road, Nankai District, Tianjin 300071, China
| |
Collapse
|
2
|
Liu Y, Li X, Chen C, Ding N, Ma S, Yang M. Exploration of compatibility rules and discovery of active ingredients in TCM formulas by network pharmacology. CHINESE HERBAL MEDICINES 2024; 16:572-588. [PMID: 39606260 PMCID: PMC11589340 DOI: 10.1016/j.chmed.2023.09.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Revised: 08/12/2023] [Accepted: 09/05/2023] [Indexed: 11/29/2024] Open
Abstract
Network pharmacology is an interdisciplinary field that utilizes computer science, technology, and biological networks to investigate the intricate interplay among compounds/ingredients, targets, and diseases. Within the realm of traditional Chinese medicine (TCM), network pharmacology serves as a scientific approach to elucidate the compatibility relationships and underlying mechanisms of action in TCM formulas. It facilitates the identification of potential active ingredients within these formulas, providing a comprehensive understanding of their holistic and systematic nature, which aligns with the holistic principles inherent in TCM theory. TCM formulas exhibit complexity due to their multi-component characteristic, involving diverse targets and pathways. Consequently, investigating their material basis and mechanisms becomes challenging. Network pharmacology has emerged as a valuable approach in TCM formula research, leveraging its holistic and systematic advantages. The manuscript aims to provide an overview of the application of network pharmacology in studying TCM formula compatibility rules and explore future research directions. Specifically, we focus on how network pharmacology aids in interpreting TCM pharmacological theories and understanding formula compositions. Additionally, we elucidate the process of utilizing network pharmacology to identify active ingredients within TCM formulas. These findings not only offer novel research models and perspectives for integrating network pharmacology with TCM theory but also present new methodologies for investigating TCM formula compatibility. All in all, network pharmacology has become an indispensable and crucial tool in advancing TCM formula research.
Collapse
Affiliation(s)
- Yishu Liu
- Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, China
| | - Xue Li
- Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, China
| | - Chao Chen
- Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, China
| | - Nan Ding
- Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, China
| | - Shiyu Ma
- Ruijin Hospital Affiliated to Shanghai Jiaotong University, Shanghai 200025, China
| | - Ming Yang
- Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, China
| |
Collapse
|
3
|
Pang H, Wei S, Du Z, Zhao Y, Cai S, Zhao Y. Graph Representation Learning Based on Specific Subgraphs for Biomedical Interaction Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1552-1564. [PMID: 38767994 DOI: 10.1109/tcbb.2024.3402741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2024]
Abstract
Discovering the novel associations of biomedical entities is of great significance and can facilitate not only the identification of network biomarkers of disease but also the search for putative drug targets.Graph representation learning (GRL) has incredible potential to efficiently predict the interactions from biomedical networks by modeling the robust representation for each node.> However, the current GRL-based methods learn the representation of nodes by aggregating the features of their neighbors with equal weights. Furthermore, they also fail to identify which features of higher-order neighbors are integrated into the representation of the central node. In this work, we propose a novel graph representation learning framework: a multi-order graph neural network based on reconstructed specific subgraphs (MGRS) for biomedical interaction prediction. In the MGRS, we apply the multi-order graph aggregation module (MOGA) to learn the wide-view representation by integrating the multi-hop neighbor features. Besides, we propose a subgraph selection module (SGSM) to reconstruct the specific subgraph with adaptive edge weights for each node. SGSM can clearly explore the dependency of the node representation on the neighbor features and learn the subgraph-based representation based on the reconstructed weighted subgraphs. Extensive experimental results on four public biomedical networks demonstrate that the MGRS performs better and is more robust than the latest baselines.
Collapse
|
4
|
Zhang Y, Li J, Lin S, Zhao J, Xiong Y, Wei DQ. An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model. J Cheminform 2024; 16:67. [PMID: 38849874 PMCID: PMC11162000 DOI: 10.1186/s13321-024-00862-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2023] [Accepted: 05/19/2024] [Indexed: 06/09/2024] Open
Abstract
Identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are becoming increasingly popular in efficiently identifying compound-protein interactions with high-throughput capabilities, narrowing down the scope of candidates for traditional labor-intensive, time-consuming and expensive experimental techniques. In this study, we proposed an end-to-end approach termed SPVec-SGCN-CPI, which utilized simplified graph convolutional network (SGCN) model with low-dimensional and continuous features generated from our previously developed model SPVec and graph topology information to predict compound-protein interactions. The SGCN technique, dividing the local neighborhood aggregation and nonlinearity layer-wise propagation steps, effectively aggregates K-order neighbor information while avoiding neighbor explosion and expediting training. The performance of the SPVec-SGCN-CPI method was assessed across three datasets and compared against four machine learning- and deep learning-based methods, as well as six state-of-the-art methods. Experimental results revealed that SPVec-SGCN-CPI outperformed all these competing methods, particularly excelling in unbalanced data scenarios. By propagating node features and topological information to the feature space, SPVec-SGCN-CPI effectively incorporates interactions between compounds and proteins, enabling the fusion of heterogeneity. Furthermore, our method scored all unlabeled data in ChEMBL, confirming the top five ranked compound-protein interactions through molecular docking and existing evidence. These findings suggest that our model can reliably uncover compound-protein interactions within unlabeled compound-protein pairs, carrying substantial implications for drug re-profiling and discovery. In summary, SPVec-SGCN demonstrates its efficacy in accurately predicting compound-protein interactions, showcasing potential to enhance target identification and streamline drug discovery processes.Scientific contributionsThe methodology presented in this work not only enables the comparatively accurate prediction of compound-protein interactions but also, for the first time, take sample imbalance which is very common in real world and computation efficiency into consideration simultaneously, accelerating the target identification and drug discovery process.
Collapse
Affiliation(s)
- Yufang Zhang
- School of Mathematical Sciences and SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, 200240, China
- Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
- Zhongjing Research and Industrialization, Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, 473006, Henan, China
| | - Jiayi Li
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
| | - Shenggeng Lin
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
| | - Jianwei Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China.
| | - Dong-Qing Wei
- Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China.
- Zhongjing Research and Industrialization, Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, 473006, Henan, China.
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China.
| |
Collapse
|
5
|
Tu W, Ling G, Liu F, Hu F, Song X. GCSTI: A Single-Cell Pseudotemporal Trajectory Inference Method Based on Graph Compression. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2945-2958. [PMID: 37037234 DOI: 10.1109/tcbb.2023.3266109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
The single-cell pseudotemporal trajectory inference is an important way to explore the process of developmental changes within a cell. Due to the uneven rate of cell growth, changes in gene expression depend less on the time of data collection and more on a cell's "internal clock". To overcome the challenges of gene analysis, and replicate biological developmental processes, several strategies have been put forth. However, due to the size of single-cell datasets, locating relevant signposts usually necessitate clustering analysis or a sizable amount of priori information. To this end, we propose a novel single-cell pseudotemporal trajectory inference technique: GCSTI method, which is based on graph compression and doesn't rely on a priori knowledge or clustering procedures, can handle the trajectory inference problem for a large network in a stable and efficient manner. Additionally, we simultaneously improve the pseudotime defining method currently employed in this study in order to obtain more trustworthy and beneficial outcomes for trajectory inference. Finally, we validate the efficacy and stability of the GCSTI method using datasets from human skeletal muscle myogenic cells and four simulated datasets.
Collapse
|
6
|
Feng H, Xiang Y, Wang X, Xue W, Yue Z. MTAGCN: predicting miRNA-target associations in Camellia sinensis var. assamica through graph convolution neural network. BMC Bioinformatics 2022; 23:271. [PMID: 35820798 PMCID: PMC9275082 DOI: 10.1186/s12859-022-04819-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2021] [Accepted: 07/01/2022] [Indexed: 11/10/2022] Open
Abstract
Background MircoRNAs (miRNAs) play a central role in diverse biological processes of Camellia sinensis var.assamica (CSA) through their associations with target mRNAs, including CSA growth, development and stress response. However, although the experiment methods of CSA miRNA-target identifications are costly and time-consuming, few computational methods have been developed to tackle the CSA miRNA-target association prediction problem. Results In this paper, we constructed a heterogeneous network for CSA miRNA and targets by integrating rich biological information, including a miRNA similarity network, a target similarity network, and a miRNA-target association network. We then proposed a deep learning framework of graph convolution networks with layer attention mechanism, named MTAGCN. In particular, MTAGCN uses the attention mechanism to combine embeddings of multiple graph convolution layers, employing the integrated embedding to score the unobserved CSA miRNA-target associations. Discussion Comprehensive experiment results on two tasks (balanced task and unbalanced task) demonstrated that our proposed model achieved better performance than the classic machine learning and existing graph convolution network-based methods. The analysis of these results could offer valuable information for understanding complex CSA miRNA-target association mechanisms and would make a contribution to precision plant breeding. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04819-3.
Collapse
Affiliation(s)
- Haisong Feng
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Ying Xiang
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Xiaosong Wang
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Wei Xue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Zhenyu Yue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
| |
Collapse
|
7
|
Tan K, Huang W, Liu X, Hu J, Dong S. A Hierarchical Graph Convolution Network for Representation Learning of Gene Expression Data. IEEE J Biomed Health Inform 2021; 25:3219-3229. [PMID: 33449889 DOI: 10.1109/jbhi.2021.3052008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
The curse of dimensionality, which is caused by high-dimensionality and low-sample-size, is a major challenge in gene expression data analysis. However, the real situation is even worse: labelling data is laborious and time-consuming, so only a small part of the limited samples will be labelled. Having such few labelled samples further increases the difficulty of training deep learning models. Interpretability is an important requirement in biomedicine. Many existing deep learning methods are trying to provide interpretability, but rarely apply to gene expression data. Recent semi-supervised graph convolution network methods try to address these problems by smoothing the label information over a graph. However, to the best of our knowledge, these methods only utilize graphs in either the feature space or sample space, which restrict their performance. We propose a transductive semi-supervised representation learning method called a hierarchical graph convolution network (HiGCN) to aggregate the information of gene expression data in both feature and sample spaces. HiGCN first utilizes external knowledge to construct a feature graph and a similarity kernel to construct a sample graph. Then, two spatial-based GCNs are used to aggregate information on these graphs. To validate the model's performance, synthetic and real datasets are provided to lend empirical support. Compared with two recent models and three traditional models, HiGCN learns better representations of gene expression data, and these representations improve the performance of downstream tasks, especially when the model is trained on a few labelled samples. Important features can be extracted from our model to provide reliable interpretability.
Collapse
|