1
Wang Z, Meng J, Li H, Dai Q, Lin X, Luan Y. Attention-augmented multi-domain cooperative graph representation learning for molecular interaction prediction. Neural Netw 2025; 186:107265. [PMID: 39987715 DOI: 10.1016/j.neunet.2025.107265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Revised: 01/23/2025] [Accepted: 02/07/2025] [Indexed: 02/25/2025]
Abstract
Accurate identification of molecular interactions is crucial for biological network analysis, which can provide valuable insights into fundamental regulatory mechanisms. Despite considerable progress driven by computational advancements, existing methods often rely on task-specific prior knowledge or inherent structural properties of molecules, which limits their generalizability and applicability. Recently, graph-based methods have emerged as a promising approach for predicting links in molecular networks. However, most of these methods focus primarily on aggregating topological information within individual domains, leading to an inadequate characterization of molecular interactions. To mitigate these challenges, we propose AMCGRL, a generalized multi-domain cooperative graph representation learning framework for diverse molecular interaction prediction tasks. Concretely, AMCGRL incorporates multiple graph encoders to simultaneously learn molecular representations from both intra-domain and inter-domain graphs. A cross-domain decoder is then employed to bridge these graph encoders and facilitate the extraction of task-relevant information across different domains. Furthermore, a hierarchical mutual attention mechanism is developed to capture complex pairwise interaction patterns between distinct types of molecules through inter-molecule communicative learning. Extensive experiments conducted on various datasets demonstrate the superior representation learning capability of AMCGRL compared with state-of-the-art methods, supporting its effectiveness in advancing the prediction of molecular interactions.
Affiliation(s)
- Zhaowei Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Jun Meng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Haibin Li
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Qiguo Dai
  - School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600, China.
- Xiaohui Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Yushi Luan
  - School of Bioengineering, Dalian University of Technology, Dalian 116024, China.
2
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. Bioinformatics Advances 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.
Affiliation(s)
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Michelle M Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Aydin Wells
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
  - Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
- Kimberly Glass
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Deisy Morselli Gysi
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
  - Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
  - Department of Physics, Northeastern University, Boston, MA 02115, United States
- Arjun Krishnan
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Sushmita Roy
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
  - Wisconsin Institute for Discovery, Madison, WI 53715, United States
- Anaïs Baudot
  - Aix Marseille Université, INSERM, MMG, Marseille, France
- Serdar Bozdag
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
  - Department of Mathematics, University of North Texas, Denton, TX 76203, United States
- Danny Z Chen
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lenore Cowen
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Kapil Devkota
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Anthony Gitter
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
  - Morgridge Institute for Research, Madison, WI 53715, United States
- Sara J C Gosline
  - Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
- Pengfei Gu
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Pietro H Guzzi
  - Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
- Heng Huang
  - Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
- Meng Jiang
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Ziynet Nesibe Kesimoglu
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Mehmet Koyuturk
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Jian Ma
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Alexander R Pico
  - Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Nataša Pržulj
  - Department of Computer Science, University College London, London, WC1E 6BT, England
  - ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
- Teresa M Przytycka
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Anna Ritz
  - Department of Biology, Reed College, Portland, OR 97202, United States
- Roded Sharan
  - School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
- Donna K Slonim
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Hanghang Tong
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Xinan Holly Yang
  - Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
- Byung-Jun Yoon
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
- Haiyuan Yu
  - Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
- Tijana Milenković
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
  - Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
3
Li G, Li S, Liang C, Xiao Q, Luo J. Drug repositioning based on residual attention network and free multiscale adversarial training. BMC Bioinformatics 2024; 25:261. [PMID: 39118000 PMCID: PMC11308596 DOI: 10.1186/s12859-024-05893-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 08/06/2024] [Indexed: 08/10/2024] Open
Abstract
BACKGROUND: Conducting traditional wet experiments to guide drug development is an expensive, time-consuming and risky process. Analyzing drug function and repositioning plays a key role in identifying new therapeutic potential of approved drugs and discovering therapeutic approaches for untreated diseases. Exploring drug-disease associations has far-reaching implications for identifying disease pathogenesis and treatment. However, reliable detection of drug-disease relationships via traditional methods is costly and slow. Therefore, investigations into computational methods for predicting drug-disease associations are currently needed. RESULTS: This paper presents a novel drug-disease association prediction method, RAFGAE. First, RAFGAE integrates known associations between diseases and drugs into a bipartite network. Second, RAFGAE designs the Re_GAT framework, which includes multilayer graph attention networks (GATs) and two residual networks. The multilayer GATs are utilized for learning the node embeddings, which is achieved by aggregating information from multihop neighbors. The two residual networks are used to alleviate the deep network oversmoothing problem, and an attention mechanism is introduced to combine the node embeddings from different attention layers. Third, two graph autoencoders (GAEs) with collaborative training are constructed to simulate label propagation to predict potential associations. On this basis, free multiscale adversarial training (FMAT) is introduced. FMAT enhances node feature quality through small gradient adversarial perturbation iterations, improving the prediction performance. Finally, tenfold cross-validations on two benchmark datasets show that RAFGAE outperforms current methods. In addition, case studies have confirmed that RAFGAE can detect novel drug-disease associations. CONCLUSIONS: The comprehensive experimental results validate the utility and accuracy of RAFGAE. We believe that this method may serve as an excellent predictor for identifying unobserved disease-drug associations.
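The first step described in this abstract, turning known drug-disease associations into a bipartite network, can be illustrated with a minimal sketch. This is not RAFGAE's code: the function name `bipartite_adjacency`, the node ordering (drugs first, then diseases), and the toy matrix are assumptions made purely for illustration.

```python
def bipartite_adjacency(assoc):
    """Build the symmetric adjacency matrix of a drug-disease bipartite graph.

    assoc[i][j] is 1 if drug i is known to be associated with disease j.
    Nodes 0..n_drugs-1 are drugs; the remaining nodes are diseases
    (this ordering is an assumption of the sketch).
    """
    n_drugs = len(assoc)
    n_diseases = len(assoc[0]) if assoc else 0
    size = n_drugs + n_diseases
    adj = [[0] * size for _ in range(size)]
    for i, row in enumerate(assoc):
        for j, value in enumerate(row):
            if value:
                adj[i][n_drugs + j] = 1  # edge from drug i to disease j
                adj[n_drugs + j][i] = 1  # mirrored edge keeps the matrix symmetric
    return adj


# Toy association matrix (made-up data): 2 drugs x 3 diseases.
toy_assoc = [
    [1, 0, 1],
    [0, 1, 0],
]
toy_adj = bipartite_adjacency(toy_assoc)
```

Drug-drug and disease-disease blocks stay zero, which is what makes the graph bipartite; a graph attention network would then operate on this adjacency structure.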
Affiliation(s)
- Guanghui Li
  - School of Information Engineering, East China Jiaotong University, Nanchang, China.
- Shuwen Li
  - School of Information Engineering, East China Jiaotong University, Nanchang, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Qiu Xiao
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.
4
Li M, Wang Z, Liu L, Liu X, Zhang W. Subgraph-Aware Graph Kernel Neural Network for Link Prediction in Biological Networks. IEEE J Biomed Health Inform 2024; 28:4373-4381. [PMID: 38630566 DOI: 10.1109/jbhi.2024.3390092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
Identifying links within biological networks is important in various biomedical applications. Recent studies have revealed that each node in a network may play a unique role in different links, but most link prediction methods overlook distinctive node roles, hindering the acquisition of effective link representations. Subgraph-based methods have been introduced as solutions but often ignore shared information among subgraphs. To address these limitations, we propose a Subgraph-aware Graph Kernel Neural Network (SubKNet) for link prediction in biological networks. Specifically, SubKNet extracts a subgraph for each node pair and feeds it into a graph kernel neural network, which decomposes each subgraph into a combination of trainable graph filters with diversity regularization for subgraph-aware representation learning. Additionally, node embeddings of the network are extracted as auxiliary information, aiding in distinguishing node pairs that share the same subgraph. Extensive experiments on five biological networks demonstrate that SubKNet outperforms baselines, including methods specifically designed for biological networks and methods adapted to various networks. Further investigations confirm that applying graph filters to subgraphs helps to distinguish node roles in different subgraphs, and that the inclusion of diversity regularization further enhances this capacity from diverse perspectives, generating effective link representations that contribute to more accurate link prediction.
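The abstract does not say how the subgraph for each node pair is extracted. A common choice in subgraph-based link prediction is the enclosing subgraph induced by the k-hop neighborhoods of both endpoints, and the sketch below assumes that choice. The names `k_hop_nodes` and `enclosing_subgraph`, the removal of the candidate link itself, and the toy graph are all hypothetical, not taken from SubKNet.

```python
from collections import deque


def k_hop_nodes(adjacency, source, k):
    """Return all nodes within k hops of source, found by breadth-first search."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


def enclosing_subgraph(adjacency, u, v, k=1):
    """Induced subgraph on the union of the k-hop neighborhoods of u and v.

    The candidate link (u, v) is dropped so that the subgraph does not leak
    the label being predicted (an assumption of this sketch).
    """
    nodes = k_hop_nodes(adjacency, u, k) | k_hop_nodes(adjacency, v, k)
    edges = {
        (a, b)
        for a in nodes
        for b in adjacency.get(a, ())
        if b in nodes and a < b and {a, b} != {u, v}
    }
    return nodes, edges


# Toy undirected graph (made-up data), stored as adjacency lists.
toy_graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
toy_nodes, toy_edges = enclosing_subgraph(toy_graph, 0, 1, k=1)
```

Each extracted subgraph would then be passed to whatever model scores the link; here only the extraction step is shown.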
5
Cinaglia P. PyMulSim: a method for computing node similarities between multilayer networks via graph isomorphism networks. BMC Bioinformatics 2024; 25:211. [PMID: 38872090 PMCID: PMC11170789 DOI: 10.1186/s12859-024-05830-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 06/10/2024] [Indexed: 06/15/2024] Open
Abstract
BACKGROUND: In bioinformatics, interactions are modelled as networks, based on graph models. Generally, these support a single-layer structure which incorporates a specific entity (i.e., node) and only one type of link (i.e., edge). However, real-world biological systems consist of biological objects belonging to heterogeneous entities, and these operate and influence each other in multiple contexts simultaneously. Usually, node similarities are investigated to assess the relatedness between biological objects in a network of interest, and node embeddings are widely used for studying novel interactions from a topological point of view. In this regard, the state of the art offers several methods for evaluating node similarity inside a given network, but methodologies able to evaluate similarities between pairs of nodes belonging to different networks are missing. The latter are crucial for studies that relate different biological networks, e.g., for Network Alignment or to evaluate the possible evolution of the interactions of a little-known network on the basis of a well-known one. Existing methods are ineffective in evaluating nodes outside their structure, even more so in the context of multilayer networks, where the topic still relies on approaches adapted from static networks. In this paper, we present pyMulSim, a novel method for computing the pairwise similarities between nodes belonging to different multilayer networks. It uses a Graph Isomorphism Network (GIN) for representation learning of node features, and then processes the resulting embeddings to compute the similarities between pairs of nodes of different multilayer networks. RESULTS: Our experimentation investigated the performance of our method. Results show that our method effectively evaluates the similarities between the biological objects of a source multilayer network and those of a target one, based on the analysis of the node embeddings. Results were also assessed for different noise levels, including through statistical significance analyses performed for this purpose. CONCLUSIONS: PyMulSim is a novel method for computing the pairwise similarities between nodes belonging to different multilayer networks, by using a GIN for learning node embeddings. It has been evaluated both in terms of performance and validity, reporting a high degree of reliability.
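As a rough illustration of the two ingredients named in this abstract, the sketch below applies one standard GIN update (each node's features, scaled by 1 + eps, plus the sum of its neighbors' features, passed through an MLP) and then compares nodes from two networks by cosine similarity. It is not pyMulSim's implementation: it ignores the multilayer structure, uses a plain ReLU as a stand-in for a trained MLP, and assumes cosine similarity, which the abstract does not specify. All names are hypothetical.

```python
import math


def gin_layer(features, adjacency, mlp, eps=0.0):
    """One GIN update: h_v <- mlp((1 + eps) * h_v + sum of neighbor features)."""
    updated = {}
    for node, h in features.items():
        aggregated = [(1.0 + eps) * x for x in h]
        for neighbor in adjacency.get(node, ()):
            for i, x in enumerate(features[neighbor]):
                aggregated[i] += x
        updated[node] = mlp(aggregated)
    return updated


def cosine(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pairwise_similarity(emb_source, emb_target):
    """Similarity of every (source node, target node) pair."""
    return {
        (s, t): cosine(hs, ht)
        for s, hs in emb_source.items()
        for t, ht in emb_target.items()
    }


def relu(vector):
    """Stand-in for a trained MLP (an assumption of this sketch)."""
    return [max(0.0, x) for x in vector]


# Toy source and target networks (made-up data).
source_emb = gin_layer({"a": [1.0, 0.0], "b": [0.0, 1.0]}, {"a": ["b"], "b": ["a"]}, relu)
target_emb = gin_layer({"x": [1.0, 0.0]}, {}, relu)
similarity = pairwise_similarity(source_emb, target_emb)
```

In a real pipeline the MLP weights would be learned and several layers stacked; the sketch only shows how embeddings from two separate graphs end up in a shared space where they can be compared.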
Affiliation(s)
- Pietro Cinaglia
  - Department of Health Sciences, Magna Graecia University, Catanzaro, 88100, Italy.
  - Data Analytics Research Center, Magna Graecia University, Catanzaro, 88100, Italy.
6
Bezbochina A, Stavinova E, Kovantsev A, Chunaev P. Enhancing Predictability Assessment: An Overview and Analysis of Predictability Measures for Time Series and Network Links. Entropy (Basel, Switzerland) 2023; 25:1542. [PMID: 37998234 PMCID: PMC10670407 DOI: 10.3390/e25111542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 11/09/2023] [Accepted: 11/13/2023] [Indexed: 11/25/2023]
Abstract
Driven by the variety of available measures intended to estimate predictability of diverse objects such as time series and network links, this paper presents a comprehensive overview of the existing literature in this domain. Our overview delves into predictability from two distinct perspectives: the intrinsic predictability, which represents a data property independent of the chosen forecasting model and serves as the highest achievable forecasting quality level, and the realized predictability, which represents a chosen quality metric for a specific pair of data and model. The reviewed measures are used to assess predictability across different objects, starting from time series (univariate, multivariate, and categorical) to network links. Through experiments, we establish a noticeable relationship between measures of realized and intrinsic predictability in both generated and real-world time series data (with the correlation coefficient being statistically significant at a 5% significance level). The discovered correlation in this research holds significant value for tasks related to evaluating time series complexity and their potential to be accurately predicted.
Affiliation(s)
- Elizaveta Stavinova
  - National Center for Cognitive Research, ITMO University, 16 Birzhevaya Lane, Saint Petersburg 199034, Russia; (A.B.); (A.K.); (P.C.)
7
Jin S, Hong Y, Zeng L, Jiang Y, Lin Y, Wei L, Yu Z, Zeng X, Liu X. A general hypergraph learning algorithm for drug multi-task predictions in micro-to-macro biomedical networks. PLoS Comput Biol 2023; 19:e1011597. [PMID: 37956212 PMCID: PMC10681315 DOI: 10.1371/journal.pcbi.1011597] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/06/2023] [Revised: 11/27/2023] [Accepted: 10/13/2023] [Indexed: 11/15/2023] Open
Abstract
The powerful combination of large-scale drug-related interaction networks and deep learning provides new opportunities for accelerating the process of drug discovery. However, chemical structures that play an important role in drug properties and high-order relations that involve a greater number of nodes are not tackled in current biomedical networks. In this study, we present a general hypergraph learning framework, which introduces Drug-Substructure relationships into Molecular interaction Networks to construct the micro-to-macro drug-centric heterogeneous network (DSMN), and develop a multi-branch HyperGraph learning model, called HGDrug, for Drug multi-task predictions. HGDrug achieves highly accurate and robust predictions on 4 benchmark tasks (drug-drug, drug-target, drug-disease, and drug-side-effect interactions), outperforming 8 state-of-the-art task-specific models and 6 general-purpose conventional models. Experimental analysis verifies the effectiveness and rationality of the HGDrug model architecture as well as the multi-branch setup, and demonstrates that HGDrug is able to capture the relations between drugs associated with the same functional groups. In addition, our proposed drug-substructure interaction networks can help improve the performance of existing network models for drug-related prediction tasks.
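One generic way to encode drug-substructure relationships as a hypergraph is an incidence matrix in which each drug is a node and each substructure is a hyperedge joining every drug that contains it. The sketch below assumes that encoding; it is not claimed to be how HGDrug builds its network. The function names and the toy drug and substructure labels are made up for illustration.

```python
def incidence_matrix(drug_substructures):
    """Build a node-by-hyperedge incidence matrix.

    drug_substructures maps a drug name to the set of substructures it contains.
    Rows are drugs (nodes); columns are substructures (hyperedges).
    """
    drugs = sorted(drug_substructures)
    substructures = sorted({s for subs in drug_substructures.values() for s in subs})
    column = {s: j for j, s in enumerate(substructures)}
    matrix = [[0] * len(substructures) for _ in drugs]
    for i, drug in enumerate(drugs):
        for sub in drug_substructures[drug]:
            matrix[i][column[sub]] = 1
    return drugs, substructures, matrix


def hyperedge_degrees(matrix):
    """Number of drugs joined by each hyperedge (column sums)."""
    return [sum(column) for column in zip(*matrix)]


# Toy data (made-up labels): which substructures occur in which drug.
toy = {
    "drugA": {"benzene", "hydroxyl"},
    "drugB": {"hydroxyl"},
    "drugC": {"amine"},
}
drugs, substructures, H = incidence_matrix(toy)
degrees = hyperedge_degrees(H)
```

A hyperedge with degree greater than one (here, the shared hydroxyl group) links several drugs at once, which is the kind of high-order relation an ordinary pairwise edge cannot express.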
Affiliation(s)
- Shuting Jin
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
  - School of Informatics, Xiamen University, Xiamen, China
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai, China
- Yue Hong
  - School of Informatics, Xiamen University, Xiamen, China
- Li Zeng
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai, China
- Yinghui Jiang
  - School of Informatics, Xiamen University, Xiamen, China
- Yuan Lin
  - School of Economics, Innovation, and Technology, Kristiania University College, Bergen, Norway
- Leyi Wei
  - School of Software, Shandong University, Shandong, China
- Zhuohang Yu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Xiangxiang Zeng
  - School of Information Science and Engineering, Hunan University, Hunan, China
- Xiangrong Liu
  - School of Informatics, Xiamen University, Xiamen, China
  - Zhejiang Lab, Hangzhou, China
8
Zhang Z, Fang M, Wu R, Zong H, Huang H, Tong Y, Xie Y, Cheng S, Wei Z, Crabbe MJC, Zhang X, Wang Y. Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19. J Med Internet Res 2023; 25:e48115. [PMID: 37632414 PMCID: PMC10551783 DOI: 10.2196/48115] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/11/2023] [Revised: 07/03/2023] [Accepted: 08/25/2023] [Indexed: 08/28/2023] Open
Abstract
BACKGROUND: Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decision-making, and precision medicine. However, the relations between biomedical entities are complex and diverse, and comprehensive biomedical RE is not yet well established. OBJECTIVE: We aimed to investigate and improve large-scale RE with diverse relation types and conduct usability studies with application scenarios to optimize biomedical text mining. METHODS: Data sets containing 125 relation types with different entity semantic levels were constructed to evaluate the impact of entity semantic information on RE, and performance analysis was conducted on different model architectures and domain models. This study also proposed a continued pretraining strategy and integrated models with scripts into a tool. Furthermore, this study applied RE to the COVID-19 corpus with article topics and application scenarios of clinical interest to assess and demonstrate its biological interpretability and usability. RESULTS: The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. For a single model, PubMedBERT with continued pretraining performed the best, with an F1-score of 0.8998. Usability studies on COVID-19 demonstrated the interpretability and usability of RE, and a relation graph database was constructed, which was used to reveal existing and novel drug paths with edge explanations. The models (including pretrained and fine-tuned models), integrated tool (Docker), and generated data (including the COVID-19 relation graph database and drug paths) have been made publicly available to the biomedical text mining community and clinical researchers. CONCLUSIONS: This study provided a comprehensive analysis of RE with diverse relation types. Optimized RE models and tools for diverse relation types were developed, which can be widely used in biomedical text mining. Our usability studies provided a proof-of-concept demonstration of how large-scale RE can be leveraged to facilitate novel research.
Affiliation(s)
- Zeyu Zhang
  - Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
  - Department of Clinical Laboratory Medicine Center, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Meng Fang
  - Department of Laboratory Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Rebecca Wu
  - University of California, Berkeley, Berkeley, CA, United States
- Hui Zong
  - Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
  - Institutes for Systems Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Honglian Huang
  - Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Yuantao Tong
  - Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Yujia Xie
  - Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Shiyang Cheng
  - Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Ziyi Wei
  - Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- M James C Crabbe
  - Wolfson College, Oxford University, Oxford, United Kingdom
  - Institute of Biomedical and Environmental Science & Technology, University of Bedfordshire, Luton, United Kingdom
  - School of Life Sciences, Shanxi University, Taiyuan, China
- Xiaoyan Zhang
  - Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Ying Wang
  - Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
  - Department of Clinical Laboratory Medicine Center, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China
  - Department of Laboratory Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, China
9
Wang Y, Li Z, Rao J, Yang Y, Dai Z. Gene based message passing for drug repurposing. iScience 2023; 26:107663. [PMID: 37670781 PMCID: PMC10475505 DOI: 10.1016/j.isci.2023.107663] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/24/2023] [Revised: 08/06/2023] [Accepted: 08/14/2023] [Indexed: 09/07/2023] Open
Abstract
The medicinal effect of a drug acts through a series of genes, and the pathological mechanism of a disease is also related to genes with certain biological functions. However, the complex information linking a drug or disease to a series of genes is neglected by traditional message passing methods. In this study, we propose a new framework that uses two different strategies for the gene-drug/disease and drug-disease networks, respectively. We employ a long short-term memory (LSTM) network to extract the flow of messages from a series of genes (a gene path) to a drug or disease. After incorporating the resulting gene-path information into the drug-disease network, we utilize a graph convolutional network (GCN) to predict drug-disease associations. Experimental results show that our method, GeneDR (gene-based drug repurposing), makes better use of the information in gene paths and performs better in predicting drug-disease associations.
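For readers unfamiliar with the GCN step mentioned in this abstract, the sketch below shows one graph convolution in its widely used form: add self-loops to the adjacency matrix, normalize symmetrically by node degree, multiply by the node features and a weight matrix, and apply ReLU. It is a generic illustration, not GeneDR's code; the LSTM gene-path encoder is omitted, the weights are fixed rather than learned, and the input adjacency matrix is assumed to contain no self-loops.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Symmetric normalization D^(-1/2) (A + I) D^(-1/2) with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(adj, features, weights):
    """One graph convolution followed by ReLU."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, x) for x in row] for row in propagated]


# Toy graph (made-up data): two connected nodes with one feature each.
toy_adj = [[0, 1], [1, 0]]
toy_features = [[1.0], [3.0]]
toy_weights = [[1.0]]  # fixed weight for illustration; normally learned
toy_output = gcn_layer(toy_adj, toy_features, toy_weights)
```

With an identity weight, each node's output is simply the degree-normalized average of itself and its neighbor, which makes the smoothing effect of the propagation step easy to see.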
Affiliation(s)
- Yuxing Wang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Zhiyang Li
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Jiahua Rao
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Zhiming Dai
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
10
Zhang P, Wang Z, Sun W, Xu J, Zhang W, Wu K, Wong L, Li L. RDRGSE: A Framework for Noncoding RNA-Drug Resistance Discovery by Incorporating Graph Skeleton Extraction and Attentional Feature Fusion. ACS Omega 2023; 8:27386-27397. [PMID: 37546619 PMCID: PMC10398708 DOI: 10.1021/acsomega.3c02763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 07/06/2023] [Indexed: 08/08/2023]
Abstract
Computationally identifying noncoding RNA (ncRNA)-drug resistance associations would have a marked effect on understanding ncRNA molecular function and drug target mechanisms, and would reduce the screening cost of the corresponding wet-lab experiments. Although graph neural network-based methods have been developed and have facilitated the detection of ncRNAs related to drug resistance, building a highly trustworthy ncRNA-drug resistance association prediction framework remains a challenge because of unavoidable noisy edges originating from batch effects and experimental errors. Herein, we proposed a framework, referred to as RDRGSE (RDR association prediction by using graph skeleton extraction and attentional feature fusion), for detecting ncRNA-drug resistance associations. Specifically, starting with the construction of the original ncRNA-drug resistance associations as a bipartite graph, RDRGSE took advantage of a bi-view skeleton extraction strategy to obtain two types of skeleton views, followed by a graph neural network-based estimator that iteratively optimizes the skeleton views to jointly learn high-quality ncRNA-drug resistance edge embeddings and the optimal graph skeleton structure. Then, RDRGSE adopted adaptive attentional feature fusion to obtain the final edge embedding and identified potential RDRAs in an end-to-end manner. Comprehensive experiments were conducted, and the results indicated the significant advantage of a skeleton structure for ncRNA-drug resistance association discovery. Compared with state-of-the-art approaches, RDRGSE improved the prediction performance by 6.7% in terms of AUC and 6.1% in terms of AUPR. Also, ablation-like analysis and independent case studies corroborated the generalization ability and robustness of RDRGSE. Overall, RDRGSE provides a powerful computational method for ncRNA-drug resistance association prediction, which can also serve as a screening tool for drug resistance biomarkers.
Affiliation(s)
- Ping Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zilin Wang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Weicheng Sun
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jinsheng Xu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Weihan Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Kun Wu
  - Department of Biochemistry, University of California Riverside, Riverside, California 92521, United States
- Leon Wong
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning 530007, China
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 200092, China
- Li Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China

11. Bang D, Lim S, Lee S, Kim S. Biomedical knowledge graph learning for drug repurposing by extending guilt-by-association to multiple layers. Nat Commun 2023; 14:3570. [PMID: 37322032 PMCID: PMC10272215 DOI: 10.1038/s41467-023-39301-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 12/13/2022] [Accepted: 06/02/2023] [Indexed: 06/17/2023] Open
Abstract
Computational drug repurposing aims to identify new indications for existing drugs by utilizing high-throughput data, often in the form of biomedical knowledge graphs. However, learning on biomedical knowledge graphs can be challenging because genes dominate the graph while drug and disease entities are few, resulting in less effective representations. To overcome this challenge, we propose a "semantic multi-layer guilt-by-association" approach that leverages the principle of guilt-by-association ("similar genes share similar functions") at the drug-gene-disease level. Using this approach, our model DREAMwalk (Drug Repurposing through Exploring Associations using Multi-layer random walk) uses a semantic information-guided random walk to generate drug and disease-populated node sequences, allowing for effective mapping of both drugs and diseases in a unified embedding space. Compared to state-of-the-art link prediction models, our approach improves drug-disease association prediction accuracy by up to 16.8%. Moreover, exploration of the embedding space reveals a well-aligned harmony between biological and semantic contexts. We demonstrate the effectiveness of our approach through repurposing case studies for breast carcinoma and Alzheimer's disease, highlighting the potential of the multi-layer guilt-by-association perspective for drug repurposing on biomedical knowledge graphs.
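The idea of a semantically guided walk can be illustrated with a minimal sketch. This is not the authors' DREAMwalk implementation: the function name `guided_walk`, the `teleport_prob` parameter, and the toy graph are assumptions made for illustration. At each step the walker either follows an edge of the network or, with some probability, jumps to a semantically similar drug or disease, so that drugs and diseases appear more often in the generated sequences.

```python
import random

def guided_walk(network, similar, start, length, teleport_prob, rng):
    """Generate one node sequence of at most `length` nodes.

    network: dict node -> list of neighbours in the knowledge graph
    similar: dict node -> list of semantically similar nodes (drugs/diseases only)
    teleport_prob: chance of jumping to a similar node instead of a neighbour
    """
    walk = [start]
    while len(walk) < length:
        node = walk[-1]
        jumps = similar.get(node, [])
        if jumps and rng.random() < teleport_prob:
            nxt = rng.choice(jumps)          # semantic jump within the same entity type
        else:
            neighbours = network.get(node, [])
            if not neighbours:               # dead end: stop early
                break
            nxt = rng.choice(neighbours)     # ordinary network step
        walk.append(nxt)
    return walk

# Hypothetical toy graph: two drugs, two genes, two diseases.
network = {
    "drugA": ["gene1"],
    "drugB": ["gene2"],
    "gene1": ["drugA", "gene2", "diseaseX"],
    "gene2": ["drugB", "gene1", "diseaseY"],
    "diseaseX": ["gene1"],
    "diseaseY": ["gene2"],
}
similar = {
    "drugA": ["drugB"], "drugB": ["drugA"],
    "diseaseX": ["diseaseY"], "diseaseY": ["diseaseX"],
}

walk = guided_walk(network, similar, "drugA", 8, 0.5, random.Random(0))
print(walk)
```

Sequences produced this way would typically be fed to a skip-gram style embedding model; that step is omitted here.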
Affiliation(s)
- Dongmin Bang
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
  - AIGENDRUG Co., Ltd., Seoul, 08826, Republic of Korea
- Sangsoo Lim
  - School of Artificial Intelligence Convergence, Dongguk University, Seoul, 04620, Republic of Korea
- Sangseon Lee
  - Institute of Computer Technology, Seoul National University, Seoul, 08826, Republic of Korea
- Sun Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
  - AIGENDRUG Co., Ltd., Seoul, 08826, Republic of Korea
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, 08826, Republic of Korea

12. Temiz M, Bakir-Gungor B, Güner Şahan P, Coskun M. Topological feature generation for link prediction in biological networks. PeerJ 2023; 11:e15313. [PMID: 37187525 PMCID: PMC10178302 DOI: 10.7717/peerj.15313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 04/06/2023] [Indexed: 05/17/2023] Open
Abstract
Graph or network embedding is a powerful method for extracting missing or potential information from interactions between nodes in biological networks. Graph embedding methods learn representations of nodes and interactions in a graph with low-dimensional vectors, which facilitates research to predict potential interactions in networks. However, most graph embedding methods suffer from high computational costs in the form of high computational complexity of the embedding methods and learning times of the classifier, as well as the high dimensionality of complex biological networks. To address these challenges, in this study, we use the Chopper algorithm as an alternative approach to graph embedding, which accelerates the iterative processes and thus reduces the running time of the iterative algorithms for three different (nervous system, blood, heart) undirected protein-protein interaction (PPI) networks. Due to the high dimensionality of the matrix obtained after the embedding process, the data are transformed into a smaller representation by applying feature regularization techniques. We evaluated the performance of the proposed method by comparing it with state-of-the-art methods. Extensive experiments demonstrate that the proposed approach reduces the learning time of the classifier and performs better in link prediction. We have also shown that the proposed embedding method is faster than state-of-the-art methods on three different PPI datasets.
Affiliation(s)
- Mustafa Temiz
  - Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Pınar Güner Şahan
  - Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Mustafa Coskun
  - Department of Artificial Intelligence and Big Data Engineering, Ankara University, Ankara, Turkey

13. Node Similarity Preserving Graph Convolutional Network Based on Full-frequency Information for Node Classification. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11094-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]

14. Gu Y, Zheng S, Yin Q, Jiang R, Li J. REDDA: Integrating multiple biological relations to heterogeneous graph neural network for drug-disease association prediction. Comput Biol Med 2022; 150:106127. [PMID: 36182762 DOI: 10.1016/j.compbiomed.2022.106127] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 06/18/2022] [Revised: 07/27/2022] [Accepted: 09/18/2022] [Indexed: 11/03/2022]
Abstract
Computational drug repositioning is an effective way to find new indications for existing drugs, and thus can accelerate drug development and reduce experimental costs. Recently, various deep learning-based repurposing methods have been established to identify potential drug-disease associations (DDA). However, effectively utilizing the relations among biological entities to capture biological interactions and enhance drug-disease association prediction is still challenging. To resolve this problem, we proposed a heterogeneous graph neural network called REDDA (Relations-Enhanced Drug-Disease Association prediction). Assembled with three attention mechanisms, REDDA can sequentially learn drug/disease representations by a general heterogeneous graph convolutional network-based node embedding block, a topological subnet embedding block, a graph attention block, and a layer attention block. Performance comparisons on our proposed benchmark dataset show that REDDA outperforms 8 advanced drug-disease association prediction methods, achieving relative improvements of 0.76% on the area under the receiver operating characteristic curve (AUC) score and 13.92% on the area under the precision-recall curve (AUPR) score compared to the suboptimal method. On the other benchmark dataset, REDDA also obtains relative improvements of 2.48% on the AUC score and 4.93% on the AUPR score. Case studies also indicate that REDDA can give valid predictions for the discovery of new indications for drugs and new therapies for diseases. The overall results show inspiring potential for REDDA in in silico drug development. The proposed benchmark dataset and source code are available at https://github.com/gu-yaowen/REDDA.
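The gains above are reported as relative improvements, which differ from absolute differences in score. A minimal sketch of the distinction follows; the two scores are hypothetical placeholders chosen for easy arithmetic and are not values from the paper.

```python
def relative_improvement(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score, in percent."""
    return (new_score - baseline_score) / baseline_score * 100.0

# Hypothetical AUPR values, for illustration only.
baseline_aupr = 0.50
new_aupr = 0.57

absolute_gain = new_aupr - baseline_aupr                         # about 0.07 (7 points)
relative_gain = relative_improvement(new_aupr, baseline_aupr)    # about 14 percent
print(f"absolute: {absolute_gain:.2f}, relative: {relative_gain:.1f}%")
```

A low baseline makes the relative figure much larger than the absolute one, which is worth keeping in mind when comparing AUC and AUPR gains.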
Affiliation(s)
- Yaowen Gu
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China
- Si Zheng
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China
  - Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, 100084, China
- Qijin Yin
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
- Jiao Li
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China

15. Domingo-Fernández D, Gadiya Y, Patel A, Mubeen S, Rivas-Barragan D, Diana CW, Misra BB, Healey D, Rokicki J, Colluru V. Causal reasoning over knowledge graphs leveraging drug-perturbed and disease-specific transcriptomic signatures for drug discovery. PLoS Comput Biol 2022; 18:e1009909. [PMID: 35213534 PMCID: PMC8906585 DOI: 10.1371/journal.pcbi.1009909] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/21/2021] [Revised: 03/09/2022] [Accepted: 02/09/2022] [Indexed: 12/29/2022] Open
Abstract
Network-based approaches are becoming increasingly popular for drug discovery as they provide a systems-level overview of the mechanisms underlying disease pathophysiology. They have demonstrated significant early promise over other methods of biological data representation, such as in target discovery, side effect prediction and drug repurposing. In parallel, an explosion of -omics data for the deep characterization of biological systems routinely uncovers molecular signatures of disease for similar applications. Here, we present RPath, a novel algorithm that prioritizes drugs for a given disease by reasoning over causal paths in a knowledge graph (KG), guided by both drug-perturbed as well as disease-specific transcriptomic signatures. First, our approach identifies the causal paths that connect a drug to a particular disease. Next, it reasons over these paths to identify those that correlate with the transcriptional signatures observed in a drug-perturbation experiment, and anti-correlate to signatures observed in the disease of interest. The paths which match this signature profile are then proposed to represent the mechanism of action of the drug. We demonstrate how RPath consistently prioritizes clinically investigated drug-disease pairs on multiple datasets and KGs, achieving better performance over other similar methodologies. Furthermore, we present two case studies showing how one can deconvolute the predictions made by RPath as well as predict novel targets.
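The path-reasoning step described above can be sketched in simplified form. This is an illustration under stated assumptions, not the RPath algorithm itself: the signed-edge encoding (+1 for activation, -1 for inhibition), the function names, and the toy graph and signatures are all hypothetical. The sketch enumerates acyclic drug-to-disease paths, propagates edge signs along each path, and keeps paths whose predicted directions agree with the drug signature and oppose the disease signature.

```python
def simple_paths(graph, source, target, max_len):
    """Yield acyclic paths from source to target with at most max_len edges."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            yield path
            continue
        if len(path) - 1 >= max_len:
            continue
        for nxt in graph.get(node, {}):
            if nxt not in path:              # avoid cycles
                stack.append(path + [nxt])

def predicted_directions(graph, path):
    """Propagate edge signs along the path: node -> predicted up (+1) or down (-1)."""
    sign, directions = 1, {}
    for a, b in zip(path, path[1:]):
        sign *= graph[a][b]
        directions[b] = sign
    return directions

def is_concordant(graph, path, drug_sig, disease_sig):
    """True if intermediate nodes match the drug signature and oppose the disease one."""
    directions = predicted_directions(graph, path)
    for node in path[1:-1]:
        d = directions[node]
        if node in drug_sig and drug_sig[node] != d:
            return False
        if node in disease_sig and disease_sig[node] != -d:
            return False
    return True

# Hypothetical toy knowledge graph with signed causal edges.
graph = {
    "drug": {"g1": -1, "g2": 1},
    "g1": {"g3": 1},
    "g2": {"disease": 1},
    "g3": {"disease": 1},
}
drug_sig = {"g1": -1, "g2": -1, "g3": -1}     # genes down-regulated by the drug
disease_sig = {"g1": 1, "g2": 1, "g3": 1}     # genes up-regulated in the disease

concordant = [p for p in simple_paths(graph, "drug", "disease", 3)
              if is_concordant(graph, p, drug_sig, disease_sig)]
print(concordant)
```

In this toy case the path through `g2` is rejected because the graph predicts up-regulation of `g2` while the drug signature shows down-regulation.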
Affiliation(s)
- Yojana Gadiya
  - Enveda Biosciences, Boulder, Colorado, United States of America
- Abhishek Patel
  - Enveda Biosciences, Boulder, Colorado, United States of America
- Sarah Mubeen
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Chris W. Diana
  - Enveda Biosciences, Boulder, Colorado, United States of America
- David Healey
  - Enveda Biosciences, Boulder, Colorado, United States of America
- Joe Rokicki
  - Enveda Biosciences, Boulder, Colorado, United States of America
- Viswa Colluru
  - Enveda Biosciences, Boulder, Colorado, United States of America

16.
Abstract
Summary: Computational methods to predict protein–protein interaction (PPI) typically segregate into sequence-based ‘bottom-up’ methods that infer properties from the characteristics of the individual protein sequences, or global ‘top-down’ methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlock a more comprehensive map of protein interaction and organization in both model and non-model organisms.
Availability and implementation: https://topsyturvy.csail.mit.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Samuel Sledzieski
  - Computer Science and Artificial Intelligence Lab., Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Bonnie Berger
  - To whom correspondence should be addressed.
- Lenore Cowen
  - To whom correspondence should be addressed.