1
Shi H, Hu J, Zhang X, Jin S, Xu X. Prediction of drug-target interactions based on substructure subsequences and cross-public attention mechanism. PLoS One 2025; 20:e0324146. [PMID: 40445972 PMCID: PMC12124583 DOI: 10.1371/journal.pone.0324146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Accepted: 04/22/2025] [Indexed: 06/02/2025] Open
Abstract
Drug-target interactions (DTIs) play a critical role in drug discovery and repurposing. Deep learning-based methods for predicting drug-target interactions are more efficient than wet-lab experiments. The extraction of original and substructural features from drugs and proteins plays a key role in enhancing the accuracy of DTI predictions, while the integration of multi-feature information and effective representation of interaction data also impact the precision of DTI forecasts. Consequently, we propose a drug-target interaction prediction model, SSCPA-DTI, based on substructural subsequences and a cross co-attention mechanism. We use drug SMILES sequences and protein sequences as inputs for the model, employing a Multi-feature information mining module (MIMM) to extract original and substructural features of DTIs. Substructural information provides detailed insights into molecular local structures, while original features enhance the model's understanding of the overall molecular architecture. Subsequently, a Cross-public attention module (CPA) is utilized to first integrate the extracted original and substructural features, then to extract interaction information between the protein and drug, addressing issues such as insufficient accuracy and weak interpretability arising from mere concatenation without interactive integration of feature information. We conducted experiments on three public datasets and demonstrated superior performance compared to baseline models.
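The interaction step described in this abstract, where drug features attend over protein features, can be illustrated with generic scaled dot-product cross-attention. The following is a minimal sketch only: it is not the paper's CPA module, and the toy feature vectors and the names `cross_attention`, `drug`, and `protein` are assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query row attends over all key rows."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value rows, one output vector per query.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

# Hypothetical toy features: 2 drug substructure tokens, 3 protein subsequence tokens.
drug = [[1.0, 0.0], [0.0, 1.0]]
protein = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Drug tokens act as queries; protein tokens supply keys and values.
drug_ctx, drug_to_protein = cross_attention(drug, protein, protein)
```

Each row of `drug_to_protein` is a probability distribution over protein tokens, which is the kind of weight map that attention-based DTI models typically inspect for interpretability.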
Affiliation(s)
- Haikuo Shi
  - Jing Hu, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Jing Hu
  - Jing Hu, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Xiaolong Zhang
  - Jing Hu, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Shuting Jin
  - Jing Hu, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Xin Xu
  - Jing Hu, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
2
Wei Z, Wang Z, Tang C. Dynamic Prediction of Drug-Target Interactions via Cross-Modal Feature Mapping with Learnable Association Information. J Chem Inf Model 2025; 65:3915-3927. [PMID: 40227648 DOI: 10.1021/acs.jcim.4c02348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/15/2025]
Abstract
Predicting drug-target interactions (DTIs) is essential for advancing drug discovery and personalized medicine. However, accurately capturing the intricate binding relationships between drugs and targets remains a significant challenge, particularly when attempting to fully leverage the vast correlation information inherent in molecular data. This complexity is further exacerbated by the structural differences and sequence length disparities between drug molecules and protein targets, which can hinder effective feature alignment and interaction modeling. To address these challenges, we propose a model named LAM-DTI. First, drug and target features are extracted from the original molecular sequence data using a multilayer convolutional neural network. To address the sequence length discrepancy between drug and target features, we apply a connectionist temporal classification module to generate normalized feature sequences. Building on this, we introduce a learnable association information matrix as a flexible intermediary, which dynamically adjusts to capture accurate DTI association information, thereby enhancing cross-modal mapping within a unified latent space. This progressive mapping strategy enables the model to form an interaction projection between drugs and targets, effectively identifying critical interaction regions and guiding the capture of complex interaction-related features. Extensive experiments on three well-known benchmark data sets demonstrate that LAM-DTI significantly outperforms previous models.
Affiliation(s)
- Ziyu Wei
  - School of Computer Science, China University of Geosciences, Wuhan 430074, China
- Zhengyu Wang
  - Office of the Drug Clinical Trials Agency, The Affiliated Huai'an Hospital of Xuzhou Medical University and The Second People's Hospital of Huai'an, Huai'an 223002, China
- Chang Tang
  - School of Computer Science, China University of Geosciences, Wuhan 430074, China
3
Lin B, Yan S, Zhen B. A machine learning method for predicting molecular antimicrobial activity. Sci Rep 2025; 15:6559. [PMID: 39994442 PMCID: PMC11850884 DOI: 10.1038/s41598-025-91190-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Accepted: 02/18/2025] [Indexed: 02/26/2025] Open
Abstract
In response to the increasing concern over antibiotic resistance and the limitations of traditional methods in antibiotic discovery, we introduce a machine learning-based method named MFAGCN. This method predicts the antimicrobial efficacy of molecules by integrating three types of molecular fingerprints (MACCS, PubChem, and ECFP) along with molecular graph representations as input features, with a specific focus on molecular functional groups. MFAGCN incorporates an attention mechanism to assign different weights to the importance of information from different neighboring nodes. Comparative experiments with baseline models on two public datasets demonstrate MFAGCN's superior performance. Additionally, we conducted an analysis of the functional group distribution in both the training and test sets to validate the model's predictions. Furthermore, structural similarity analyses with known antibiotics are performed to prevent the rediscovery of established antibiotics. This approach enables researchers to rapidly screen molecules with potent antimicrobial properties and facilitates the identification of functional groups that influence antimicrobial performance, providing valuable insights for further antibiotic development.
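The abstract does not say which similarity measure is used to screen out rediscovered antibiotics. A common choice for fingerprint comparison is the Tanimoto coefficient, so the sketch below assumes that measure; the fingerprints are hypothetical sets of "on" bit indices, and `known_antibiotics` and `candidate` are illustrative names, not data from the paper.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / len(union)

# Hypothetical fingerprints (assumption: bit indices chosen only for illustration).
known_antibiotics = {
    "known_1": {1, 4, 7, 9},
    "known_2": {2, 3, 5},
}
candidate = {1, 4, 7, 8}

# Find the most similar known compound; a high score suggests a rediscovery.
best_name, best_score = max(
    ((name, tanimoto(candidate, fp)) for name, fp in known_antibiotics.items()),
    key=lambda item: item[1],
)
```

A screening pipeline would compare `best_score` against a threshold; the threshold value is a design choice that this sketch leaves open.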
Affiliation(s)
- Bangjiang Lin
  - Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou, 362216, China
  - College of Electrical Engineering and Automation, Fuzhou University, Fuzhou, 350108, China
- Shujie Yan
  - Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou, 362216, China
  - College of Electrical Engineering and Automation, Fuzhou University, Fuzhou, 350108, China
- Bowen Zhen
  - Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou, 362216, China
  - College of Electrical Engineering and Automation, Fuzhou University, Fuzhou, 350108, China
4
Dong J, Liu X, Su R, Xu H, Yu T. TCN-Transformer Deep Network with Random Forest for Prediction of the Chemical Synthetic Ammonia Process. ACS Omega 2025; 10:2269-2279. [PMID: 39866626 PMCID: PMC11755154 DOI: 10.1021/acsomega.4c09634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2024] [Revised: 12/11/2024] [Accepted: 12/23/2024] [Indexed: 01/28/2025]
Abstract
Accurate prediction of the key output response of the chemical synthetic ammonia process is of great significance for optimizing system performance and operation monitoring. Because many key intermediate variables of complex systems are difficult to measure comprehensively, there are great difficulties and errors in mechanism analysis and identification modeling techniques. Based on random forest (RF) variable selection, a deep neural network combining temporal convolutional network (TCN) and transformer is proposed to predict the output variables of the synthetic ammonia process. The RF technique is used to select the principal input variables to increase the computational efficiency and the generalization ability of the network. A self-attention mechanism is used to assign biased weights to the data of the key feature variables. A TCN-Transformer network with encoding and decoding techniques is first designed to enhance the correlation of information between variable data, which can extract features of input variables and achieve dynamic modeling of multivariate feature sequences. The network is optimized using a multihead attention mechanism, and the key features are enhanced by probabilistic weight assignment to improve the prediction accuracy. Finally, comparison with existing methods on generated offline data verifies the merit and applicability of the proposed network for predicting the key output, carbon monoxide (R² = 0.8233, RMSE = 0.0032, and MAE = 0.0024).
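The three reported scores follow standard definitions. As a minimal sketch, the function below computes the coefficient of determination, root mean squared error, and mean absolute error from paired lists; the toy values are assumptions and are unrelated to the paper's data.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for two equal-length lists of numbers."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }

# Toy example (hypothetical values): one prediction is off by 1.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

RMSE and MAE are in the units of the target variable, so values such as 0.0032 are only meaningful relative to the scale of the measured output.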
Affiliation(s)
- Jianguo Dong
  - School of Automation, Southeast University, Nanjing 210000, China
- Xiaona Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Ruixian Su
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Huimin Xu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Tianyu Yu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
5
Xie Y, Wang X, Wang P, Bi X. A pseudo-label supervised graph fusion attention network for drug–target interaction prediction. Expert Systems with Applications 2025; 259:125264. [DOI: 10.1016/j.eswa.2024.125264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
6
Zeng X, Feng PK, Li SJ, Lv SQ, Wen ML, Li Y. GNN-DDAS: Drug discovery for identifying anti-schistosome small molecules based on graph neural network. J Comput Chem 2024; 45:2825-2834. [PMID: 39189298 DOI: 10.1002/jcc.27490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 08/06/2024] [Accepted: 08/09/2024] [Indexed: 08/28/2024]
Abstract
Schistosomiasis is a tropical disease that poses a significant risk to hundreds of millions of people, yet often goes unnoticed. While praziquantel, a widely used anti-schistosome drug, has a low cost and a high cure rate, it has several drawbacks. These include ineffectiveness against schistosome larvae, reduced efficacy in young children, and emerging drug resistance. Discovering new and active anti-schistosome small molecules is therefore critical, but this process presents the challenge of low accuracy in computer-aided methods. To address this issue, we proposed GNN-DDAS, a novel deep learning framework based on graph neural networks (GNN), designed for drug discovery to identify active anti-schistosome (DDAS) small molecules. Initially, a multi-layer perceptron was used to derive sequence features from various representations of small molecule SMILES. Next, GNN was employed to extract structural features from molecular graphs. Finally, the extracted sequence and structural features were then concatenated and fed into a fully connected network to predict active anti-schistosome small molecules. Experimental results showed that GNN-DDAS exhibited superior performance compared to the benchmark methods on both benchmark and real-world application datasets. Additionally, the use of GNNExplainer model allowed us to analyze the key substructure features of small molecules, providing insight into the effectiveness of GNN-DDAS. Overall, GNN-DDAS provided a promising solution for discovering new and active anti-schistosome small molecules.
Affiliation(s)
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Peng-Kun Feng
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Shu-Juan Li
  - Department of Endemic Diseases, Yunnan Institute of Endemic Diseases Control and Prevention, Dali, China
- Shuang-Qing Lv
  - Institute of Surveying and Information Engineering, West Yunnan University of Applied Science, Dali, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, China
7
Kwon H, Du Z, Li Y. AlphaFold 2-based stacking model for protein solubility prediction and its transferability on seed storage proteins. Int J Biol Macromol 2024; 278:134601. [PMID: 39137857 DOI: 10.1016/j.ijbiomac.2024.134601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Revised: 07/29/2024] [Accepted: 08/07/2024] [Indexed: 08/15/2024]
Abstract
Accurate protein solubility prediction is crucial in screening suitable candidates for food application. Existing models often rely only on sequences, overlooking important structural details. In this study, a regression model for protein solubility was developed using both the sequences and predicted structures of 2983 E. coli proteins. The sequence and structural level properties of the proteins were bioinformatically extracted and subjected to multilayer perceptron (MLP). Moreover, residue level features and contact maps were utilized to construct a graph convolutional network (GCN). The out-of-fold predictions of the two models were combined and fed into multiple meta-regressors to create a stacking model. The stacking model with support vector regressor (SVR) achieved R² of 0.502 and 0.468 on test and external validation datasets, respectively, displaying higher performance compared to existing regression models. Based on the improved performance compared to its base models, the stacking model effectively captured the strength of its base models as well as the significance of the different features used. Furthermore, the model's transferability was indirectly validated on a dataset of seed storage proteins using Osborne definition as well as on a case study using molecular dynamic simulation, showing potential for application beyond microbial proteins to food and agriculture-related ones.
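The stacking procedure described here (out-of-fold predictions from base models fed to a meta-regressor) can be sketched in a few lines. This is a toy illustration under stated assumptions: the base models are a mean predictor and a one-variable least-squares line rather than the paper's MLP and GCN, and the meta-learner is a single least-squares blend weight rather than an SVR. All names and data are hypothetical.

```python
def fit_mean(xs, ys):
    """Base model 1 (toy): always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Base model 2 (toy): ordinary least-squares line in one variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def out_of_fold(fit, xs, ys, k):
    """Predict each sample with a model trained on the other k-1 folds."""
    oof = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            oof[i] = model(xs[i])
    return oof

def fit_blend_weight(a, b, ys):
    """Meta-learner (toy): least-squares w for y ~ w*a + (1-w)*b."""
    denom = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if denom == 0:
        return 0.5  # base models agree everywhere; any weight works
    return sum((ai - bi) * (y - bi) for ai, bi, y in zip(a, b, ys)) / denom

# Hypothetical data lying exactly on y = 2x + 1.
xs = [float(i) for i in range(8)]
ys = [2.0 * x + 1.0 for x in xs]

oof_mean = out_of_fold(fit_mean, xs, ys, k=4)
oof_line = out_of_fold(fit_line, xs, ys, k=4)
weight_mean = fit_blend_weight(oof_mean, oof_line, ys)
blended = [weight_mean * a + (1.0 - weight_mean) * b for a, b in zip(oof_mean, oof_line)]
```

Training the meta-learner on out-of-fold predictions, rather than in-sample ones, keeps it from rewarding base models that merely memorize their training data.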
Affiliation(s)
- Hyukjin Kwon
  - Department of Grain Science and Industry, Kansas State University, Manhattan, KS 66506, USA
- Zhenjiao Du
  - Department of Grain Science and Industry, Kansas State University, Manhattan, KS 66506, USA
- Yonghui Li
  - Department of Grain Science and Industry, Kansas State University, Manhattan, KS 66506, USA
8
Ahmed KT, Ansari MI, Zhang W. DTI-LM: language model powered drug-target interaction prediction. Bioinformatics 2024; 40:btae533. [PMID: 39221997 PMCID: PMC11520403 DOI: 10.1093/bioinformatics/btae533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 08/05/2024] [Accepted: 08/29/2024] [Indexed: 09/04/2024] Open
Abstract
MOTIVATION: The identification and understanding of drug-target interactions (DTIs) play a pivotal role in the drug discovery and development process. Sequence representations of drugs and proteins in computational models offer advantages such as their widespread availability, easier input quality control, and reduced computational resource requirements. These make them efficient and accessible tools for various computational biology and drug discovery applications. Many sequence-based DTI prediction methods have been developed over the years. Despite the advancement in methodology, cold start DTI prediction involving unknown drug or protein remains a challenging task, particularly for sequence-based models. Introducing DTI-LM, a novel framework leveraging advanced pretrained language models, we harness their exceptional context-capturing abilities along with neighborhood information to predict DTIs. DTI-LM is specifically designed to rely solely on sequence representations for drugs and proteins, aiming to bridge the gap between warm start and cold start predictions. RESULTS: Large-scale experiments on four datasets show that DTI-LM can achieve state-of-the-art performance on DTI predictions. Notably, it excels in overcoming the common challenges faced by sequence-based models in cold start predictions for proteins, yielding impressive results. The incorporation of neighborhood information through a graph attention network further enhances prediction accuracy. Nevertheless, a disparity persists between cold start predictions for proteins and drugs. A detailed examination of DTI-LM reveals that language models exhibit contrasting capabilities in capturing similarities between drugs and proteins. AVAILABILITY AND IMPLEMENTATION: Source code is available at: https://github.com/compbiolabucf/DTI-LM.
Affiliation(s)
- Khandakar Tanvir Ahmed
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
- Md Istiaq Ansari
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
- Wei Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
9
Liu M, Yang Y, Liu Q, Liu L, Wang G. A Knowledge-Driven Self-Supervised Approach for Molecular Generation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1579-1590. [PMID: 38805329 DOI: 10.1109/tcbb.2024.3406600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2024]
Abstract
Due to the great successes of Graph Neural Networks (GNN) in numerous fields, growing research interest has been devoted to applying GNN to molecular learning tasks. The molecule structure can be naturally represented as graphs where atoms and bonds refer to nodes and edges respectively. However, the atoms are not haphazardly stacked together but combined into various spatial geometries. Meanwhile, since chemical reactions mainly occur in substructures such as functional groups, the substructure plays a decisive role in the molecule's properties. Therefore, directly applying GNN to molecular representation learning could ignore the molecular spatial structure and the substructure properties, which in turn degrades the performance of downstream tasks. In this paper, we propose the Knowledge-Driven Self-Supervised Model for Molecular Representation Learning (KSMRL) to address the above problems. KSMRL consists of two major pathways: (1) the Spatial Information (SI) based pathway, which preserves the spatial information of molecular structure, and (2) the Subgraph Constraint (SC) based pathway, which retains the properties of substructures in the molecular representation. In this manner, both the atomic level and substructure level information can be included in modeling. According to the experimental results on multiple datasets, the proposed KSMRL can generate discriminative molecular representations. In molecular generation tasks, KSMRL combined with Autoregressive Flow (AF) models or Discrete Flow (DF) models outperforms the state-of-the-art baselines over all datasets. In addition, we demonstrate the effectiveness of KSMRL with property optimization experiments. To demonstrate its ability to predict specified potential Drug-Target Interactions (DTIs), a case study discriminating the interactions between molecules generated by KSMRL and targets is also given.
10
Cheng X, Yang X, Guan Y, Feng Y. ERT-GFAN: A multimodal drug-target interaction prediction model based on molecular biology and knowledge-enhanced attention mechanism. Comput Biol Med 2024; 180:109012. [PMID: 39153394 DOI: 10.1016/j.compbiomed.2024.109012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 08/06/2024] [Accepted: 08/07/2024] [Indexed: 08/19/2024]
Abstract
In drug discovery, precisely identifying drug-target interactions is crucial for finding new drugs and understanding drug mechanisms. Evolving drug/target heterogeneous data presents challenges in obtaining multimodal representation in drug-target interaction (DTI) prediction. To deal with this, we propose 'ERT-GFAN', a multimodal drug-target interaction prediction model inspired by molecular biology. Firstly, it integrates bio-inspired principles to obtain structure features of drugs and targets using Extended Connectivity Fingerprints (ECFP). Simultaneously, the knowledge graph embedding model RotatE is employed to discover the interaction feature of drug-target pairs. Subsequently, Transformer is utilized to refine the contextual neighborhood features from the obtained structure features and interaction features, and multi-modal high-dimensional fusion features of the three-modal information are constructed. Finally, the DTI prediction results are produced by integrating the multimodal fusion features into a graphical high-dimensional fusion feature attention network (GFAN) using our innovative multimodal high-dimensional fusion feature attention. This multimodal approach offers a comprehensive understanding of drug-target interactions, addressing challenges in complex knowledge graphs. By combining structure features, interaction features, and contextual neighborhood features, 'ERT-GFAN' excels in predicting DTI. Empirical evaluations on three datasets demonstrate our method's superior performance, with AUC of 0.9739, 0.9862, and 0.9667, AUPR of 0.9598, 0.9789, and 0.9750, and Mean Reciprocal Rank (MRR) of 0.7386, 0.7035, and 0.7133. Ablation studies show over a 5% improvement in predictive performance compared to baseline unimodal and bimodal models. These results, along with detailed case studies, highlight the efficacy and robustness of our approach.
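Of the three reported metrics, Mean Reciprocal Rank is the least familiar outside ranking tasks. The sketch below shows the standard definition: for each query, take the reciprocal of the rank at which the true item appears, then average. The target identifiers are hypothetical and the example is unrelated to the paper's datasets.

```python
def mean_reciprocal_rank(ranked_lists, true_items):
    """Average of 1/rank of the true item per query; a missing item contributes 0."""
    total = 0.0
    for ranked, truth in zip(ranked_lists, true_items):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)  # ranks are 1-based
    return total / len(true_items)

# Hypothetical rankings of candidate targets for three drugs; "T1" is always the true target.
rankings = [
    ["T1", "T2", "T3"],  # true target at rank 1
    ["T2", "T1", "T3"],  # rank 2
    ["T3", "T2", "T1"],  # rank 3
]
mrr = mean_reciprocal_rank(rankings, ["T1", "T1", "T1"])
```

Here the score is (1 + 1/2 + 1/3) / 3, so an MRR near 0.7 indicates that the true item usually lands in the first or second position.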
Affiliation(s)
- Xiaoqing Cheng
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China
- Xixin Yang
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China
  - School of Automation, Qingdao University, Qingdao, 266071, China
- Yuanlin Guan
  - School of Mechanical and Automotive Engineering, Qingdao University of Technology, Qingdao, 266071, China
  - Key Lab of Industrial Fluid Energy Conservation and Pollution Control, Ministry of Education, Qingdao University of Technology, Qingdao, 266071, China
- Yihan Feng
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China
11
Zhang B, Niu D, Zhang L, Zhang Q, Li Z. MSH-DTI: multi-graph convolution with self-supervised embedding and heterogeneous aggregation for drug-target interaction prediction. BMC Bioinformatics 2024; 25:275. [PMID: 39179993 PMCID: PMC11342675 DOI: 10.1186/s12859-024-05904-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 08/16/2024] [Indexed: 08/26/2024] Open
Abstract
BACKGROUND The rise of network pharmacology has led to the widespread use of network-based computational methods in predicting drug target interaction (DTI). However, existing DTI prediction models typically rely on a limited amount of data to extract drug and target features, potentially affecting the comprehensiveness and robustness of features. In addition, although multiple networks are used for DTI prediction, the integration of heterogeneous information often involves simplistic aggregation and attention mechanisms, which may impose certain limitations. RESULTS MSH-DTI, a deep learning model for predicting drug-target interactions, is proposed in this paper. The model uses self-supervised learning methods to obtain drug and target structure features. A Heterogeneous Interaction-enhanced Feature Fusion Module is designed for multi-graph construction, and the graph convolutional networks are used to extract node features. With the help of an attention mechanism, the model focuses on the important parts of different features for prediction. Experimental results show that the AUROC and AUPR of MSH-DTI are 0.9620 and 0.9605 respectively, outperforming other models on the DTINet dataset. CONCLUSION The proposed MSH-DTI is a helpful tool to discover drug-target interactions, which is also validated through case studies in predicting new DTIs.
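AUROC, one of the two scores reported here, has a simple probabilistic reading: it is the chance that a randomly chosen interacting pair is scored above a randomly chosen non-interacting pair. The sketch below computes it directly from that definition with a pairwise comparison; the labels and scores are toy values assumed for illustration.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy predictions (hypothetical): one positive is scored below one negative.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = auroc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, so production code usually relies on a rank-based formula or a library routine instead; the result is the same.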
Affiliation(s)
- Beiyi Zhang
  - College of Computer Science and Technology, Qingdao University, Ningxia Road, Qingdao, 266071, Shandong, China
- Dongjiang Niu
  - College of Computer Science and Technology, Qingdao University, Ningxia Road, Qingdao, 266071, Shandong, China
- Lianwei Zhang
  - College of Computer Science and Technology, Qingdao University, Ningxia Road, Qingdao, 266071, Shandong, China
- Qiang Zhang
  - College of Computer Science and Technology, Qingdao University, Ningxia Road, Qingdao, 266071, Shandong, China
- Zhen Li
  - College of Computer Science and Technology, Qingdao University, Ningxia Road, Qingdao, 266071, Shandong, China
12
Wang X, Zhang S, Chen Y, He L, Ren Y, Zhang Z, Li J, Zhang S. Air quality forecasting using a spatiotemporal hybrid deep learning model based on VMD-GAT-BiLSTM. Sci Rep 2024; 14:17841. [PMID: 39090177 PMCID: PMC11294351 DOI: 10.1038/s41598-024-68874-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Accepted: 07/29/2024] [Indexed: 08/04/2024] Open
Abstract
The precise forecasting of air quality is of great significance as an integral component of early warning systems. This remains a formidable challenge owing to the limited information on emission sources and the considerable uncertainties inherent in dynamic processes. To improve the accuracy of air quality forecasting, this work proposes a new spatiotemporal hybrid deep learning model based on variational mode decomposition (VMD), graph attention networks (GAT) and bi-directional long short-term memory (BiLSTM), referred to as VMD-GAT-BiLSTM. The proposed model first employs VMD to decompose the original PM2.5 data into a series of relatively stable sub-sequences, thus reducing the influence of unknown factors on model prediction capabilities. For each sub-sequence, a GAT is then designed to explore deep spatial relationships among different monitoring stations. Next, a BiLSTM is utilized to learn the temporal features of each decomposed sub-sequence. Finally, the forecasting results of the decomposed sub-sequences are summed to give the final air quality prediction. Experimental results on the collected Beijing air quality dataset show that the proposed model outperforms the other compared methods on both short-term and long-term air quality forecasting tasks.
Affiliation(s)
- Xiaohu Wang
  - School of Intelligent Manufacturing and Mechanical Engineering, Hunan Institute of Technology, Hengyang, 421002, Hunan, China
- Suo Zhang
  - School of Intelligent Manufacturing and Mechanical Engineering, Hunan Institute of Technology, Hengyang, 421002, Hunan, China
- Yi Chen
  - School of Intelligent Manufacturing and Mechanical Engineering, Hunan Institute of Technology, Hengyang, 421002, Hunan, China
- Longying He
  - School of Intelligent Manufacturing and Mechanical Engineering, Hunan Institute of Technology, Hengyang, 421002, Hunan, China
- Yongmei Ren
  - School of Electrical and Information Engineering, Hunan Institute of Technology, Hengyang, 421002, Hunan, China
- Zhen Zhang
  - School of Civil Engineering and Architecture, Taizhou University, Taizhou, 318000, Zhejiang, China
- Juan Li
  - Taizhou Vocational College of Science and Technology, Taizhou, 318000, Zhejiang, China
- Shiqing Zhang
  - Institute of Intelligent Information Processing, Taizhou University, Taizhou, 318000, Zhejiang, China
13
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
Affiliation(s)
- Antonio Lavecchia
  - Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy
14
Choudhary A, Thiels CA, Salehinejad H. Graph Representation of Postoperative Patients for Opioids Refill Prediction: A Real-World Case Study. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference 2024; 2024:1-4. [PMID: 40039019 DOI: 10.1109/embc53108.2024.10781606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2025]
Abstract
Increased awareness of the opioid epidemic has resulted in the need to significantly reduce the number of opioids prescribed after surgery. However, up to one in five patients require a refill after discharge. Accurate identification of patients at risk of needing a refill after surgery is critically important, as it has the potential to improve pain control and patient experience while avoiding overprescription of opioids after surgery. In this paper, two graph representation learning methods are proposed for predicting opioid refills in postoperative patients. The first approach represents patients as nodes in a graph and performs node classification. The second approach is based on graph classification where each patient is represented as a graph. Performance results on a real-world retrospective cohort of postoperative patients show that a node classification approach with graph sample and aggregation (GraphSAGE) achieves the best performance in prediction of opioid refill.
15
Bian J, Lu H, Dong G, Wang G. Hierarchical multimodal self-attention-based graph neural network for DTI prediction. Brief Bioinform 2024; 25:bbae293. [PMID: 38920341 PMCID: PMC11200190 DOI: 10.1093/bib/bbae293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Revised: 05/17/2024] [Accepted: 06/06/2024] [Indexed: 06/27/2024] Open
Abstract
Drug-target interactions (DTIs) are a key part of the drug development process, and their accurate and efficient prediction can significantly boost development efficiency and reduce development time. Recent years have witnessed the rapid advancement of deep learning, resulting in an abundance of deep learning-based models for DTI prediction. However, most of these models use a single representation of drugs and proteins, making it difficult to comprehensively represent their characteristics. Multimodal data fusion can effectively compensate for the limitations of single-modal data. However, existing multimodal models for DTI prediction do not take into account both intra- and inter-modal interactions simultaneously, resulting in limited representation capabilities of the fused features and a reduction in DTI prediction accuracy. A hierarchical multimodal self-attention-based graph neural network for DTI prediction, called HMSA-DTI, is proposed to address multimodal feature fusion. HMSA-DTI takes drug SMILES, drug molecular graphs, protein sequences and protein 2-mer sequences as inputs, and utilizes a hierarchical multimodal self-attention mechanism to achieve deep fusion of multimodal features of drugs and proteins, enabling the capture of intra- and inter-modal interactions between drugs and proteins. HMSA-DTI is shown to have significant advantages over other baseline methods on multiple evaluation metrics across five benchmark datasets.
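As an illustration of the "protein 2-mer sequences" input mentioned in this abstract, the sketch below splits a protein sequence into overlapping k-mers. The function name and the stride-1 overlapping window are assumptions made for illustration; the abstract does not say how HMSA-DTI builds its 2-mers.

```python
def kmer_tokens(sequence: str, k: int = 2) -> list[str]:
    """Return the overlapping k-mers of a sequence, using a stride of 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical five-residue fragment, used only to show the windowing.
two_mers = kmer_tokens("MKTAY")
```

Each 2-mer can then be mapped to an integer index or embedding, giving a second view of the protein alongside the residue-level sequence.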
Affiliation(s)
- Jilong Bian
- College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin, Heilongjiang 150040, China
- Hao Lu
- College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin, Heilongjiang 150040, China
- Guanghui Dong
- College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin, Heilongjiang 150040, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin, Heilongjiang 150040, China
16
Li J, Sun L, Liu L, Li Z. MIFAM-DTI: a drug-target interactions predicting model based on multi-source information fusion and attention mechanism. Front Genet 2024; 15:1381997. [PMID: 38770418 PMCID: PMC11102998 DOI: 10.3389/fgene.2024.1381997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 04/15/2024] [Indexed: 05/22/2024] Open
Abstract
Accurate identification of potential drug-target pairs is a crucial step in drug development and drug repositioning, which is characterized by the ability of the drug to bind to and modulate the activity of the target molecule, resulting in the desired therapeutic effect. As machine learning and deep learning technologies advance, an increasing number of models are being used to predict drug-target interactions. However, improving the accuracy and efficiency of prediction remains a great challenge. In this study, we proposed a deep learning method called Multi-source Information Fusion and Attention Mechanism for Drug-Target Interaction (MIFAM-DTI) to predict drug-target interactions. Firstly, the physicochemical property feature vector and the Molecular ACCess System (MACCS) molecular fingerprint feature vector of a drug were extracted based on its SMILES sequence. The dipeptide composition feature vector and the Evolutionary Scale Modeling-1b (ESM-1b) feature vector of a target were constructed based on its amino acid sequence information. Secondly, the PCA method was employed to reduce the dimensionality of the four feature vectors, and the adjacency matrices were constructed by calculating the cosine similarity. Thirdly, the two feature vectors of each drug were concatenated and the two adjacency matrices were subjected to a logical OR operation. They were then fed into a model composed of a graph attention network and multi-head self-attention to obtain the final drug feature vectors. With the same method, the final target feature vectors were obtained. Finally, these final feature vectors were concatenated and served as the input to a fully connected layer, which produced the prediction output.
MIFAM-DTI not only integrated multi-source information to capture the drug and target features more comprehensively, but also utilized the graph attention network and multi-head self-attention to autonomously learn attention weights and more comprehensively capture information in sequence data. Experimental results demonstrated that MIFAM-DTI outperformed state-of-the-art methods in terms of AUC and AUPR. Case study results of coenzymes involved in cellular energy metabolism also demonstrated the effectiveness and practicality of MIFAM-DTI. The source code and experimental data for MIFAM-DTI are available at https://github.com/Search-AB/MIFAM-DTI.
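The graph-construction step described in this abstract (cosine-similarity adjacency matrices combined by a logical OR, with the two feature vectors concatenated) can be sketched as follows. This is an illustrative reading, not the published code: the similarity threshold, the toy feature values, and all function names are assumptions, and the PCA step is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_adjacency(features, threshold):
    """Boolean adjacency matrix: True where cosine similarity >= threshold."""
    return [[cosine(u, v) >= threshold for v in features] for u in features]


def logical_or(adj_a, adj_b):
    """Element-wise OR of two boolean matrices of the same shape."""
    return [[a or b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(adj_a, adj_b)]


# Toy stand-ins for two per-drug feature views (values are hypothetical).
physchem = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
fingerprint = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
THRESHOLD = 0.6  # assumed cut-off; the abstract does not give one

combined_adj = logical_or(similarity_adjacency(physchem, THRESHOLD),
                          similarity_adjacency(fingerprint, THRESHOLD))
# Concatenate the two views to form one feature vector per drug.
node_features = [p + f for p, f in zip(physchem, fingerprint)]
```

In this reading, `combined_adj` and `node_features` would be the graph and node inputs to the graph attention network, with an edge present whenever either feature view considers two drugs similar.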
Affiliation(s)
- Jianwei Li
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
17
Zeng X, Li SJ, Lv SQ, Wen ML, Li Y. A comprehensive review of the recent advances on predicting drug-target affinity based on deep learning. Front Pharmacol 2024; 15:1375522. [PMID: 38628639 PMCID: PMC11019008 DOI: 10.3389/fphar.2024.1375522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 03/21/2024] [Indexed: 04/19/2024] Open
Abstract
Accurate calculation of drug-target affinity (DTA) is crucial for various applications in the pharmaceutical industry, including drug screening, design, and repurposing. However, traditional machine learning methods for calculating DTA often lack accuracy, which makes accurate DTA prediction a significant challenge. Fortunately, deep learning has emerged as a promising approach in computational biology, leading to the development of various deep learning-based methods for DTA prediction. To support researchers in developing novel, high-precision methods, we have provided a comprehensive review of recent advances in predicting DTA using deep learning. We first conducted a statistical analysis of commonly used public datasets, providing essential information and introducing the fields in which these datasets are used. We further explored the common representations of sequences and structures of drugs and targets. These analyses served as the foundation for constructing DTA prediction methods based on deep learning. Next, we focused on explaining how deep learning models, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformers, and Graph Neural Networks (GNNs), were effectively employed in specific DTA prediction methods. We highlighted the unique advantages and applications of these models in the context of DTA prediction. Finally, we conducted a performance analysis of multiple state-of-the-art methods for predicting DTA based on deep learning. This comprehensive review aims to help researchers understand the shortcomings and advantages of existing methods, and to further develop high-precision DTA prediction tools that promote drug discovery.
Affiliation(s)
- Xin Zeng
- College of Mathematics and Computer Science, Dali University, Dali, China
- Shu-Juan Li
- Yunnan Institute of Endemic Diseases Control and Prevention, Dali, China
- Shuang-Qing Lv
- Institute of Surveying and Information Engineering, West Yunnan University of Applied Science, Dali, China
- Meng-Liang Wen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Yi Li
- College of Mathematics and Computer Science, Dali University, Dali, China
18
Zeng X, Chen W, Lei B. CAT-DTI: cross-attention and Transformer network with domain adaptation for drug-target interaction prediction. BMC Bioinformatics 2024; 25:141. [PMID: 38566002 PMCID: PMC11264959 DOI: 10.1186/s12859-024-05753-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 03/19/2024] [Indexed: 04/04/2024] Open
Abstract
Accurate and efficient prediction of drug-target interaction (DTI) is critical to advance drug development and reduce the cost of drug discovery. Recently, the employment of deep learning methods has enhanced DTI prediction precision and efficacy, but it still encounters several challenges. The first challenge lies in the efficient learning of drug and protein feature representations alongside their interaction features to enhance DTI prediction. Another important challenge is to improve the generalization capability of the DTI model within real-world scenarios. To address these challenges, we propose CAT-DTI, a model based on cross-attention and Transformer, possessing domain adaptation capability. CAT-DTI effectively captures the drug-target interactions while adapting to out-of-distribution data. Specifically, we use a convolutional neural network combined with a Transformer to encode the distance relationship between amino acids within protein sequences and employ a cross-attention module to capture the drug-target interaction features. Generalization to new DTI prediction scenarios is achieved by leveraging a conditional domain adversarial network, aligning DTI representations under diverse distributions. Experimental results in in-domain and cross-domain scenarios demonstrate that the CAT-DTI model improves overall DTI prediction performance compared with previous methods.
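The cross-attention idea named in this abstract can be sketched generically as scaled dot-product attention in which drug tokens supply the queries and protein tokens supply the keys and values. This is a minimal sketch under stated assumptions, not CAT-DTI's module: there are no learned projection matrices or multiple heads, and the token vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention of each query over all keys.

    Returns the attended vectors (one per query) and the attention weights.
    """
    scale = math.sqrt(len(keys[0]))
    attended, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        attended.append([sum(wi * v[j] for wi, v in zip(w, values))
                         for j in range(len(values[0]))])
    return attended, weights


# Hypothetical embeddings: 2 drug tokens and 3 protein tokens, dimension 2.
drug_tokens = [[1.0, 0.0], [0.0, 1.0]]
protein_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

drug_context, attn = cross_attention(drug_tokens, protein_tokens, protein_tokens)
```

Each row of `attn` shows how strongly one drug token attends to each protein position, which is the kind of pairwise signal a cross-attention module exposes for interaction features.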
Affiliation(s)
- Xiaoting Zeng
- School of Computer and Software, Shenzhen University, Shenzhen, 518060, China
- Weilin Chen
- Marshall Laboratory of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, 518055, China.
- Baiying Lei
- School of Biomedical Engineering, Shenzhen University, Shenzhen, 518055, China.
19
Chen S, Li M, Semenov I. MFA-DTI: Drug-target interaction prediction based on multi-feature fusion adopted framework. Methods 2024; 224:79-92. [PMID: 38430967 DOI: 10.1016/j.ymeth.2024.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Revised: 02/16/2024] [Accepted: 02/23/2024] [Indexed: 03/05/2024] Open
Abstract
The identification of drug-target interactions (DTI) is a valuable step in the drug discovery and repositioning process. However, traditional laboratory experiments are time-consuming and expensive. Computational methods have streamlined research to determine DTIs. The application of deep learning methods has significantly improved the prediction performance for DTIs. Modern deep learning methods can leverage multiple sources of information, including sequence data that contains biological structural information, and interaction data. While useful, these methods cannot be effectively applied to each type of information individually (e.g., chemical structure and interaction network) and do not take into account the specificity of DTI data such as low- or zero-interaction biological entities. To overcome these limitations, we propose a method called MFA-DTI (Multi-feature Fusion Adopted framework for DTI). MFA-DTI consists of three modules: an interaction graph learning module that processes the interaction network to generate interaction vectors, a chemical structure learning module that extracts features from the chemical structure, and a fusion module that combines these features for the final prediction. To validate the performance of MFA-DTI, we conducted experiments on six public datasets under different settings. The results indicate that the proposed method is highly effective in various settings and outperforms state-of-the-art methods.
Affiliation(s)
- Siqi Chen
- School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing, 400074, China.
- Minghui Li
- Beidahuang Industry Group General Hospital, Harbin, 150006, China
- Ivan Semenov
- College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
20
E Z, Qiao G, Wang G, Li Y. GSL-DTI: Graph structure learning network for Drug-Target interaction prediction. Methods 2024; 223:136-145. [PMID: 38360082 DOI: 10.1016/j.ymeth.2024.01.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2023] [Revised: 12/23/2023] [Accepted: 01/29/2024] [Indexed: 02/17/2024] Open
Abstract
MOTIVATION Drug-target interaction prediction is an important area of research that aims to predict whether there is an interaction between a drug molecule and its target protein. It plays a critical role in drug discovery and development by facilitating the identification of potential drug candidates and expediting the overall process. Given the time-consuming, expensive, and high-risk nature of traditional drug discovery methods, the prediction of drug-target interactions has become an indispensable tool. Using machine learning and deep learning to tackle this class of problems has become a mainstream approach, and graph-based models have recently received much attention in this field. However, many current graph-based Drug-Target Interaction (DTI) prediction methods rely on manually defined rules to construct the Drug-Protein Pair (DPP) network during the DPP representation learning process. Such methods fail to capture the true underlying relationships between drug molecules and target proteins. RESULTS We propose GSL-DTI, an automatic graph structure learning model for predicting drug-target interactions (DTIs). Initially, we integrate large-scale heterogeneous networks using a graph convolution network based on meta-paths, effectively learning the representations of drugs and target proteins. Subsequently, we construct drug-protein pairs based on these representations. In contrast to previous studies that construct DPP networks based on manual rules, our method introduces an automatic graph structure learning approach. This approach utilizes a filter gate on the affinity scores of DPPs and relies on the classification loss of downstream tasks to guide the learning of the underlying DPP network structure. Based on the learned DPP network, we transform the prediction of drug-target interactions into a node classification problem.
The comprehensive experiments conducted on three public datasets have shown the superiority of GSL-DTI in the tasks of DTI prediction. Additionally, GSL-DTI provides a fresh perspective for advancing research in graph structure learning for DTI prediction.
Affiliation(s)
- Zixuan E
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150006, China
- Guanyu Qiao
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150006, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150006, China.
- Yang Li
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150006, China.
21
Alam W, Tayara H, Chong KT. Unlocking the therapeutic potential of drug combinations through synergy prediction using graph transformer networks. Comput Biol Med 2024; 170:108007. [PMID: 38242015 DOI: 10.1016/j.compbiomed.2024.108007] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/25/2023] [Revised: 01/03/2024] [Accepted: 01/13/2024] [Indexed: 01/21/2024]
Abstract
Drug combinations are frequently used to treat cancer to reduce side effects and increase efficacy. The experimental discovery of drug combination synergy is time-consuming and expensive for large datasets. Therefore, an efficient and reliable computational approach is required to investigate these drug combinations. Advancements in deep learning make it possible to handle large datasets for various biological problems. In this study, we developed a SynergyGTN model based on the Graph Transformer Network to predict synergistic drug combinations against an untreated cancer cell line expression profile. We represent each drug as a graph, with each node and edge of the graph containing nine types of atomic feature vectors and four bond features, respectively. The cell lines are represented by their gene expression profiles. The drug graph was passed through the GTN layers to extract a generalized feature map for each drug pair. The features extracted for each drug pair and the cell-line gene expression profiles were concatenated and subsequently processed through multiple densely connected layers. SynergyGTN outperformed the state-of-the-art methods, with a receiver operating characteristic area under the curve improvement of 5% on the 5-fold cross-validation. The accuracy of SynergyGTN was further verified through three cross-validation test strategies, namely leave-drug-out, leave-combination-out, and leave-tissue-out, resulting in improvements in accuracy of 8%, 1%, and 2%, respectively. The AstraZeneca Dream dataset was utilized as an independent dataset to validate and assess the generalizability of the proposed method, resulting in an improvement in balanced accuracy of 13%. In conclusion, SynergyGTN is a reliable and efficient computational approach for predicting drug combination synergy in cancer treatment.
Finally, we developed a web server tool to support the pharmaceutical industry and researchers, available at: http://nsclbio.jbnu.ac.kr/tools/SynergyGTN/.
Affiliation(s)
- Waleed Alam
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea.
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea; Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju, 54896, South Korea.
22
Shao Y, Tang J, Liu J, Han L, Dong S. Multivariable System Prediction Based on TCN-LSTM Networks with Self-Attention Mechanism and LASSO Variable Selection. ACS OMEGA 2023; 8:47798-47811. [PMID: 38144132 PMCID: PMC10733996 DOI: 10.1021/acsomega.3c06263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 11/16/2023] [Accepted: 11/23/2023] [Indexed: 12/26/2023]
Abstract
Intelligent prediction of key output variables that are difficult to measure online in complex systems has important research significance. In this paper, by using the least absolute shrinkage and selection operator (LASSO) algorithm to analyze the principal elements of input variables, a temporal convolutional network fused with long short-term memory (TCN-LSTM) network and self-attention mechanism (SAM) is designed to realize dynamic modeling of multivariate feature sequences. For complex processes with multiple input variables, each variable has different effects on the output, so it is necessary to use the LASSO algorithm to perform regression analysis on the input and output data for selecting the principal component variables and reducing the redundancy and computation burden of the network. The TCN network is used to extract the features of the input variables efficiently. The long-term memory performance of time series is enhanced by applying an LSTM network. The multihead SAM is used to optimize the network, and the role of key features is enhanced by assigning weights with probability to further improve the accuracy of sequence prediction. Finally, by comparison with the existing network model, the offline data generated by the high and low converters in the synthetic ammonia industry is used to predict the CO content so as to verify the superiority and applicability of the proposed network model.
Affiliation(s)
- Yiqin Shao
- Key Laboratory of Intelligent Textile and Flexible Interconnection of Zhejiang Province, College of Textiles Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Jiale Tang
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jun Liu
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Lixin Han
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Shijian Dong
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
23
Zhu Z, Yao Z, Zheng X, Qi G, Li Y, Mazur N, Gao X, Gong Y, Cong B. Drug-target affinity prediction method based on multi-scale information interaction and graph optimization. Comput Biol Med 2023; 167:107621. [PMID: 37907030 DOI: 10.1016/j.compbiomed.2023.107621] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 07/18/2023] [Revised: 10/16/2023] [Accepted: 10/23/2023] [Indexed: 11/02/2023]
Abstract
Drug-target affinity (DTA) prediction as an emerging and effective method is widely applied to explore the strength of drug-target interactions in drug development research. By predicting these interactions, researchers can assess the potential efficacy and safety of candidate drugs at an early stage, narrowing down the search space for therapeutic targets and accelerating the discovery and development of new drugs. However, existing DTA prediction models mainly use graphical representations of drug molecules, which lack information on interactions between individual substructures, thus affecting prediction accuracy and model interpretability. Therefore, transformer and diffusion on drug graphs in DTA prediction (TDGraphDTA) are introduced to predict drug-target interactions using multi-scale information interaction and graph optimization. An interactive module is integrated into feature extraction of drug and target features at different granularity levels. A diffusion model-based graph optimization module is proposed to improve the representation of molecular graph structures and enhance the interpretability of graph representations while obtaining optimal feature representations. In addition, TDGraphDTA improves the accuracy and reliability of predictions by capturing relationships and contextual information between molecular substructures. The performance of the proposed TDGraphDTA in DTA prediction was verified on three publicly available benchmark datasets (Davis, Metz, and KIBA). Compared with state-of-the-art baseline models, it achieved better results in terms of consistency index, R-squared, etc. Furthermore, compared with some existing methods, the proposed TDGraphDTA is demonstrated to have better structure capturing capabilities by visualizing the feature capturing capabilities of the model using Grad-AAM toxicity labels in the ToxCast dataset. The corresponding source codes are available at https://github.com/Lamouryz/TDGraph.
Affiliation(s)
- Zhiqin Zhu
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Zheng Yao
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Xin Zheng
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Guanqiu Qi
- Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, NY 14222, USA.
- Yuanyuan Li
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Neal Mazur
- Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, NY 14222, USA.
- Xinbo Gao
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Yifei Gong
- Faculty of applied science & engineering, the Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto at Toronto, ON M5S, Canada.
- Baisen Cong
- Diagnostics Digital, DH(Shanghai) Diagnostics Co, Ltd, a Danaher company, Shanghai, 200335, China.
24
Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023; 25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024] Open
Abstract
Recently, attention mechanisms and their derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and Artificial Intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
Affiliation(s)
- Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Caiqi Liu
- Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Key Laboratory of Molecular Oncology of Heilongjiang Province, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Mujiexin Liu
- Chongqing Key Laboratory of Sichuan-Chongqing Co-construction for Diagnosis and Treatment of Infectious Diseases Integrated Traditional Chinese and Western Medicine, College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Tianyuan Liu
- Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cheng-Bing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba, China
- Lin Ning
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
25
Wang L, Zhou Y, Chen Q. AMMVF-DTI: A Novel Model Predicting Drug-Target Interactions Based on Attention Mechanism and Multi-View Fusion. Int J Mol Sci 2023; 24:14142. [PMID: 37762445 PMCID: PMC10531525 DOI: 10.3390/ijms241814142] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/09/2023] [Revised: 09/09/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023] Open
Abstract
Accurate identification of potential drug-target interactions (DTIs) is a crucial task in drug development and repositioning. Despite the remarkable progress achieved in recent years, improving the performance of DTI prediction still presents significant challenges. In this study, we propose a novel end-to-end deep learning model called AMMVF-DTI (attention mechanism and multi-view fusion), which leverages a multi-head self-attention mechanism to explore varying degrees of interaction between drugs and target proteins. More importantly, AMMVF-DTI extracts interactive features between drugs and proteins from both node-level and graph-level embeddings, enabling a more effective modeling of DTIs. This advantage is generally lacking in existing DTI prediction models. Consequently, when compared to many of the state-of-the-art methods, AMMVF-DTI demonstrated excellent performance on the human, C. elegans, and DrugBank baseline datasets, which can be attributed to its ability to incorporate interactive information and mine features from both local and global structures. The results from additional ablation experiments also confirmed the importance of each module in our AMMVF-DTI model. Finally, a case study is presented utilizing our model for COVID-19-related DTI prediction. We believe the AMMVF-DTI model can not only achieve reasonable accuracy in DTI prediction, but also provide insights into the understanding of potential interactions between drugs and targets.
26
Wu T, Tang Y, Sun Q, Xiong L. Molecular Joint Representation Learning via Multi-Modal Information of SMILES and Graphs. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3044-3055. [PMID: 37028366 DOI: 10.1109/tcbb.2023.3253862] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/19/2023]
Abstract
In recent years, artificial intelligence has played an important role in accelerating the whole process of drug discovery. Various molecular representation schemes of different modalities (e.g., textual sequence or graph) have been developed. By digitally encoding them, different chemical information can be learned through corresponding network structures. Molecular graphs and the Simplified Molecular Input Line Entry System (SMILES) are currently popular means for molecular representation learning. Previous works have attempted to combine both of them to solve the problem of specific information loss in single-modal representation on various tasks. To further fuse such multi-modal information, the correspondence between learned chemical features from different representations should be considered. To realize this, we propose a novel framework of molecular joint representation learning via Multi-Modal information of SMILES and molecular Graphs, called MMSG. We improve the self-attention mechanism by introducing bond-level graph representation as attention bias in Transformer to reinforce feature correspondence between multi-modal information. We further propose a Bidirectional Message Communication Graph Neural Network (BMC GNN) to strengthen the information flow aggregated from graphs for further combination. Numerous experiments on public property prediction datasets have demonstrated the effectiveness of our model.
|
27
|
Su B, Wang X, Ouyang Y, Lin X. DA-SRN: Omics data analysis based on the sample network optimization for complex diseases. Comput Biol Med 2023; 164:107252. [PMID: 37454504 DOI: 10.1016/j.compbiomed.2023.107252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 05/30/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023]
Abstract
Effective biomarker identification and accurate sample label prediction are still challenging for complex diseases. Patient similarity network (PSN) analysis is a powerful tool in disease omics data analysis. The topology of PSN can reflect the discriminative ability of the corresponding feature space on which the sample network is built. In this study, a novel omics data analysis method based on the sample reference network (DA-SRN) is proposed to identify the potential biomarkers and predict the sample categories. DA-SRN defines the informative features and the sample reference network in optimizing the network structure by genetic algorithm. It labels the samples based on the graph neural network, the reference network and the selected informative features. DA-SRN was compared with nine efficient omics data analysis methods on the genomics, metabolomics and transcriptomics datasets to show its validation. The comparison results showed that it outperformed the other methods in area under receiver operating characteristic curve (AUROC), sensitivity, specificity and area under precision-recall curve (AUPRC) in most cases. Besides, the important metabolites identified by DA-SRN for the type 2 diabetes (T2D) metabolomics data were further examined. The pathway analysis revealed the close relationships between the identified metabolites and the critical metabolic pathways related to the occurrence and development of T2D. The experimental results illustrate that DA-SRN can extract the valuable information from the complex omics data by analyzing the sample relationship, and is promising in biomarker identification and sample discrimination for complex diseases.
Affiliation(s)
- Benzhe Su
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China.
- Xiaoxiao Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China.
- Yang Ouyang
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China.
- Xiaohui Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China.
|
28
|
Khojasteh H, Pirgazi J, Ghanbari Sorkhi A. Improving prediction of drug-target interactions based on fusing multiple features with data balancing and feature selection techniques. PLoS One 2023; 18:e0288173. [PMID: 37535616 PMCID: PMC10399861 DOI: 10.1371/journal.pone.0288173] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/08/2023] [Accepted: 06/21/2023] [Indexed: 08/05/2023] Open
Abstract
Drug discovery relies on predicting drug-target interactions (DTIs), which is an important and challenging task. The purpose of DTI prediction is to identify the interactions between drug chemical compounds and protein targets. Traditional wet-lab experiments are time-consuming and expensive, which is why, in recent years, computational methods based on machine learning have attracted the attention of many researchers. A dry-lab environment focusing more on computational methods of interaction prediction can help limit the search space for wet-lab experiments. In this paper, a novel multi-stage approach for DTI prediction called SRX-DTI is proposed. In the first stage, a combination of various descriptors from protein sequences and an FP2 fingerprint encoded from the drug are extracted as feature vectors. A major challenge in this application is the imbalanced data caused by the lack of known interactions. To deal with this problem, the One-SVM-US technique is proposed in the second stage. Next, the FFS-RF algorithm, a forward feature selection algorithm coupled with a random forest (RF) classifier, is developed to maximize the predictive performance. This feature selection algorithm removes irrelevant features to obtain optimal features. Finally, the balanced dataset with optimal features is given to the XGBoost classifier to identify DTIs. The experimental results demonstrate that our proposed approach SRX-DTI achieves higher performance than other existing methods in predicting DTIs. The datasets and source code are available at: https://github.com/Khojasteh-hb/SRX-DTI.
Affiliation(s)
- Hakimeh Khojasteh
  - Department of Computer Engineering, University of Zanjan, Zanjan, Iran
  - School of Biological Sciences Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Jamshid Pirgazi
  - School of Biological Sciences Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
  - Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
- Ali Ghanbari Sorkhi
  - Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
|
29
|
Zhou L, Wang Y, Peng L, Li Z, Luo X. Identifying potential drug-target interactions based on ensemble deep learning. Front Aging Neurosci 2023; 15:1176400. [PMID: 37396659 PMCID: PMC10309650 DOI: 10.3389/fnagi.2023.1176400] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/28/2023] [Accepted: 05/10/2023] [Indexed: 07/04/2023] Open
Abstract
Introduction: Drug-target interaction prediction is one important step in drug research and development. Experimental methods are time-consuming and laborious. Methods: In this study, we developed a novel DTI prediction method called EnGDD by combining initial feature acquisition, dimensional reduction, and DTI classification based on Gradient boosting neural network, Deep neural network, and Deep Forest. Results: EnGDD was compared with seven state-of-the-art DTI prediction methods (BLM-NII, NRLMF, WNNGIP, NEDTP, DTi2Vec, RoFDT, and MolTrans) on the nuclear receptor, GPCR, ion channel, and enzyme datasets under cross validations on drugs, targets, and drug-target pairs, respectively. EnGDD computed the best recall, accuracy, F1-score, AUC, and AUPR under the majority of conditions, demonstrating its powerful DTI identification performance. EnGDD predicted that D00182 and hsa2099, D07871 and hsa1813, DB00599 and hsa2562, and D00002 and hsa10935 have higher interaction probabilities among unknown drug-target pairs and may be potential DTIs on the four datasets, respectively. In particular, D00002 (Nadide) was identified to interact with hsa10935 (Mitochondrial peroxiredoxin3), whose up-regulation might be used to treat neurodegenerative diseases. Finally, EnGDD was used to find possible drug targets for Parkinson's disease and Alzheimer's disease after confirming its DTI identification performance. The results show that D01277, D04641, and D08969 may be applied to the treatment of Parkinson's disease through targeting hsa1813 (dopamine receptor D2), and that D02173, D02558, and D03822 may be clues for the treatment of patients with Alzheimer's disease through targeting hsa5743 (prostaglandin-endoperoxide synthase 2). The above prediction results need further biomedical validation. Discussion: We anticipate that our proposed EnGDD model can help discover potential therapeutic clues for various diseases including neurodegenerative diseases.
Affiliation(s)
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yuzhuang Wang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Xueming Luo
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
|
30
|
Li J, Wang Y, Li Z, Lin H, Wu B. LM-DTI: a tool of predicting drug-target interactions using the node2vec and network path score methods. Front Genet 2023; 14:1181592. [PMID: 37229202 PMCID: PMC10203599 DOI: 10.3389/fgene.2023.1181592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 04/13/2023] [Indexed: 05/27/2023] Open
Abstract
Introduction: Drug-target interaction (DTI) prediction is a key step in drug function discovery and repositioning. The emergence of large-scale heterogeneous biological networks provides an opportunity to identify drug-related target genes, which led to the development of several computational methods for DTI prediction. Methods: Considering the limitations of conventional computational methods, a novel tool named LM-DTI based on integrated information related to lncRNAs and miRNAs was proposed, which adopted the graph embedding (node2vec) and the network path score methods. First, LM-DTI constructed a heterogeneous information network containing eight networks composed of four types of nodes (drug, target, lncRNA, and miRNA). Next, the node2vec method was used to obtain feature vectors of drug as well as target nodes, and the path score vector of each drug-target pair was calculated using the DASPfind method. Finally, the feature vectors and path score vectors were merged and input into the XGBoost classifier to predict potential drug-target interactions. Results and Discussion: The classification accuracy of LM-DTI was evaluated by 10-fold cross validation. The prediction performance of LM-DTI in AUPR reached 0.96, which showed a significant improvement compared with those of conventional tools. The validity of LM-DTI has also been verified by manually searching literature and various databases. LM-DTI is scalable and computationally efficient, thus representing a powerful drug relocation tool that can be accessed for free at http://www.lirmed.com:5038/lm_dti.
Affiliation(s)
- Jianwei Li
  - School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
  - School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, China
- Yinfei Wang
  - School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Zhiguang Li
  - School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Hongxin Lin
  - School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Baoqin Wu
  - School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
|
31
|
Zhao Q, Duan G, Yang M, Cheng Z, Li Y, Wang J. AttentionDTA: Drug-Target Binding Affinity Prediction by Sequence-Based Deep Learning With Attention Mechanism. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:852-863. [PMID: 35471889 DOI: 10.1109/tcbb.2022.3170365] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Indexed: 05/04/2023]
Abstract
The identification of drug-target relations (DTRs) is essential in drug development. A large number of methods treat DTRs as drug-target interactions (DTIs), a binary classification problem. The main drawbacks of these methods are the lack of reliable negative samples and the absence of many important aspects of DTRs, including their dose dependence and quantitative affinities. With the increasing number of recent publications of drug-protein binding affinity data, DTR prediction can be viewed as a regression problem of drug-target affinities (DTAs), which reflect how tightly the drug binds to the target and can present more detailed and specific information than DTIs. The growth of affinity data enables the use of deep learning architectures, which have been shown to be among the state-of-the-art methods in binding affinity prediction. Although relatively effective, these models are less biologically interpretable because of the black-box nature of deep learning. In this study, we proposed a deep learning-based model, named AttentionDTA, which uses an attention mechanism to predict DTAs. Different from the models using 3D structures of drug-target complexes or graph representations of drugs and proteins, the novelty of our work is to use an attention mechanism to focus on key subsequences which are important in drug and protein sequences when predicting their affinity. We use two separate one-dimensional Convolutional Neural Networks (1D-CNNs) to extract the semantic information of the drug's SMILES string and the protein's amino acid sequence. Furthermore, a two-side multi-head attention mechanism is developed and embedded in our model to explore the relationship between drug features and protein features. We evaluate our model on three established DTA benchmark datasets, Davis, Metz, and KIBA. AttentionDTA outperforms the state-of-the-art deep learning methods under different evaluation metrics. The results show that the attention-based model can effectively extract protein features related to drug information and drug features related to protein information to better predict drug-target affinities. It is worth mentioning that we test our model on an IC50 dataset, which provides the binding sites between drugs and proteins, to evaluate the ability of our model to locate binding sites. Finally, we visualize the attention weights to demonstrate the biological significance of the model. The source code of AttentionDTA can be downloaded from https://github.com/zhaoqichang/AttentionDTA_TCBB.
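The idea of letting drug features attend over protein features can be illustrated with a minimal single-head scaled dot-product cross-attention sketch. This is not the authors' two-side multi-head implementation: the function names, the toy feature vectors, and the absence of learned projection matrices are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: drug-side feature vectors (one per SMILES position).
    keys/values: protein-side feature vectors (one per residue position).
    Returns the attended outputs and the attention weight rows.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        row = softmax(scores)
        out = [sum(w * v[c] for w, v in zip(row, values)) for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(row)
    return outputs, weights


# Hypothetical toy features; real models would use CNN outputs instead.
drug_feats = [[1.0, 0.0], [0.0, 1.0]]
prot_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn = cross_attention(drug_feats, prot_feats, prot_feats)
```

Each row of `attn` sums to 1 and can be read as how strongly one drug position weights each protein position, which is the kind of quantity the abstract describes visualizing. Swapping the roles of the two inputs gives the protein-to-drug direction.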
|
32
|
Tang C, Zhong C, Wang M, Zhou F. FMGNN: A Method to Predict Compound-Protein Interaction With Pharmacophore Features and Physicochemical Properties of Amino Acids. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1030-1040. [PMID: 35503835 DOI: 10.1109/tcbb.2022.3172340] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/04/2023]
Abstract
Identifying interactions between compounds and proteins is an essential task in drug discovery. To recommend compounds as new drug candidates, applying computational approaches has a lower cost than conducting wet-lab experiments. Machine learning-based methods, especially deep learning-based methods, have advantages in learning complex feature interactions between compounds and proteins. However, deep learning models will over-generalize and predict less relevant compound-protein pairs when the compound-protein feature interactions are high-dimensional and sparse. This problem can be overcome by learning both low-order and high-order feature interactions. In this paper, we propose a novel hybrid model with Factorization Machines and Graph Neural Network, called FMGNN, to extract the low-order and high-order features, respectively. Then, we design a compound-protein interaction (CPI) prediction method with pharmacophore features of compounds and physicochemical properties of amino acids. The pharmacophore features help the prediction results better fit the expectations of biological experiments, and the physicochemical properties of amino acids are loaded into the embedding layer to improve the convergence speed and accuracy of protein feature learning. The experimental results on several datasets, especially on an imbalanced large-scale dataset, showed that our proposed method outperforms other existing methods for CPI prediction. The western blot experiment results on wogonin and its candidate target proteins also showed that our proposed method is effective and accurate for finding target proteins. The computer program implementing the FMGNN model is available at https://github.com/tcygxu2021/FMGNN.
|
33
|
Yan C, Ding C, Duan G. PMMS: Predicting essential miRNAs based on multi-head self-attention mechanism and sequences. Front Med (Lausanne) 2022; 9:1015278. [DOI: 10.3389/fmed.2022.1015278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 10/25/2022] [Indexed: 11/18/2022] Open
Abstract
Increasing evidence has proved that miRNA plays a significant role in biological processes. In order to understand the etiology and mechanisms of various diseases, it is necessary to identify the essential miRNAs. However, it is time-consuming and expensive to identify essential miRNAs by using traditional biological experiments. It is critical to develop computational methods to predict potential essential miRNAs. In this study, we provided a new computational method (called PMMS) to identify essential miRNAs by using multi-head self-attention and sequences. First, PMMS computes the statistic and structure features and extracts the static feature by concatenating them. Second, PMMS extracts the deep learning original feature (BiLSTM-based feature) by using bi-directional long short-term memory (BiLSTM) and pre-miRNA sequences. In addition, we further obtained the multi-head self-attention feature (MS-based feature) based on the BiLSTM-based feature and the multi-head self-attention mechanism. By considering the importance of the subsequence of pre-miRNA to the static feature of miRNA, we obtained the deep learning final feature (WA-based feature) based on the weighted attention mechanism. Finally, we concatenated the WA-based feature and the static feature as an input to the multilayer perceptron model to predict essential miRNAs. We conducted five-fold cross-validation to evaluate the prediction performance of PMMS. The area under the ROC curve (AUC), the F1-score, and accuracy (ACC) are used as performance metrics. From the experimental results, PMMS obtained the best prediction performance (AUC: 0.9556, F1-score: 0.9030, and ACC: 0.9097). It also outperformed the other compared methods. The experimental results also illustrated that PMMS is an effective method to identify essential miRNAs.
|
34
|
Zhang Y, Luo M, Wu P, Wu S, Lee TY, Bai C. Application of Computational Biology and Artificial Intelligence in Drug Design. Int J Mol Sci 2022; 23:13568. [PMID: 36362355 PMCID: PMC9658956 DOI: 10.3390/ijms232113568] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 10/08/2022] [Revised: 10/29/2022] [Accepted: 11/03/2022] [Indexed: 08/24/2023] Open
Abstract
Traditional drug design requires a great amount of research time and developmental expense. Booming computational approaches, including computational biology, computer-aided drug design, and artificial intelligence, have the potential to improve the efficiency of drug discovery by minimizing the time and financial cost. In recent years, computational approaches have been widely used to improve the efficacy and effectiveness of the drug discovery pipeline, leading to the approval of plenty of new drugs for marketing. The present review emphasizes the applications of these indispensable computational approaches in aiding target identification, lead discovery, and lead optimization. Some challenges of using these approaches for drug design are also discussed. Moreover, we propose a methodology for integrating various computational techniques into new drug discovery and design.
Affiliation(s)
- Yue Zhang
  - School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
  - Warshel Institute for Computational Biology, Shenzhen 518172, China
- Mengqi Luo
  - School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
- Peng Wu
  - School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518055, China
- Song Wu
  - South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
- Tzong-Yi Lee
  - School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, Shenzhen 518172, China
- Chen Bai
  - School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, Shenzhen 518172, China
|
35
|
Wang H, Guo F, Du M, Wang G, Cao C. A novel method for drug-target interaction prediction based on graph transformers model. BMC Bioinformatics 2022; 23:459. [PMID: 36329406 PMCID: PMC9635108 DOI: 10.1186/s12859-022-04812-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 04/06/2022] [Accepted: 06/23/2022] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND: Drug-target interaction (DTI) prediction is becoming more and more important for accelerating drug research and drug repositioning. The drug-target interaction network is a typical model for DTI prediction. As many different types of relationships exist between drugs and targets, a drug-target interaction network can be used for modeling drug-target interaction relationships. Recent works on drug-target interaction networks mostly concentrate on drug nodes or target nodes and neglect the relationships between drugs and targets. RESULTS: We propose a novel prediction method for modeling the relationship between drug and target independently. First, we use relationships of drugs and targets at different levels to construct features of drug-target interactions. Then, we use a line graph to model drug-target interactions. After that, we introduce a graph transformer network to predict drug-target interactions. CONCLUSIONS: This method introduces a line graph to model the relationship between drug and target. After transforming drug-target interactions from links to nodes, a graph transformer network is used to accomplish the task of predicting drug-target interactions.
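The step of "transforming drug-target interactions from links to nodes" is the standard line-graph construction, which can be sketched as follows. This is only an illustration of the graph transformation under the assumption of an undirected bipartite network; the function name and the toy identifiers are hypothetical, and the feature construction and graph transformer stages of the paper are not reproduced.

```python
from itertools import combinations


def line_graph(edges):
    """Build the line graph of an undirected drug-target network.

    Each (drug, target) edge becomes a node. Two such nodes are linked
    when the original edges share an endpoint, i.e. the same drug or
    the same target.
    """
    nodes = sorted(set(edges))
    adjacency = {node: set() for node in nodes}
    for a, b in combinations(nodes, 2):
        if a[0] == b[0] or a[1] == b[1]:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical toy interactions (identifiers invented for illustration).
dti_edges = [("drugA", "prot1"), ("drugA", "prot2"), ("drugB", "prot2")]
lg = line_graph(dti_edges)
```

In this toy case the pair `("drugA", "prot2")` is adjacent to both other pairs, because it shares a drug with one and a target with the other, while `("drugA", "prot1")` and `("drugB", "prot2")` are not connected. Predicting an interaction then becomes a node-level task on `lg`.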
Affiliation(s)
- Hongmei Wang
  - College of Computer Science and Engineering, Changchun University of Technology, Changchun, China
- Fang Guo
  - College of Computer Science and Engineering, Changchun University of Technology, Changchun, China
- Mengyan Du
  - College of Computer Science and Engineering, Changchun University of Technology, Changchun, China
- Guishen Wang
  - College of Computer Science and Engineering, Changchun University of Technology, Changchun, China.
- Chen Cao
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China.
  - Department of Biochemistry and Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada.
|
36
|
Kumar V, Lee G, Yoo J, Ro HS, Lee KW. An attention mechanism-based LSTM network for cancer kinase activity prediction. SAR and QSAR in Environmental Research 2022; 33:631-647. [PMID: 36062308 DOI: 10.1080/1062936x.2022.2109062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 07/30/2022] [Indexed: 06/15/2023]
Abstract
Despite the endeavours and achievements made in treating cancers during the past decades, resistance to available kinase drugs continues to be a major problem in cancer therapies. Thus, it is highly desirable to develop computational models that can predict the bioactivity of a compound against cancer kinases. Here, we present a Long Short-Term Memory (LSTM) framework for predicting the activities of lead molecules against seven different kinases. A total of 14,907 compounds from the ChEMBL database were selected for model building. Two different molecular representations, namely, 2D descriptors and MACCS fingerprints were subjected to the LSTM method for the training process. We also successfully integrated an attention mechanism into our model, which helped us to interpret the contribution of chemical features on kinase activity. The attention mechanism extracted the significant chemical moieties more effectively by taking them into consideration during the activity prediction. The recorded accuracies in the test sets for both 2D descriptors and MACCS fingerprints-based models were 0.81 and 0.78, respectively. The receiver operating characteristic curve (ROC)-area under the curve (AUC) score for both models was in the range of 0.8-0.99. The proposed framework can be a good starting point for the development of new cancer kinase drugs.
Affiliation(s)
- V Kumar
  - Department of Bio & Medical Big Data (BK21 Four Program), Division of Life Sciences, Research Institute of Life Sciences, Gyeongsang National University, Jinju, Korea
- G Lee
  - Division of Applied Life Science (BK21 Program), ABC-RLRC, PMBBRC, Gyeongsang National University, Jinju, Korea
- J Yoo
  - Division of Applied Life Science (BK21 Program), Research Institute of Life Sciences, Gyeongsang National University, Jinju, Korea
- H S Ro
  - Department of Bio & Medical Big Data (BK21 Four Program), Division of Life Sciences, Research Institute of Life Sciences, Gyeongsang National University, Jinju, Korea
- K W Lee
  - Department of Bio & Medical Big Data (BK21 Four Program), Division of Life Sciences, Research Institute of Life Sciences, Gyeongsang National University, Jinju, Korea
  - ANGEL i-Drug Design (AiDD), Jinju, Korea
|
37
|
Yeh SJ, Yeh TY, Chen BS. Systems Drug Discovery for Diffuse Large B Cell Lymphoma Based on Pathogenic Molecular Mechanism via Big Data Mining and Deep Learning Method. Int J Mol Sci 2022; 23:6732. [PMID: 35743172 PMCID: PMC9224183 DOI: 10.3390/ijms23126732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 06/10/2022] [Accepted: 06/15/2022] [Indexed: 02/01/2023] Open
Abstract
Diffuse large B cell lymphoma (DLBCL) is an aggressive heterogeneous disease. The most common subtypes of DLBCL include the germinal center B-cell (GCB) type and the activated B-cell (ABC) type. To learn more about the pathogenesis of the two DLBCL subtypes (i.e., DLBCL ABC and DLBCL GCB), we first construct a candidate genome-wide genetic and epigenetic network (GWGEN) by big database mining. With the help of the two DLBCL subtypes’ genome-wide microarray data, we identify their real GWGENs via system identification and model order selection approaches. Afterward, the core GWGENs of the two DLBCL subtypes are extracted from the real GWGENs by the principal network projection (PNP) method. By comparing core signaling pathways and investigating pathogenic mechanisms, we are able to identify pathogenic biomarkers as drug targets for DLBCL ABC and DLBCL GCB, respectively. Furthermore, we perform drug discovery considering drug-target interaction ability, drug regulation ability, and drug toxicity. Among them, a deep neural network (DNN)-based drug-target interaction (DTI) model is trained in advance to predict potential drug candidates with a higher probability of interacting with the identified biomarkers. Consequently, two drug combinations are proposed to alleviate DLBCL ABC and DLBCL GCB, respectively.
|
38
|
DeepMHADTA: Prediction of Drug-Target Binding Affinity Using Multi-Head Self-Attention and Convolutional Neural Network. Curr Issues Mol Biol 2022; 44:2287-2299. [PMID: 35678684 PMCID: PMC9164023 DOI: 10.3390/cimb44050155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 05/08/2022] [Accepted: 05/14/2022] [Indexed: 11/17/2022] Open
Abstract
Drug-target interactions provide insight into drug side effects and drug repositioning. However, wet-lab biochemical experiments are time-consuming and labor-intensive, and are insufficient to meet the pressing demand for drug research and development. With the rapid advancement of deep learning, computational methods are increasingly applied to screen drug-target interactions. Many methods consider this problem as a binary classification task (binding or not), but ignore the quantitative binding affinity. In this paper, we propose a new end-to-end deep learning method called DeepMHADTA, which uses the multi-head self-attention mechanism in a deep residual network to predict drug-target binding affinity. On two benchmark datasets, our method outperformed several current state-of-the-art methods in terms of multiple performance measures, including mean square error (MSE), consistency index (CI), rm2, and area under the PR curve (AUPR). The results demonstrated that our method achieved better performance in predicting the drug–target binding affinity.
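Two of the regression metrics named above can be written out in a few lines. The sketch below assumes that the abstract's "consistency index (CI)" is the concordance index commonly reported on binding-affinity benchmarks; the function names and the toy affinity values are illustrative only and are not taken from the paper.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    A pair (i, j) is comparable when y_true[i] > y_true[j]. A correctly
    ordered prediction scores 1, a tied prediction 0.5, a reversed one 0.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


# Hypothetical affinities, purely for illustration.
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.5, 8.0]
mse = mean_squared_error(measured, predicted)
ci = concordance_index(measured, predicted)
```

A CI of 1.0 means every pair is ranked correctly and 0.5 corresponds to random ordering, so the metric complements MSE by measuring ranking quality rather than absolute error. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but slow on large benchmarks.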
|
39
|
Li Y, Qiao G, Wang K, Wang G. Drug-target interaction predication via multi-channel graph neural networks. Brief Bioinform 2021; 23:6363570. [PMID: 34661237 DOI: 10.1093/bib/bbab346] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 06/01/2021] [Revised: 07/21/2021] [Accepted: 08/12/2021] [Indexed: 12/15/2022] Open
Abstract
Drug-target interaction (DTI) prediction is an important step in drug discovery. Although there are many methods for predicting drug targets, these methods are limited by their use of discrete or manual feature representations. In recent years, deep learning methods have been used to predict DTIs to address these defects. However, most of the existing deep learning methods lack the fusion of topological structure and semantic information in the drug-protein pair (DPP) representation learning process. Besides, when learning the DPP node representation in the DPP network, the different influences between neighboring nodes are ignored. In this paper, a new model, DTI-MGNN, based on a multi-channel graph convolutional network and graph attention is proposed for DTI prediction. We use two independent graph attention networks to learn the different interactions between nodes for the topology graph and feature graph with different strengths. At the same time, we use a graph convolutional network with shared weight matrices to learn the common information of the two graphs. The DTI-MGNN model combines topological structure and semantic features to improve the representation learning ability of DPPs, and obtains state-of-the-art results on public datasets. Specifically, DTI-MGNN has achieved a high accuracy in identifying DTIs (the area under the receiver operating characteristic curve is 0.9665).
Affiliation(s)
- Yang Li
  - College of Information and Computer Engineering, Northeast Forestry University, 150004, Harbin, China
- Guanyu Qiao
  - College of Information and Computer Engineering, Northeast Forestry University, 150004, Harbin, China
- Keqi Wang
  - College of Information and Computer Engineering, Northeast Forestry University, 150004, Harbin, China
- Guohua Wang
  - College of Information and Computer Engineering, Northeast Forestry University, 150004, Harbin, China
|