1
Shi W, Yang H, Xie L, Yin XX, Zhang Y. A review of machine learning-based methods for predicting drug-target interactions. Health Inf Sci Syst 2024; 12:30. [PMID: 38617016 PMCID: PMC11014838 DOI: 10.1007/s13755-024-00287-6] [Received: 09/11/2023] [Accepted: 03/04/2024] [Indexed: 04/16/2024] Open
Abstract
The prediction of drug-target interactions (DTI) is a crucial preliminary stage in drug discovery and development, given the substantial risk of failure and the prolonged validation period associated with in vitro and in vivo experiments. In the contemporary landscape, various machine learning-based methods have emerged as indispensable tools for DTI prediction. This paper begins by placing emphasis on the data representation employed by these methods, delineating five representations for drugs and four for proteins. The methods are then categorized into traditional machine learning-based approaches and deep learning-based ones, with a discussion of representative approaches in each category and the introduction of a novel taxonomy for deep neural network models in DTI prediction. Additionally, we present a synthesis of commonly used datasets and evaluation metrics to facilitate practical implementation. In conclusion, we address current challenges and outline potential future directions in this research field.
Affiliation(s)
- Wen Shi
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
- Hong Yang
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
- Linhai Xie
  - State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing, 102206 China
- Xiao-Xia Yin
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
- Yanchun Zhang
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
  - Department of New Networks, Peng Cheng Laboratory, Shenzhen, 518000 China
2
Zhang H, Liu X, Cheng W, Wang T, Chen Y. Prediction of drug-target binding affinity based on deep learning models. Comput Biol Med 2024; 174:108435. [PMID: 38608327 DOI: 10.1016/j.compbiomed.2024.108435] [Received: 01/29/2024] [Revised: 04/05/2024] [Accepted: 04/07/2024] [Indexed: 04/14/2024]
Abstract
The prediction of drug-target binding affinity (DTA) plays an important role in drug discovery. Computerized virtual screening techniques have been used for DTA prediction, greatly reducing the time and economic costs of drug discovery. However, these techniques have not succeeded in reversing the low success rate of new drug development. In recent years, the continuous development of deep learning (DL) technology has brought new opportunities for drug discovery through DTA prediction. This shift has moved the prediction of DTA from traditional machine learning methods to DL. The DL frameworks used for DTA prediction include convolutional neural networks (CNN), graph convolutional neural networks (GCN), recurrent neural networks (RNN), and reinforcement learning (RL), among others. This review article summarizes the available literature on DTA prediction using DL models, including DTA quantification metrics and datasets, and DL algorithms used for DTA prediction (including input representation of models, neural network frameworks, evaluation indicators, and model interpretability). In addition, the opportunities, challenges, and prospects of the application of DL frameworks for DTA prediction in the field of drug discovery are discussed.
Affiliation(s)
- Hao Zhang
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Xiaoqian Liu
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Wenya Cheng
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Tianshi Wang
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Yuanyuan Chen
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
3
Zhang Z, Bian Y, Xie A, Han P, Zhou S. Can Pretrained Models Really Learn Better Molecular Representations for AI-Aided Drug Discovery? J Chem Inf Model 2024; 64:2921-2930. [PMID: 38145387 PMCID: PMC11005046 DOI: 10.1021/acs.jcim.3c01707] [Received: 10/23/2023] [Revised: 11/29/2023] [Accepted: 11/29/2023] [Indexed: 12/26/2023]
Abstract
Self-supervised pretrained models are becoming increasingly popular in AI-aided drug discovery, leading to more and more pretrained models with the promise that they can extract better feature representations for molecules. Yet, the quality of learned representations has not been fully explored. In this work, inspired by the two phenomena of Activity Cliffs (ACs) and Scaffold Hopping (SH) in traditional Quantitative Structure-Activity Relationship analysis, we propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of the representations extracted by the pretrained model and visualize the relationship between the representations and properties. The concepts of ACs and SH are generalized from the structure-activity context to the representation-property context, and the underlying principles of RePRA are analyzed theoretically. Two scores are designed to measure the generalized ACs and SH detected by RePRA, and therefore, the quality of representations can be evaluated. In experiments, representations of molecules from 10 target tasks generated by 7 pretrained models are analyzed. The results indicate that the state-of-the-art pretrained models can overcome some shortcomings of canonical Extended-Connectivity FingerPrints, while the correlation between the basis of the representation space and specific molecular substructures is not explicit. Thus, some representations could be even worse than the canonical fingerprints. Our method enables researchers to evaluate the quality of molecular representations generated by their proposed self-supervised pretrained models. Our findings can guide the community to develop better pretraining techniques to regularize the occurrence of ACs and SH.
Affiliation(s)
- Ziqiao Zhang
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
- Ailin Xie
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
- Pengju Han
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
- Shuigeng Zhou
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
4
Zeng X, Li SJ, Lv SQ, Wen ML, Li Y. A comprehensive review of the recent advances on predicting drug-target affinity based on deep learning. Front Pharmacol 2024; 15:1375522. [PMID: 38628639 PMCID: PMC11019008 DOI: 10.3389/fphar.2024.1375522] [Received: 01/24/2024] [Accepted: 03/21/2024] [Indexed: 04/19/2024] Open
Abstract
Accurate calculation of drug-target affinity (DTA) is crucial for various applications in the pharmaceutical industry, including drug screening, design, and repurposing. However, traditional machine learning methods for calculating DTA often lack accuracy, posing a significant challenge in accurately predicting DTA. Fortunately, deep learning has emerged as a promising approach in computational biology, leading to the development of various deep learning-based methods for DTA prediction. To support researchers in developing novel and high-precision methods, we have provided a comprehensive review of recent advances in predicting DTA using deep learning. We first conducted a statistical analysis of commonly used public datasets, providing essential information and introducing the used fields of these datasets. We further explored the common representations of sequences and structures of drugs and targets. These analyses served as the foundation for constructing DTA prediction methods based on deep learning. Next, we focused on explaining how deep learning models, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformers, and Graph Neural Networks (GNNs), were effectively employed in specific DTA prediction methods. We highlighted the unique advantages and applications of these models in the context of DTA prediction. Finally, we conducted a performance analysis of multiple state-of-the-art methods for predicting DTA based on deep learning. The comprehensive review aimed to help researchers understand the shortcomings and advantages of existing methods, and further develop high-precision DTA prediction tools to promote the development of drug discovery.
Affiliation(s)
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Shu-Juan Li
  - Yunnan Institute of Endemic Diseases Control and Prevention, Dali, China
- Shuang-Qing Lv
  - Institute of Surveying and Information Engineering, West Yunnan University of Applied Science, Dali, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, China
5
Ghandikota SK, Jegga AG. Application of artificial intelligence and machine learning in drug repurposing. Prog Mol Biol Transl Sci 2024; 205:171-211. [PMID: 38789178 DOI: 10.1016/bs.pmbts.2024.03.030] [Indexed: 05/26/2024]
Abstract
The purpose of drug repurposing is to leverage previously approved drugs for a particular disease indication and apply them to another disease. It can be seen as a faster and more cost-effective approach to drug discovery and a powerful tool for achieving precision medicine. In addition, drug repurposing can be used to identify therapeutic candidates for rare diseases and phenotypic conditions with limited information on disease biology. Machine learning and artificial intelligence (AI) methodologies have enabled the construction of effective, data-driven repurposing pipelines by integrating and analyzing large-scale biomedical data. Recent technological advances, especially in heterogeneous network mining and natural language processing, have opened up exciting new opportunities and analytical strategies for drug repurposing. In this review, we first introduce the challenges in repurposing approaches and highlight some success stories, including those during the COVID-19 pandemic. Next, we review some existing computational frameworks in the literature, organized on the basis of the type of biomedical input data analyzed and the computational algorithms involved. In conclusion, we outline some exciting new directions that drug repurposing research may take, as pioneered by the generative AI revolution.
Affiliation(s)
- Sudhir K Ghandikota
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Anil G Jegga
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
6
Iliadis D, De Baets B, Pahikkala T, Waegeman W. A comparison of embedding aggregation strategies in drug-target interaction prediction. BMC Bioinformatics 2024; 25:59. [PMID: 38321386 PMCID: PMC10845509 DOI: 10.1186/s12859-024-05684-y] [Received: 09/25/2023] [Accepted: 01/30/2024] [Indexed: 02/08/2024] Open
Abstract
The prediction of interactions between novel drugs and biological targets is a vital step in the early stage of the drug discovery pipeline. Many deep learning approaches have been proposed over the last decade, with a substantial fraction of them sharing the same underlying two-branch architecture. Their distinction is limited to the use of different types of feature representations and branches (multi-layer perceptrons, convolutional neural networks, graph neural networks and transformers). In contrast, the strategy used to combine the outputs (embeddings) of the branches has remained mostly the same. The same general architecture has also been used extensively in the area of recommender systems, where the choice of an aggregation strategy is still an open question. In this work, we investigate the effectiveness of three different embedding aggregation strategies in the area of drug-target interaction (DTI) prediction. We formally define these strategies and prove their universal approximator capabilities. We then present experiments that compare the different strategies on benchmark datasets from the area of DTI prediction, showcasing conditions under which specific strategies could be the obvious choice.
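To make the idea of an embedding aggregation strategy concrete, here is a minimal sketch of a two-branch model's final step. The abstract does not name the three strategies it compares, so the two shown here (a dot product, and concatenation followed by a small MLP) are assumptions chosen because they are common in recommender systems; the function names and the random weights are purely illustrative.

```python
import math
import random


def dot_product_score(drug_emb, target_emb):
    """Aggregate two same-length embeddings with a dot product, then apply a sigmoid."""
    logit = sum(d * t for d, t in zip(drug_emb, target_emb))
    return 1.0 / (1.0 + math.exp(-logit))


def concat_mlp_score(drug_emb, target_emb, w_hidden, w_out):
    """Concatenate the embeddings and pass them through a one-hidden-layer MLP."""
    x = list(drug_emb) + list(target_emb)
    # Hidden layer with ReLU activation; each row of w_hidden holds one unit's weights.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w_hidden]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical branch outputs: in a real model these come from the drug and target encoders.
rng = random.Random(0)
drug = [rng.uniform(-1, 1) for _ in range(4)]
target = [rng.uniform(-1, 1) for _ in range(4)]
w_hidden = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]  # untrained weights
w_out = [rng.uniform(-1, 1) for _ in range(3)]

print("dot product:", dot_product_score(drug, target))
print("concat + MLP:", concat_mlp_score(drug, target, w_hidden, w_out))
```

The dot product has no trainable parameters of its own and requires both branches to emit embeddings of the same size, whereas the concatenation variant adds parameters and lets the two embeddings differ in length.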
Affiliation(s)
- Dimitrios Iliadis
  - Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Bernard De Baets
  - Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Tapio Pahikkala
  - Department of Computing, University of Turku, 20500, Turku, Finland
- Willem Waegeman
  - Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
7
Zhang C, Zang T, Zhao T. KGE-UNIT: toward the unification of molecular interactions prediction based on knowledge graph and multi-task learning on drug discovery. Brief Bioinform 2024; 25:bbae043. [PMID: 38348746 PMCID: PMC10939374 DOI: 10.1093/bib/bbae043] [Received: 07/04/2023] [Revised: 12/29/2023] [Accepted: 01/23/2024] [Indexed: 02/15/2024] Open
Abstract
The prediction of molecular interactions is vital for drug discovery. Existing methods often focus on individual prediction tasks and overlook the relationships between them. Additionally, certain tasks encounter limitations due to insufficient data availability, resulting in limited performance. To overcome these limitations, we propose KGE-UNIT, a unified framework that combines knowledge graph embedding (KGE) and multi-task learning, for simultaneous prediction of drug-target interactions (DTIs) and drug-drug interactions (DDIs) and enhancing the performance of each task, even when data availability is limited. Via KGE, we extract heterogeneous features from the drug knowledge graph to enhance the structural features of drug and protein nodes, thereby improving the quality of features. Additionally, employing multi-task learning, we introduce an innovative predictor that comprises the task-aware Convolutional Neural Network-based (CNN-based) encoder and the task-aware attention decoder, which can better fuse multimodal features, capture the contextual interactions of molecular tasks and enhance task awareness, leading to improved performance. Experiments on two imbalanced datasets for DTIs and DDIs demonstrate the superiority of KGE-UNIT, achieving high areas under the receiver operating characteristic curve (AUROCs) (0.942, 0.987) and areas under the precision-recall curve (AUPRs) (0.930, 0.980) for DTIs and high AUROCs (0.975, 0.989) and AUPRs (0.966, 0.988) for DDIs. Notably, on the LUO dataset where the data were more limited, KGE-UNIT exhibited a more pronounced improvement, with increases of 4.32% in AUROC and 3.56% in AUPR for DTIs and 6.56% in AUROC and 8.17% in AUPR for DDIs. The scalability of KGE-UNIT is demonstrated through its extension to protein-protein interaction prediction; ablation studies and case studies further validate its effectiveness.
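For readers unfamiliar with the two metrics reported above, the following sketch computes them from scratch on toy data. It is not taken from the KGE-UNIT code: AUROC is computed through its pairwise-ranking interpretation, and AUPR is approximated by average precision, which is one common estimator among several. The labels and scores are invented for illustration.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive example.

    Assumes at least one positive and no tied scores; ties would need extra handling.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy predictions: 1 marks a known interaction, 0 a non-interaction.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print("AUROC:", auroc(labels, scores))
print("Average precision:", average_precision(labels, scores))
```

On imbalanced datasets such as those mentioned in the abstract, the precision-recall view is usually the more informative of the two, because AUROC can stay high even when most top-ranked predictions are false positives.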
Affiliation(s)
- Chengcheng Zhang
  - Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China
- Tianyi Zang
  - Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China
- Tianyi Zhao
  - School of Medicine and Health, Harbin Institute of Technology, Harbin, 150001, China
8
Jiang M, Shao Y, Zhang Y, Zhou W, Pang S. A deep learning method for drug-target affinity prediction based on sequence interaction information mining. PeerJ 2023; 11:e16625. [PMID: 38099302 PMCID: PMC10720480 DOI: 10.7717/peerj.16625] [Received: 08/09/2023] [Accepted: 11/16/2023] [Indexed: 12/17/2023] Open
Abstract
Background: A critical aspect of in silico drug discovery involves the prediction of drug-target affinity (DTA). Conducting wet lab experiments to determine affinity is both expensive and time-consuming, making it necessary to find alternative approaches. In recent years, deep learning has emerged as a promising technique for DTA prediction, leveraging the substantial computational power of modern computers. Methods: We proposed a novel sequence-based approach, named KC-DTA, for predicting DTA. In this approach, we converted the target sequence into two distinct matrices, while representing the molecular compound as a graph. The proposed method utilized k-mers analysis and Cartesian product calculation to capture the interactions and evolutionary information among various residues, enabling the creation of the two matrices for the target sequence. The molecule was represented as a molecular graph in which atoms serve as nodes and chemical bonds serve as edges. Subsequently, the obtained target matrices and molecule graph were utilized as inputs for convolutional neural networks (CNNs) and graph neural networks (GNNs) to extract hidden features, which were further used for the prediction of binding affinity. Results: In order to evaluate the effectiveness of the proposed method, we conducted several experiments and made a comprehensive comparison with the state-of-the-art approaches using multiple evaluation metrics. The results of our experiments demonstrated that the KC-DTA method achieves high performance in predicting DTA. The findings of this research underscore the significance of the KC-DTA method as a valuable tool in the field of in silico drug discovery, offering promising opportunities for accelerating the drug development process. All the data and code are available for access on https://github.com/syc2017/KCDTA.
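As a rough illustration of how a protein sequence can be turned into a matrix with k-mers and a Cartesian product, the sketch below counts overlapping 2-mers and arranges them in a 20 x 20 grid. This is only one plausible reading of the abstract; the actual KC-DTA encoding (value of k, normalization, and how the second matrix is built) is not described here and may differ. The function name and toy sequence are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_count_matrix(sequence, alphabet=AMINO_ACIDS):
    """Count overlapping 2-mers and lay them out as a len(alphabet) x len(alphabet) matrix."""
    index = {aa: i for i, aa in enumerate(alphabet)}
    # The Cartesian product of the alphabet with itself enumerates every possible 2-mer.
    counts = {pair: 0 for pair in product(alphabet, repeat=2)}
    for a, b in zip(sequence, sequence[1:]):
        if a in index and b in index:  # skip non-standard residues such as 'X'
            counts[(a, b)] += 1
    matrix = [[0] * len(alphabet) for _ in alphabet]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = n
    return matrix


# Toy sequence containing the 2-mers MK, KK, KL, LA and AK.
m = kmer_count_matrix("MKKLAK")
k = AMINO_ACIDS.index("K")
print("count of 'KK':", m[k][k])
print("total 2-mers:", sum(sum(row) for row in m))
```

A fixed-size matrix like this can be fed to a 2D CNN regardless of sequence length, which is presumably part of the appeal of such encodings.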
Affiliation(s)
- Mingjian Jiang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Yunchang Shao
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Wei Zhou
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Shunpeng Pang
  - School of Computer Engineering, WeiFang University, Weifang, Shandong, China
9
Zhang L, Wang CC, Zhang Y, Chen X. GPCNDTA: Prediction of drug-target binding affinity through cross-attention networks augmented with graph features and pharmacophores. Comput Biol Med 2023; 166:107512. [PMID: 37788507 DOI: 10.1016/j.compbiomed.2023.107512] [Received: 08/08/2023] [Revised: 08/28/2023] [Accepted: 09/19/2023] [Indexed: 10/05/2023]
Abstract
Drug-target affinity prediction is a challenging task in drug discovery. The latest computational models have limitations in mining edge information in molecule graphs, accessing knowledge in pharmacophores, integrating multimodal data of the same biomolecule and realizing effective interactions between two different biomolecules. To solve these problems, we proposed a method called Graph features and Pharmacophores augmented Cross-attention Networks based Drug-Target binding Affinity prediction (GPCNDTA). First, we utilized the GNN module, the linear projection unit and self-attention layer to correspondingly extract features of drugs and proteins. Second, we devised intramolecular and intermolecular cross-attention to respectively fuse and interact features of drugs and proteins. Finally, the linear projection unit was applied to gain final features of drugs and proteins, and the Multi-Layer Perceptron was employed to predict drug-target binding affinity. Three major innovations of GPCNDTA are as follows: (i) developing the residual CensNet and the residual EW-GCN to correspondingly extract features of drug and protein graphs, (ii) regarding pharmacophores as a new type of priors to heighten drug-target affinity prediction performance, and (iii) devising intramolecular and intermolecular cross-attention, in which the intramolecular cross-attention realizes the effective fusion of different modal data related to the same biomolecule, and the intermolecular cross-attention fulfills the information interaction between two different biomolecules in attention space. The test results on five benchmark datasets imply that GPCNDTA achieves the best performance compared with state-of-the-art computational models. Besides, relying on ablation experiments, we demonstrated the effectiveness of GNN modules, pharmacophores and two cross-attention strategies in improving the prediction accuracy, stability and reliability of GPCNDTA. In case studies, we applied GPCNDTA to predict binding affinities between 3C-like proteinase and 185 drugs, and observed that most binding affinities predicted by GPCNDTA are close to corresponding experimental measurements.
Affiliation(s)
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Chun-Chun Wang
  - School of Science, Jiangnan University, Wuxi, 214122, China
- Yong Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Xing Chen
  - School of Science, Jiangnan University, Wuxi, 214122, China
10
Ong WJG, Kirubakaran P, Karanicolas J. Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors. bioRxiv 2023:2023.09.04.556234. [PMID: 37732243 PMCID: PMC10508770 DOI: 10.1101/2023.09.04.556234] [Indexed: 09/22/2023]
Abstract
The extreme surge of interest over the past decade surrounding the use of neural networks has inspired many groups to deploy them for predicting binding affinities of drug-like molecules to their receptors. A model that can accurately make such predictions has the potential to screen large chemical libraries and help streamline the drug discovery process. However, despite reports of models that accurately predict quantitative inhibition using protein kinase sequences and inhibitors' SMILES strings, it is still unclear whether these models can generalize to previously unseen data. Here, we build a Convolutional Neural Network (CNN) analogous to those previously reported and evaluate the model over four datasets commonly used for inhibitor/kinase predictions. We find that the model performs comparably to those previously reported, provided that the individual data points are randomly split between the training set and the test set. However, model performance is dramatically deteriorated when all data for a given inhibitor is placed together in the same training/testing fold, implying that information leakage underlies the models' performance. Through comparison to simple models in which the SMILES strings are tokenized, or in which test set predictions are simply copied from the closest training set data points, we demonstrate that there is essentially no generalization whatsoever in this model. In other words, the model has not learned anything about molecular interactions, and does not provide any benefit over much simpler and more transparent models. These observations strongly point to the need for richer structure-based encodings, to obtain useful prospective predictions of not-yet-synthesized candidate inhibitors.
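The difference between the two evaluation protocols described above can be sketched in a few lines. The code below is an illustration on synthetic records, not the authors' pipeline: `random_split` shuffles individual data points, so the same inhibitor can appear in both sets, while `split_by_inhibitor` keeps all data for a given inhibitor on one side. The record layout and function names are assumptions made for this example.

```python
import random


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle individual (inhibitor, kinase, affinity) records; inhibitors may leak across sets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def split_by_inhibitor(records, test_fraction=0.2, seed=0):
    """Assign whole inhibitors to the test set so no inhibitor is seen in both sets."""
    rng = random.Random(seed)
    inhibitors = sorted({r[0] for r in records})
    rng.shuffle(inhibitors)
    n_test = max(1, int(len(inhibitors) * test_fraction))
    test_ids = set(inhibitors[:n_test])
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test


# Synthetic data: 10 inhibitors, each measured against 5 kinases.
records = [(f"inh{i}", f"kin{j}", float(i + j)) for i in range(10) for j in range(5)]

rand_train, rand_test = random_split(records)
grp_train, grp_test = split_by_inhibitor(records)

shared_random = {r[0] for r in rand_train} & {r[0] for r in rand_test}
shared_grouped = {r[0] for r in grp_train} & {r[0] for r in grp_test}
print("inhibitors shared under random split:", len(shared_random))
print("inhibitors shared under grouped split:", len(shared_grouped))
```

Under the random split, a model can score well simply by recalling affinities of inhibitors it has already seen against other kinases; the grouped split removes that shortcut, which is consistent with the performance drop the abstract reports.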
Affiliation(s)
- Wern Juin Gabriel Ong
  - Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
  - Bowdoin College, Brunswick, ME 04011
- Palani Kirubakaran
  - Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
- John Karanicolas
  - Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
11
Sun J, Si S, Ru J, Wang X. DeepdlncUD: Predicting regulation types of small molecule inhibitors on modulating lncRNA expression by deep learning. Comput Biol Med 2023; 163:107226. [PMID: 37450966 DOI: 10.1016/j.compbiomed.2023.107226] [Received: 03/17/2023] [Revised: 05/31/2023] [Accepted: 07/01/2023] [Indexed: 07/18/2023]
Abstract
Targeting lncRNAs by small molecules (SM-lncR) to alter their expression levels has emerged as an important therapeutic modality for disease treatment. To date, no computational tools have been dedicated to predicting small molecule-mediated upregulation or downregulation of lncRNA expression. Here, we introduce DeepdlncUD, which integrates the predictions of nine deep learning algorithms to infer the regulation types of small molecules on modulating lncRNA expression. Through systematic optimization on a training set of 771 upregulation and 739 downregulation SM-lncR pairs, each encoding 1369 sequence, representational, and physicochemical features, this method outperforms a recently released program, DeepsmirUD, by achieving 0.674 in AUC (area under the receiver operating characteristic curve), 0.722 in AUCPR (area under the precision-recall curve), 0.681 in F1-score, and 0.516 in Jaccard Index on a test set of 222 SM-lncR pairs. By extracting 125 upregulation and 46 downregulation SM-lncR pairs that involve disease-associated lncRNAs, DeepdlncUD is shown to gain an accuracy of 0.700 in the pathological context. Using connectivity scores, around half of the small molecules are correctly estimated as drugs to treat lncRNA-regulated diseases. This tool can be run at a fast speed to assist the discovery of potential small molecule drugs of lncRNA targets on a large scale. DeepdlncUD is publicly available at https://github.com/2003100127/deepdlncud.
Affiliation(s)
- Jianfeng Sun
  - Botnar Research Centre, University of Oxford, Headington, Oxford, OX3 7LD, UK
- Shuyue Si
  - School of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Renai, Suzhou, 215028, China
- Jinlong Ru
  - Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, 85354, Freising, Germany
- Xia Wang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
12
Yi J, Lee S, Lim S, Cho C, Piao Y, Yeo M, Kim D, Kim S, Lee S. Exploring chemical space for lead identification by propagating on chemical similarity network. Comput Struct Biotechnol J 2023; 21:4187-4195. [PMID: 37680266 PMCID: PMC10480321 DOI: 10.1016/j.csbj.2023.08.016] [Received: 04/04/2023] [Revised: 08/08/2023] [Accepted: 08/20/2023] [Indexed: 09/09/2023] Open
Abstract
Motivation: Lead identification is a fundamental step to prioritize candidate compounds for the downstream drug discovery process. Machine learning (ML) and deep learning (DL) approaches are widely used to identify lead compounds using both chemical property and experimental information. However, ML or DL methods rarely consider compound similarity information directly since ML and DL models use abstract representations of molecules for model construction. Alternatively, data mining approaches are also used to explore chemical space with drug candidates by screening undesirable compounds. A major challenge for data mining approaches is to develop efficient data mining methods that search large chemical space for desirable lead compounds with a low false positive rate. Results: In this work, we developed a network propagation (NP) based data mining method for lead identification that performs search on an ensemble of chemical similarity networks. We compiled 14 fingerprint-based similarity networks. Given a target protein of interest, we use a deep learning-based drug-target interaction model to narrow down compound candidates and then we use network propagation to prioritize drug candidates that are highly correlated with a drug activity score such as IC50. In an extensive experiment with BindingDB, we showed that our approach successfully discovered intentionally unlabeled compounds for given targets. To further demonstrate the prediction power of our approach, we identified 24 candidate leads for CLK1. Two out of five synthesizable candidates were experimentally validated in binding assays. In conclusion, our framework can be very useful for lead identification from very large compound databases such as ZINC.
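Network propagation on a similarity network can be sketched as follows. The abstract does not state which propagation rule the authors use, so this example assumes a random walk with restart, a widely used variant; the toy network, similarity weights, restart probability and function name are all illustrative assumptions.

```python
def propagate(adjacency, seeds, restart=0.5, iterations=50):
    """Random walk with restart on a weighted, symmetric similarity network.

    adjacency: dict mapping node -> dict of neighbor -> similarity weight.
    seeds: dict mapping node -> initial score (for example, known active compounds).
    """
    nodes = list(adjacency)
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    degree = {n: sum(adjacency[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(iterations):
        nxt = {}
        for n in nodes:
            # Each neighbor m passes on its score in proportion to the edge weight.
            incoming = sum(
                p[m] * w / degree[m] for m, w in adjacency[n].items() if degree[m] > 0
            )
            nxt[n] = (1 - restart) * incoming + restart * p0[n]
        p = nxt
    return p


# Toy chain of compounds: A is a known active; edge weights are fingerprint similarities.
network = {
    "A": {"B": 0.9},
    "B": {"A": 0.9, "C": 0.8},
    "C": {"B": 0.8, "D": 0.1},
    "D": {"C": 0.1},
}
scores = propagate(network, seeds={"A": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Compounds that are close to the seed through strong similarity edges receive higher scores, which is the intuition behind using propagation to prioritize unlabeled candidates.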
Affiliation(s)
- Jungseob Yi
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Sangseon Lee
  - Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Sangsoo Lim
  - School of AI Software Convergence, Dongguk University, Pildong-ro 1-gil, Jung-gu, Seoul, South Korea
- Changyun Cho
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Yinhua Piao
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Marie Yeo
  - PHARMGENSCIENCE CO., LTD., 216, Dongjak-daero, Seocho-gu, Seoul, 06554, South Korea
- Dongkyu Kim
  - PHARMGENSCIENCE CO., LTD., 216, Dongjak-daero, Seocho-gu, Seoul, 06554, South Korea
- Sun Kim
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
  - AIGENDRUG CO., LTD., Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Sunho Lee
  - AIGENDRUG CO., LTD., Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
|
13
|
Sun J, Xu M, Ru J, James-Bott A, Xiong D, Wang X, Cribbs AP. Small molecule-mediated targeting of microRNAs for drug discovery: Experiments, computational techniques, and disease implications. Eur J Med Chem 2023; 257:115500. [PMID: 37262996 DOI: 10.1016/j.ejmech.2023.115500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/05/2023] [Accepted: 05/15/2023] [Indexed: 06/03/2023]
Abstract
Small molecules have been providing medical breakthroughs for human diseases for more than a century. Recently, identifying small molecule inhibitors that target microRNAs (miRNAs) has gained importance, despite the challenges posed by labour-intensive screening experiments and the significant efforts required for medicinal chemistry optimization. Numerous experimentally-verified cases have demonstrated the potential of miRNA-targeted small molecule inhibitors for disease treatment. This new approach is grounded in their posttranscriptional regulation of the expression of disease-associated genes. Reversing dysregulated gene expression using this mechanism may help control dysfunctional pathways. Furthermore, the ongoing improvement of algorithms has allowed for the integration of computational strategies built on top of laboratory-based data, facilitating a more precise and rational design and discovery of lead compounds. To complement the use of extensive pharmacogenomics data in prioritising potential drugs, our previous work introduced a computational approach based only on molecular sequences. Moreover, various computational tools for predicting molecular interactions in biological networks using similarity-based inference techniques have accumulated in established studies. However, there are a limited number of comprehensive reviews covering both computational and experimental drug discovery processes. In this review, we present a cohesive overview of both biological and computational applications in miRNA-targeted drug discovery, along with their disease implications and clinical significance. Finally, utilizing drug-target interaction (DTI) data from DrugBank, we showcase the effectiveness of deep learning for obtaining the physicochemical characterization of DTIs.
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
| | - Miaoer Xu
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
| | - Jinlong Ru
- Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, Freising, 85354, Germany
| | - Anna James-Bott
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
| | - Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
| | - Xia Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China.
| | - Adam P Cribbs
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
|
14
|
D’Souza S, Prema KV, Balaji S, Shah R. Deep Learning-Based Modeling of Drug–Target Interaction Prediction Incorporating Binding Site Information of Proteins. INTERDISCIPLINARY SCIENCES: COMPUTATIONAL LIFE SCIENCES 2023; 15:306-315. [PMID: 36967455 PMCID: PMC10148762 DOI: 10.1007/s12539-023-00557-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/29/2023]
Abstract
Chemogenomics, also known as proteochemometrics, covers various computational methods for predicting interactions between related drugs and targets on large-scale data. Chemogenomics is used in the early stages of drug discovery to predict the off-target effects of proteins against therapeutic candidates. This study aims to predict unknown ligand–target interactions using one-dimensional SMILES as inputs for ligands and binding site residues for proteins in a computationally efficient manner. We first formulate a deep learning CNN model using one-dimensional SMILES for drugs and motif-rich binding pocket subsequences of proteins as inputs. We evaluate and compare the proposed deep learning model trained on expert-based features against shallow feature-based machine learning methods. On the MSE and AUPR metrics, the proposed method achieved performance better than or similar to that of the shallow methods. Additionally, we show that our deep learning model, DeepPS, is computationally more efficient than the deep learning model trained on full-length raw sequences of proteins. We conclude that a beneficial research approach would be to integrate structural information of proteins for modeling drug-target interaction prediction of large datasets for more interpretability, high throughput, and broad applicability.
Affiliation(s)
- Sofia D’Souza
- Department of Computer Science and Engineering, Manipal Academy of Higher Education, Manipal, India
| | - K. V. Prema
- Department of Computer Science and Engineering, Manipal Academy of Higher Education, Bengaluru, India
| | - S. Balaji
- Department of Biotechnology, Manipal Academy of Higher Education, Manipal, India
| | - Ronak Shah
- Department of Computer Science and Engineering, Manipal Academy of Higher Education, Manipal, India
|
15
|
Yang Z, Zhong W, Lv Q, Dong T, Yu-Chian Chen C. Geometric Interaction Graph Neural Network for Predicting Protein-Ligand Binding Affinities from 3D Structures (GIGN). J Phys Chem Lett 2023; 14:2020-2033. [PMID: 36794930 DOI: 10.1021/acs.jpclett.2c03906] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Indexed: 06/18/2023]
Abstract
Predicting protein-ligand binding affinities (PLAs) is a core problem in drug discovery. Recent advances have shown great potential in applying machine learning (ML) for PLA prediction. However, most of them omit the 3D structures of complexes and physical interactions between proteins and ligands, which are considered essential to understanding the binding mechanism. This paper proposes a geometric interaction graph neural network (GIGN) that incorporates 3D structures and physical interactions for predicting protein-ligand binding affinities. Specifically, we design a heterogeneous interaction layer that unifies covalent and noncovalent interactions into the message passing phase to learn node representations more effectively. The heterogeneous interaction layer also follows fundamental biological laws, including invariance to translations and rotations of the complexes, thus avoiding expensive data augmentation strategies. GIGN achieves state-of-the-art performance on three external test sets. Moreover, by visualizing learned representations of protein-ligand complexes, we show that the predictions of GIGN are biologically meaningful.
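The invariance property mentioned above can be illustrated with a small sketch: features built only from interatomic distances do not change when a complex is rotated or translated. This is a generic demonstration, not GIGN's heterogeneous interaction layer; the helper names and toy coordinates are assumptions.

```python
import math


def pairwise_distances(coords):
    """Matrix of Euclidean distances between all pairs of 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def rotate_z_and_translate(coords, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


def max_abs_difference(m1, m2):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


# Toy "complex": four atom positions in arbitrary units.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (-0.7, 2.0, 1.1)]
moved = rotate_z_and_translate(atoms, angle=1.1, shift=(3.0, -2.0, 5.0))

# Distances agree up to floating-point error, so any model that consumes
# only distances gives the same prediction for both poses.
difference = max_abs_difference(pairwise_distances(atoms),
                                pairwise_distances(moved))
```

Because the inputs are already invariant, such a model does not need to be trained on randomly rotated copies of each complex, which is the data augmentation cost the abstract refers to.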
Affiliation(s)
- Ziduo Yang
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Weihe Zhong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Qiujie Lv
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Tiejun Dong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Calvin Yu-Chian Chen
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
|
16
|
Wu L, Gao J, Zhang Y, Sui B, Wen Y, Wu Q, Liu K, He S, Bo X. A hybrid deep forest-based method for predicting synergistic drug combinations. CELL REPORTS METHODS 2023; 3:100411. [PMID: 36936075 PMCID: PMC10014304 DOI: 10.1016/j.crmeth.2023.100411] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/01/2022] [Revised: 11/27/2022] [Accepted: 01/27/2023] [Indexed: 02/23/2023]
Abstract
Combination therapy is a promising approach in treating multiple complex diseases. However, the large search space of available drug combinations exacerbates the challenge of experimental screening. To predict synergistic drug combinations in different cancer cell lines, we propose an improved deep forest-based method, ForSyn, and design two forest types embedded in ForSyn. ForSyn handles imbalanced and high-dimensional data in medium-/small-scale datasets, which are inherent characteristics of drug combination datasets. Compared with 12 state-of-the-art methods, ForSyn ranks first on four metrics for eight datasets with different feature combinations. We conduct a systematic analysis to identify the most appropriate configuration parameters. We validate the predictive value of ForSyn with cell-based experiments on several previously unexplored drug combinations. Finally, a systematic analysis of feature importance is performed on the top contributing features extracted by ForSyn. The resulting key genes may play key roles in the corresponding cancers.
Affiliation(s)
- Lianlian Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Jie Gao
- Department of Epidemiology and Health Statistics, School of Public Health, Fujian Medical University, Fuzhou 350122, China
| | - Yixin Zhang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Binsheng Sui
- School of Film, Xiamen University, Xiamen 361005, China
| | - Yuqi Wen
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Qingqiang Wu
- School of Film, Xiamen University, Xiamen 361005, China
| | - Kunhong Liu
- School of Film, Xiamen University, Xiamen 361005, China
| | - Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Xiaochen Bo
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
|
17
|
Atas Guvenilir H, Doğan T. How to approach machine learning-based prediction of drug/compound-target interactions. J Cheminform 2023; 15:16. [PMID: 36747300 PMCID: PMC9901167 DOI: 10.1186/s13321-023-00689-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/02/2022] [Accepted: 01/30/2023] [Indexed: 02/08/2023] Open
Abstract
The identification of drug/compound-target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been developed. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling utilizes both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. The representation of input samples (i.e., proteins and their ligands) in the form of quantitative feature vectors is crucial for the extraction of interaction-related properties during the artificial learning and subsequent prediction of DTIs. Lately, the representation learning approach, in which input samples are automatically featurized via training and applying a machine/deep learning model, has been utilized in biomedical sciences. In this study, we performed a comprehensive investigation of different computational approaches/techniques for protein featurization (including both conventional approaches and the novel learned embeddings), data preparation and exploration, machine learning-based modeling, and performance evaluation with the aim of achieving better data representations and more successful learning in DTI prediction. For this, we first constructed realistic and challenging benchmark datasets on small, medium, and large scales to be used as reliable gold standards for specific DTI modeling tasks. We developed and applied a network analysis-based splitting strategy to divide datasets into structurally different training and test folds. Using these datasets together with various featurization methods, we trained and tested DTI prediction models and evaluated their performance from different angles. 
Our main findings can be summarized under 3 items: (i) random splitting of datasets into train and test folds leads to near-complete data memorization and produces highly over-optimistic results, and should therefore be avoided, (ii) learned protein sequence embeddings work well in DTI prediction and offer high potential, even though interaction-related properties (e.g., structures) of proteins are unused during their self-supervised model training, and (iii) during the learning process, PCM models tend to rely heavily on compound features while partially ignoring protein features, primarily due to the inherent bias in DTI data, indicating the requirement for new and unbiased datasets. We hope this study will aid researchers in designing robust and high-performing data-driven DTI prediction systems that have real-world translational value in drug discovery.
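Finding (i) motivates similarity-aware splitting. The sketch below shows one simple way to do it: link items whose similarity exceeds a threshold, take connected components, and assign whole components to either train or test. This is an illustrative assumption, not the authors' network analysis-based strategy; the function names, the Jaccard similarity on toy feature sets, and the threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard (Tanimoto) similarity between two feature sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def similarity_components(items, similarity, threshold):
    """Group items into connected components of the thresholded similarity graph."""
    parent = {item: item for item in items}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return list(groups.values())


def component_split(items, similarity, threshold, test_fraction=0.2):
    """Assign whole components to test (smallest first) until the quota is met."""
    components = sorted(similarity_components(items, similarity, threshold), key=len)
    quota = test_fraction * len(items)
    train, test = [], []
    for comp in components:
        if len(test) + len(comp) <= quota:
            test.extend(comp)
        else:
            train.extend(comp)
    return train, test


# Toy fingerprints: sets of "on" bits per compound (made-up values).
fingerprints = {
    "a1": {1, 2, 3}, "a2": {1, 2, 3, 4},
    "b1": {10, 11}, "b2": {10, 11, 12},
    "c1": {20, 21},
}
sim = lambda x, y: jaccard(fingerprints[x], fingerprints[y])
train, test = component_split(list(fingerprints), sim, threshold=0.6, test_fraction=0.4)
```

With this construction, no test item has a training item above the similarity threshold, so a model cannot score well simply by memorizing near-duplicates.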
Affiliation(s)
- Heval Atas Guvenilir
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
| | - Tunca Doğan
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey.
- Institute of Informatics, Hacettepe University, Ankara, Turkey.
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey.
|
18
|
Hu Z, Liu W, Zhang C, Huang J, Zhang S, Yu H, Xiong Y, Liu H, Ke S, Hong L. SAM-DTA: a sequence-agnostic model for drug-target binding affinity prediction. Brief Bioinform 2023; 24:6955272. [PMID: 36545795 DOI: 10.1093/bib/bbac533] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/19/2022] [Revised: 10/05/2022] [Accepted: 11/07/2022] [Indexed: 12/24/2022] Open
Abstract
Drug-target binding affinity prediction is a fundamental task for drug discovery and has been studied for decades. Most methods follow the canonical paradigm that processes the inputs of the protein (target) and the ligand (drug) separately and then combines them together. In this study we demonstrate, surprisingly, that a model is able to achieve even superior performance without access to any protein-sequence-related information. Instead, a protein is characterized completely by the ligands that it interacts with. Specifically, we treat different proteins separately, which are jointly trained in a multi-head manner, so as to learn a robust and universal representation of ligands that is generalizable across proteins. Empirical evidence shows that the novel paradigm outperforms its competitive sequence-based counterpart, with a Mean Squared Error (MSE) of 0.4261 versus 0.7612 and an R-Square of 0.7984 versus 0.6570 compared with DeepAffinity. We also investigate the transfer learning scenario where unseen proteins are encountered after the initial training, and the cross-dataset evaluation for prospective studies. The results reveal the robustness of the proposed model in generalizing to unseen proteins as well as in predicting future data. Source codes and data are available at https://github.com/huzqatpku/SAM-DTA.
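For readers comparing the reported numbers, the two regression metrics can be computed as in the minimal sketch below. It assumes that "R-Square" means the coefficient of determination; the abstract does not say whether the authors used that or the squared Pearson correlation. The affinity values are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted affinities (e.g. pKd values).
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]
mse_value = mean_squared_error(measured, predicted)
r2_value = r_squared(measured, predicted)
```

Lower MSE and higher R-Square are both better, which is the direction of the comparison with DeepAffinity quoted in the abstract.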
Affiliation(s)
| | - Wenfeng Liu
- Shanghai Matwings Technology Co., Ltd., Shanghai, 200240, China
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | | | - Jiawen Huang
- Shanghai Matwings Technology Co., Ltd., Shanghai, 200240, China
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Shaoting Zhang
- SenseTime Research, Shanghai, 201103, China
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
| | - Huiqun Yu
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Hao Liu
- Shanghai Matwings Technology Co., Ltd., Shanghai, 200240, China
- Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Song Ke
- Shanghai Matwings Technology Co., Ltd., Shanghai, 200240, China
| | - Liang Hong
- School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai 200240, China
- Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
|
19
|
Vora DS, Kalakoti Y, Sundar D. Computational Methods and Deep Learning for Elucidating Protein Interaction Networks. Methods Mol Biol 2023; 2553:285-323. [PMID: 36227550 DOI: 10.1007/978-1-0716-2617-7_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Protein interactions play a critical role in all biological processes, but experimental identification of protein interactions is a time- and resource-intensive process. The advances in next-generation sequencing and multi-omics technologies have greatly benefited large-scale predictions of protein interactions using machine learning methods. A wide range of tools have been developed to predict protein-protein, protein-nucleic acid, and protein-drug interactions. Here, we discuss the applications, methods, and challenges faced when employing the various prediction methods. We also briefly describe ways to overcome the challenges and prospective future developments in the field of protein interaction biology.
Affiliation(s)
- Dhvani Sandip Vora
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
| | - Yogesh Kalakoti
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
| | - Durai Sundar
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India.
- School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India.
|
20
|
Ciray F, Doğan T. Machine learning-based prediction of drug approvals using molecular, physicochemical, clinical trial, and patent-related features. Expert Opin Drug Discov 2022; 17:1425-1441. [PMID: 36444655 DOI: 10.1080/17460441.2023.2153830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
BACKGROUND Drug development productivity has been declining lately due to elevated costs and reduced discovery rates. Therefore, pharmaceutical companies have been seeking alternative ways to determine and evaluate drug candidates. RESEARCH DESIGN AND METHODS In this work, we proposed a new computational approach to directly predict the regulatory approval of drug candidates, and implemented it as a method called 'DrugApp.' To accomplish this task, we employed multiple types of features including molecular and physicochemical properties of drug candidates, together with clinical trial and patent-related features, which are then processed by random forest classifiers to train our disease group-specific approval prediction models. RESULTS Our evaluations indicated DrugApp has a high and robust prediction performance. Within a use-case study, we showed our method can predict phase IV trial drugs that are later withdrawn from the market due to severe side effects. Finally, we used DrugApp models to forecast the approval of drug candidates that are currently in phases I/II/III of clinical trials. CONCLUSIONS We hope that our study will aid the research community in terms of evaluating and improving the process of drug development. The datasets, source code, results, and pre-trained models of DrugApp are freely available at https://github.com/HUBioDataLab/DrugApp.
Affiliation(s)
- Fulya Ciray
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
| | - Tunca Doğan
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Health Informatics, Institute of Informatics, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
|
21
|
Dong R, Yang H, Ai C, Duan G, Wang J, Guo F. DeepBLI: A Transferable Multichannel Model for Detecting β-Lactamase-Inhibitor Interaction. J Chem Inf Model 2022; 62:5830-5840. [PMID: 36245217 DOI: 10.1021/acs.jcim.2c01008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Pathogens producing β-lactamase pose a great challenge to antibiotic-resistant infection treatment; thus, it is urgent to discover novel β-lactamase inhibitors for drug development. Conventional high-throughput screening is very costly, and structure-based virtual screening is limited with mechanisms. In this study, we construct a novel multichannel deep neural network (DeepBLI) for β-lactamase inhibitor screening, pretrained with a label reversal KIBA data set and fine-tuned on β-lactamase-inhibitor pairs from BindingDB. First, the pairs of encoders (Conv and Att) fuse the information spatially and sequentially for both enzymes and inhibitors. Then, a co-attention module creates the connection between the inhibitor and enzyme embeddings. Finally, the multichannel outputs are fused by an element-wise product and then fed into 3-layer fully connected networks to predict interactions. Compared with state-of-the-art methods, DeepBLI yields an AUROC of 0.9240 and an AUPRC of 0.9715, which indicates that it can identify new β-lactamase-inhibitor interactions. To demonstrate its prediction ability, an application of DeepBLI is described to screen potential inhibitor compounds for metallo-β-lactamase AIM-1 and repurpose rottlerin for four classes of β-lactamase targets, showing the possibility of being a broad-spectrum inhibitor. DeepBLI provides an effective way for antibacterial drug development, contributing to antibiotic-resistant therapeutics.
Affiliation(s)
- Ruihan Dong
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Hongpeng Yang
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29208, United States
| | - Chengwei Ai
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Guihua Duan
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
22
|
Aleb N. A Mutual Attention Model for Drug Target Binding Affinity Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3224-3232. [PMID: 34665738 DOI: 10.1109/tcbb.2021.3121275] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
Various machine learning approaches have been developed for drug-target interaction (DTI) prediction. One class of these approaches, DTBA, is concerned with drug-target binding affinity strength, rather than merely the presence or absence of interaction. Several machine learning methods have been developed for this purpose. However, almost all depend heavily on the use of increasingly sophisticated inputs to improve their performance. In addition, these methods do not allow any analysis or interpretation due to their black-box characteristic. This work is an attempt to overcome these limitations by taking advantage of attention mechanisms with convolution models. In this paper, we define a new mutual attention based model for DTBA prediction. We represent both compounds and targets by sequences. Our model starts by aligning the drug-target pairs; then a learned masking is performed to retain the most promising regions of both sequences and amplify them with a learned factor, so that learning focuses more on them. We evaluate the performance of our method on two benchmark datasets, KIBA and Davis. The results show that our mutual attention approach is very effective. Compared to other well-known approaches, it achieved excellent results regarding the considered performance metrics.
|
23
|
PD-BertEDL: An Ensemble Deep Learning Method Using BERT and Multivariate Representation to Predict Peptide Detectability. Int J Mol Sci 2022; 23:ijms232012385. [PMID: 36293242 PMCID: PMC9604182 DOI: 10.3390/ijms232012385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 10/11/2022] [Accepted: 10/12/2022] [Indexed: 12/03/2022] Open
Abstract
Peptide detectability is defined as the probability of identifying a peptide from a mixture of standard samples, which is a key step in protein identification and analysis. Exploring effective methods for predicting peptide detectability is helpful for disease treatment and clinical research. However, most existing computational methods for predicting peptide detectability rely on a single type of information. With the increasing complexity of feature representation, it is necessary to explore the influence of multivariate information on peptide detectability. Thus, we propose an ensemble deep learning method, PD-BertEDL. Bidirectional encoder representations from transformers (BERT) is introduced to capture the context information of peptides. Context information, sequence information, and physicochemical information of peptides were combined to construct the multivariate feature space of peptides. We use different deep learning methods to capture the high-quality features of different categories of peptide information and use an average fusion strategy to integrate the three model prediction results, in order to solve the heterogeneity problem and to enhance the robustness and adaptability of the model. The experimental results show that PD-BertEDL is superior to the existing prediction methods: it can effectively predict peptide detectability and provide strong support for protein identification and quantitative analysis, as well as disease treatment.
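The average fusion strategy described above can be sketched as an element-wise mean over the probabilities produced by each model. The function name, the three example score lists, and the 0.5 decision threshold are illustrative assumptions; the abstract does not give the exact fusion code.

```python
def average_fusion(*model_probs):
    """Element-wise mean of per-peptide probabilities from several models."""
    if not model_probs:
        raise ValueError("at least one model output is required")
    n = len(model_probs[0])
    if any(len(p) != n for p in model_probs):
        raise ValueError("all models must score the same peptides")
    return [sum(column) / len(model_probs) for column in zip(*model_probs)]


# Hypothetical detectability probabilities for three peptides from three
# models (context, sequence, and physicochemical views).
context_model = [0.9, 0.2, 0.6]
sequence_model = [0.6, 0.4, 0.5]
physchem_model = [0.3, 0.0, 0.7]

fused = average_fusion(context_model, sequence_model, physchem_model)
# Threshold the fused score to obtain a detectable / not detectable call.
calls = [p >= 0.5 for p in fused]
```

Averaging gives each view equal weight; a weighted mean would be a natural variant if one model were known to be more reliable.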
|
24
|
Hierarchical graph representation learning for the prediction of drug-target binding affinity. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.09.043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
|
25
|
Özsarı G, Rifaioglu AS, Atakan A, Doğan T, Martin MJ, Çetin Atalay R, Atalay V. SLPred: a multi-view subcellular localization prediction tool for multi-location human proteins. Bioinformatics 2022; 38:4226-4229. [PMID: 35801913 DOI: 10.1093/bioinformatics/btac458] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/23/2022] [Revised: 06/08/2022] [Accepted: 07/07/2022] [Indexed: 12/24/2022] Open
Abstract
SUMMARY Accurate prediction of the subcellular locations (SLs) of proteins is a critical topic in protein science. In this study, we present SLPred, an ensemble-based multi-view and multi-label protein subcellular localization prediction tool. For a query protein sequence, SLPred provides predictions for nine main SLs using independent machine-learning models trained for each location. We used UniProtKB/Swiss-Prot human protein entries and their curated SL annotations as our source data. We connected all disjoint terms in the UniProt SL hierarchy based on the corresponding term relationships in the cellular component category of Gene Ontology and constructed a training dataset that is both reliable and large scale using the re-organized hierarchy. We tested SLPred on multiple benchmarking datasets, including our in-house sets, and compared its performance against six state-of-the-art methods. Results indicated that SLPred outperforms other tools in the majority of cases. AVAILABILITY AND IMPLEMENTATION SLPred is available both as an open-access and user-friendly web-server (https://slpred.kansil.org) and a stand-alone tool (https://github.com/kansil/SLPred). All datasets used in this study are also available at https://slpred.kansil.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gökhan Özsarı
- Department of Computer Engineering, Middle East Technical University, Ankara 06800, Turkey
- Department of Computer Engineering, Niğde Ömer Halisdemir University, Niğde 51240, Turkey
| | - Ahmet Sureyya Rifaioglu
- Department of Computer Engineering, İskenderun Technical University, Hatay 31200, Turkey; Faculty of Medicine, Institute for Computational Biomedicine, Heidelberg University and Heidelberg University Hospital, Heidelberg 69120, Germany
| | - Ahmet Atakan
- Department of Computer Engineering, Middle East Technical University, Ankara 06800, Turkey; Department of Computer Engineering, Erzincan Binali Yıldırım University, Erzincan 24002, Turkey
| | - Tunca Doğan
- Department of Computer Engineering, Hacettepe University, Ankara 06800, Turkey
| | - Maria Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, Hinxton CB10 1SD, UK
| | - Rengül Çetin Atalay
- Graduate School of Informatics, Middle East Technical University, Ankara 06800, Turkey; Section of Pulmonary and Critical Care Medicine, the University of Chicago, Chicago, IL 60637, USA
| | - Volkan Atalay
- Department of Computer Engineering, Middle East Technical University, Ankara 06800, Turkey
|
26
|
Cheng Z, Zhao Q, Li Y, Wang J. IIFDTI: predicting drug-target interactions through interactive and independent features based on attention mechanism. Bioinformatics 2022; 38:4153-4161. [PMID: 35801934 DOI: 10.1093/bioinformatics/btac485] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/07/2021] [Revised: 05/02/2022] [Accepted: 07/07/2022] [Indexed: 12/24/2022] Open
Abstract
Motivation: Identifying drug-target interactions is a crucial step in drug discovery and design. Traditional biochemical experiments can validate drug-target interactions accurately and credibly. However, they are also extremely laborious, time-consuming and expensive. With the collection of more validated biomedical data and the advancement of computing technology, computational methods based on chemogenomics, which guide experimental verification, are gradually attracting more attention. Results: In this study, we propose an end-to-end deep learning-based method named IIFDTI to predict drug-target interactions (DTIs) based on independent features of drug-target pairs and interactive features of their substructures. First, the interactive features of substructures between drugs and targets are extracted by a bidirectional encoder-decoder architecture. The independent features of drugs and targets are extracted by graph neural networks and convolutional neural networks, respectively. Then, all extracted features are fused and fed into fully connected dense layers in downstream tasks for predicting DTIs. IIFDTI takes into account the independent features of drugs/targets and simulates the interactive features of the substructures from the biological perspective. Multiple experiments show that IIFDTI outperforms the state-of-the-art methods in terms of the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPR), precision, and recall on benchmark datasets. In addition, the mapped visualizations of attention weights indicate that IIFDTI has learned biologically meaningful knowledge, and two case studies illustrate the capabilities of IIFDTI in practical applications. Availability and implementation: The data and code underlying this article are available on GitHub at https://github.com/czjczj/IIFDTI. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhongjian Cheng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
27
|
Jia XN, Wang WJ, Yin B, Zhou LJ, Zhen YQ, Zhang L, Zhou XL, Song HN, Tang Y, Gao F. Deep Learning Promotes the Screening of Natural Products with Potential Microtubule Inhibition Activity. ACS Omega 2022; 7:28334-28341. [PMID: 35990425 PMCID: PMC9386835 DOI: 10.1021/acsomega.2c02854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 07/27/2022] [Indexed: 06/15/2023]
Abstract
Natural microtubule inhibitors, such as paclitaxel and ixabepilone, are key sources of novel medications and have a considerable influence on anti-tumor chemotherapy. Natural product chemists have been encouraged to create novel methodologies for screening the new generation of microtubule inhibitors from the enormous natural product library. There have been major recent advances in the use of artificial intelligence in drug discovery. Deep learning algorithms, in particular, have shown promise in swiftly screening effective leads from huge compound libraries and producing novel compounds with desirable features. We used a deep neural network to search for potent β-microtubule inhibitors among natural products. Eleutherobin, bruceine D (BD), and phorbol 12-myristate 13-acetate (PMA) are three highly effective natural compounds that were identified as β-microtubule inhibitors. In conclusion, this paper describes the use of deep learning to screen for effective β-microtubule inhibitors. This research also demonstrates the promise of employing deep learning to develop drugs from natural products for a wider range of disorders.
Affiliation(s)
- Xiao-Nan Jia
- School of Life Science and Engineering, Southwest Jiaotong University, Chengdu 610031, PR China
| | - Wei-Jia Wang
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, PR China
| | - Bo Yin
- School of Life Science and Engineering, Southwest Jiaotong University, Chengdu 610031, PR China
| | - Lin-Jing Zhou
- School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, PR China
| | - Yong-Qi Zhen
- School of Life Science and Engineering, Southwest Jiaotong University, Chengdu 610031, PR China
| | - Lan Zhang
- School of Life Science and Engineering, Southwest Jiaotong University, Chengdu 610031, PR China
| | - Xian-Li Zhou
- School of Life Science and Engineering, Southwest Jiaotong University, Chengdu 610031, PR China
| | - Hai-Ning Song
- Department of Pharmacy, The Third People’s Hospital of Chengdu and College of Medicine, Southwest Jiaotong University, Chengdu 610031, PR China
| | - Yong Tang
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, PR China
| | - Feng Gao
- School of Life Science and Engineering, Southwest Jiaotong University, Chengdu 610031, PR China
|
28
|
Monteiro NR, Oliveira JL, Arrais JP. DTITR: End-to-end drug–target binding affinity prediction with transformers. Comput Biol Med 2022; 147:105772. [DOI: 10.1016/j.compbiomed.2022.105772] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/07/2022] [Revised: 06/07/2022] [Accepted: 06/19/2022] [Indexed: 11/03/2022]
|
29
|
Artificial intelligence and machine-learning approaches in structure and ligand-based discovery of drugs affecting central nervous system. Mol Divers 2022; 27:959-985. [PMID: 35819579 DOI: 10.1007/s11030-022-10489-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/27/2022] [Accepted: 06/21/2022] [Indexed: 12/11/2022]
Abstract
CNS disorders are indications with very high unmet medical need, a relatively small number of available drugs, and a subpar satisfaction level among patients and caregivers. Discovery of CNS drugs is an extremely expensive affair with its own unique challenges, leading to extremely high attrition rates and low efficiency. With the explosion of data in the information age, there is hardly any aspect of life that has not been touched by data-driven technologies such as artificial intelligence (AI) and machine learning (ML). Drug discovery is no exception: the emergence of big data via genomic, proteomic, biological, and chemical technologies has driven pharmaceutical giants to collaborate with AI-oriented companies to revolutionise drug discovery, with the goal of increasing the efficiency of the process. In recent years, many examples of innovative applications of AI and ML techniques in CNS drug discovery have been reported. Research on therapeutics for diseases such as schizophrenia, Alzheimer's and Parkinsonism has been given a new direction and thrust by these developments. AI and ML have been applied to both ligand-based and structure-based drug discovery and design of CNS therapeutics. In this review, we summarise the general aspects of AI and ML from the perspective of drug discovery, followed by comprehensive coverage of recent developments in the applications of AI/ML techniques in CNS drug discovery.
|
30
|
Pan X, Lin X, Cao D, Zeng X, Yu PS, He L, Nussinov R, Cheng F. Deep learning for drug repurposing: Methods, databases, and applications. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1597] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Affiliation(s)
- Xiaoqin Pan
- School of Computer Science and Engineering Hunan University Changsha Hunan China
| | - Xuan Lin
- School of Computer Science Xiangtan University Xiangtan China
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education Xiangtan University Xiangtan China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
| | - Xiangxiang Zeng
- School of Computer Science and Engineering Hunan University Changsha Hunan China
| | - Philip S. Yu
- Department of Computer Science University of Illinois at Chicago Chicago Illinois USA
| | - Lifang He
- Department of Computer Science and Engineering Lehigh University Bethlehem Pennsylvania USA
| | - Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research National Cancer Institute at Frederick Frederick Maryland USA
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine Tel Aviv University Tel Aviv Israel
| | - Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic Cleveland Ohio USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine Case Western Reserve University Cleveland Ohio USA
- Case Comprehensive Cancer Center Case Western Reserve University School of Medicine Cleveland Ohio USA
|
31
|
Monteiro NRC, Simões CJV, Ávila HV, Abbasi M, Oliveira JL, Arrais JP. Explainable deep drug-target representations for binding affinity prediction. BMC Bioinformatics 2022; 23:237. [PMID: 35715734 PMCID: PMC9204982 DOI: 10.1186/s12859-022-04767-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/02/2022] [Accepted: 05/25/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Several computational advances have been achieved in the drug discovery field, promoting the identification of novel drug–target interactions and new leads. However, most of these methodologies have overlooked the importance of explaining the decision-making process of deep learning architectures. In this research study, we explore the reliability of convolutional neural networks (CNNs) at identifying relevant regions for binding, specifically binding sites and motifs, and the significance of the deep representations extracted, by explaining the model’s decisions based on the identification of the input regions that contributed the most to the prediction. We make use of an end-to-end deep learning architecture to predict binding affinity, where CNNs are exploited in their capacity to automatically identify and extract discriminating deep representations from 1D sequential and structural data. Results: The results demonstrate the effectiveness of the deep representations extracted from CNNs in the prediction of drug–target interactions. CNNs were found to identify and extract features from regions relevant for the interaction, where the weight associated with these spots was in the range of those with the highest positive influence given by the CNNs in the prediction. The end-to-end deep learning model achieved the highest performance both in the prediction of the binding affinity and in the ability to correctly distinguish the interaction strength rank order when compared to baseline approaches. Conclusions: This research study validates the potential applicability of an end-to-end deep learning architecture in the context of drug discovery beyond the confined space of proteins and ligands with determined 3D structure. Furthermore, it shows the reliability of the deep representations extracted from the CNNs by providing explainability to the decision-making process.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04767-y.
Affiliation(s)
- Nelson R C Monteiro
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal.
| | | | - Henrique V Ávila
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
| | - Maryam Abbasi
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
| | - José L Oliveira
- IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
| | - Joel P Arrais
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
|
32
|
Moriwaki H, Saito S, Matsumoto T, Serizawa T, Kunimoto R. Global Analysis of Deep Learning Prediction Using Large-Scale In-House Kinome-Wide Profiling Data. ACS Omega 2022; 7:18374-18381. [PMID: 35694454 PMCID: PMC9178758 DOI: 10.1021/acsomega.2c00664] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/01/2022] [Accepted: 05/12/2022] [Indexed: 06/11/2023]
Abstract
In drug discovery, the prediction of activity and absorption, distribution, metabolism, excretion, and toxicity parameters is one of the most important approaches in determining which compound to synthesize next. In recent years, prediction methods based on deep learning as well as non-deep learning approaches have been established, and a number of applications to drug discovery have been reported by various companies and organizations. In this research, we performed activity prediction using deep learning and non-deep learning methods on in-house assay data for several hundred kinases and compared and discussed the prediction results. We found that the prediction accuracy of the single-task graph neural network (GNN) model was generally lower than that of the non-deep learning model (LightGBM), but the multitask GNN model, which combined data from other kinases, comprehensively outperformed LightGBM. In addition, the extrapolative validity of the multitask model was verified by using it for prediction on known kinase ligands. We observed an overlap between characteristic protein-ligand interaction sites and the atoms that are important for prediction. By building appropriate models based on the conditions of the data set and analyzing the feature importance of the prediction results, a ligand-based prediction method may be used not only for activity prediction but also for drug design.
Affiliation(s)
- Hirotomo Moriwaki
- ExaWizards Inc., 21F Shiodome Sumitomo Building, 1-9-2 Higashi Shimbashi, Minato-ku, Tokyo 105-0021, Japan
| | - Shin Saito
- ExaWizards Inc., 21F Shiodome Sumitomo Building, 1-9-2 Higashi Shimbashi, Minato-ku, Tokyo 105-0021, Japan
| | - Tomoya Matsumoto
- ExaWizards Inc., 21F Shiodome Sumitomo Building, 1-9-2 Higashi Shimbashi, Minato-ku, Tokyo 105-0021, Japan
| | - Takayuki Serizawa
- Medicinal Chemistry Research Laboratories, R&D Division, Daiichi-Sankyo Shinagawa R&D Center, Daiichi Sankyo Company, Limited, 1-2-58 Hiromachi, Shinagawa-ku, Tokyo 140-8710, Japan
| | - Ryo Kunimoto
- Medicinal Chemistry Research Laboratories, R&D Division, Daiichi-Sankyo Shinagawa R&D Center, Daiichi Sankyo Company, Limited, 1-2-58 Hiromachi, Shinagawa-ku, Tokyo 140-8710, Japan
|
33
|
Li F, Zhang Z, Guan J, Zhou S. Effective drug-target interaction prediction with mutual interaction neural network. Bioinformatics 2022; 38:3582-3589. [PMID: 35652721 PMCID: PMC9272808 DOI: 10.1093/bioinformatics/btac377] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 02/11/2022] [Revised: 05/09/2022] [Accepted: 05/31/2022] [Indexed: 11/30/2022] Open
Abstract
Motivation: Accurately predicting drug–target interaction (DTI) is a crucial step in drug discovery. Recently, deep learning techniques have been widely used for DTI prediction and have achieved significant performance improvements. One challenge in building deep learning models for DTI prediction is how to appropriately represent drugs and targets. The target distance map and the molecular graph are low-dimensional and informative representations that, however, have not been jointly used in DTI prediction. Another challenge is how to effectively model the mutual impact between drugs and targets. Though the attention mechanism has been used to capture the one-way impact of targets on drugs or vice versa, the mutual impact between drugs and targets, which is very important in predicting their interactions, has not yet been explored. Results: Therefore, in this article we propose MINN-DTI, a new model for DTI prediction. MINN-DTI combines an interacting-transformer module (called Interformer) with an improved Communicative Message Passing Neural Network (CMPNN) (called Inter-CMPNN) to better capture the two-way impact between drugs and targets, which are represented by molecular graph and distance map, respectively. The proposed method obtains better performance than the state-of-the-art methods on three benchmark datasets: DUD-E, human and BindingDB. MINN-DTI also provides good interpretability by assigning larger weights to the amino acids and atoms that contribute more to the interactions between drugs and targets. Availability and implementation: The data and code of this study are available at https://github.com/admislf/MINN-DTI.
Affiliation(s)
- Fei Li
- School of Computer Science, Fudan University, Shanghai 200438, China
| | - Ziqiao Zhang
- School of Computer Science, Fudan University, Shanghai 200438, China
| | - Jihong Guan
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
| | - Shuigeng Zhou
- School of Computer Science, Fudan University, Shanghai 200438, China; Shanghai Key Lab of Intelligent Information Processing, Shanghai 200438, China
|
34
|
DeepMHADTA: Prediction of Drug-Target Binding Affinity Using Multi-Head Self-Attention and Convolutional Neural Network. Curr Issues Mol Biol 2022; 44:2287-2299. [PMID: 35678684 PMCID: PMC9164023 DOI: 10.3390/cimb44050155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 05/08/2022] [Accepted: 05/14/2022] [Indexed: 11/17/2022] Open
Abstract
Drug-target interactions provide insight into drug side effects and drug repositioning. However, wet-lab biochemical experiments are time-consuming and labor-intensive, and are insufficient to meet the pressing demand for drug research and development. With the rapid advancement of deep learning, computational methods are increasingly applied to screen drug-target interactions. Many methods treat this problem as a binary classification task (binding or not) but ignore the quantitative binding affinity. In this paper, we propose a new end-to-end deep learning method called DeepMHADTA, which uses the multi-head self-attention mechanism in a deep residual network to predict drug-target binding affinity. On two benchmark datasets, our method outperformed several current state-of-the-art methods in terms of multiple performance measures, including mean squared error (MSE), concordance index (CI), r_m^2, and area under the precision-recall curve (AUPR). The results demonstrated that our method achieved better performance in predicting the drug–target binding affinity.
|
35
|
Liu S, Wang Y, Deng Y, He L, Shao B, Yin J, Zheng N, Liu TY, Wang T. Improved drug-target interaction prediction with intermolecular graph transformer. Brief Bioinform 2022; 23:6581433. [PMID: 35514186 DOI: 10.1093/bib/bbac162] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/10/2022] [Revised: 03/28/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022] Open
Abstract
The identification of active binding drugs for target proteins (referred to as drug-target interaction prediction) is the key challenge in virtual screening, which plays an essential role in drug discovery. Although recent deep learning-based approaches achieve better performance than molecular docking, existing models often neglect topological or spatial intermolecular information, hindering prediction performance. We recognize this problem and propose a novel approach called the Intermolecular Graph Transformer (IGT) that employs a dedicated attention mechanism to model intermolecular information with a three-way Transformer-based architecture. IGT outperforms the second-best state-of-the-art (SoTA) approach by 9.1% and 20.5% for binding activity and binding pose prediction, respectively, and generalizes better to unseen receptor proteins than SoTA approaches. Furthermore, IGT exhibits promising drug screening ability against severe acute respiratory syndrome coronavirus 2 by identifying 83.1% of the active drugs that have been validated by wet-lab experiments, with near-native predicted binding poses. Source code and datasets are available at https://github.com/microsoft/IGT-Intermolecular-Graph-Transformer.
Affiliation(s)
- Siyuan Liu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, 510006, China; Microsoft Research Asia, Beijing, 100080, China
| | - Yusong Wang
- Microsoft Research Asia, Beijing, 100080, China; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Yifan Deng
- Microsoft Research Asia, Beijing, 100080, China
| | - Liang He
- Microsoft Research Asia, Beijing, 100080, China; School of Computer Science, Fudan University, Shanghai, 200433, China
| | - Bin Shao
- Microsoft Research Asia, Beijing, 100080, China
| | - Jian Yin
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, 510006, China
| | - Nanning Zheng
- Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Tie-Yan Liu
- Microsoft Research Asia, Beijing, 100080, China
| | - Tong Wang
- Microsoft Research Asia, Beijing, 100080, China
|
36
|
Zhang R, Ghosh S, Pal R. Predicting binding affinities of emerging variants of SARS-CoV-2 using spike protein sequencing data: observations, caveats and recommendations. Brief Bioinform 2022; 23:6569542. [PMID: 35437577 DOI: 10.1093/bib/bbac128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 02/13/2022] [Accepted: 03/16/2022] [Indexed: 11/13/2022] Open
Abstract
Predicting protein properties from amino acid sequences is an important problem in biology and pharmacology. Protein-protein interactions among SARS-CoV-2 spike protein, human receptors and antibodies are key determinants of the potency of this virus and its ability to evade the human immune response. As a rapidly evolving virus, SARS-CoV-2 has already developed into many variants with considerable variation in virulence. Utilizing the proteomic data of SARS-CoV-2 to predict its viral characteristics will, therefore, greatly aid in disease control and prevention. In this paper, we review and compare recent successful prediction methods based on long short-term memory (LSTM), transformer, convolutional neural network (CNN) and a similarity-based topological regression (TR) model and offer recommendations about appropriate predictive methodology depending on the similarity between training and test datasets. We compare the effectiveness of these models in predicting the binding affinity and expression of SARS-CoV-2 spike protein sequences. We also explore how effective these predictive methods are when trained on laboratory-created data and tasked with predicting the binding affinity of the in-the-wild SARS-CoV-2 spike protein sequences obtained from the GISAID datasets. We observe that TR is a better method when the sample size is small and test protein sequences are sufficiently similar to the training sequences. However, when the training sample size is sufficiently large and prediction requires extrapolation, the LSTM-embedding and CNN-based predictive models show superior performance.
Affiliation(s)
- Ruibo Zhang
- Department of Electrical and Computer Engineering, Texas Tech University, TX, USA
| | - Souparno Ghosh
- Department of Statistics, University of Nebraska - Lincoln, NE, USA
| | - Ranadip Pal
- Department of Electrical and Computer Engineering, Texas Tech University, TX, USA
|
37
|
|
38
|
Affinity2Vec: drug-target binding affinity prediction through representation learning, graph mining, and machine learning. Sci Rep 2022; 12:4751. [PMID: 35306525 PMCID: PMC8934358 DOI: 10.1038/s41598-022-08787-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/22/2021] [Accepted: 03/08/2022] [Indexed: 11/21/2022] Open
Abstract
Drug-target interaction (DTI) prediction plays a crucial role in drug repositioning and virtual drug screening. Most DTI prediction methods cast the problem as a binary classification task to predict whether interactions exist or as a regression task to predict continuous values that indicate a drug's ability to bind to a specific target. The regression-based methods provide insight beyond the binary relationship. However, most of these methods require three-dimensional (3D) structural information of targets, which is still not generally available for many targets. Despite this bottleneck, only a few methods address the drug-target binding affinity (DTBA) problem with a non-structure-based approach that avoids the 3D structure limitations. Here we propose Affinity2Vec, a novel regression-based method that formulates the entire task as a graph-based problem. To develop this method, we constructed a weighted heterogeneous graph that integrates data from several sources, including drug-drug similarity, target-target similarity, and drug-target binding affinities. Affinity2Vec further combines several computational techniques from feature representation learning, graph mining, and machine learning to generate or extract features, build the model, and predict the binding affinity between the drug and the target with no 3D structural data. We conducted extensive experiments to evaluate and demonstrate the robustness and efficiency of the proposed method on benchmark datasets used in state-of-the-art non-structure-based drug-target binding affinity studies. Affinity2Vec showed superior or competitive results compared to the state-of-the-art methods based on several evaluation metrics, including mean squared error, r_m^2, concordance index, and area under the precision-recall curve.
|
39
|
Du BX, Qin Y, Jiang YF, Xu Y, Yiu SM, Yu H, Shi JY. Compound–protein interaction prediction by deep learning: Databases, descriptors and models. Drug Discov Today 2022; 27:1350-1366. [DOI: 10.1016/j.drudis.2022.02.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 11/19/2021] [Accepted: 02/28/2022] [Indexed: 11/24/2022]
|
40
|
Nikolaienko T, Gurbych O, Druchok M. Complex machine learning model needs complex testing: Examining predictability of molecular binding affinity by a graph neural network. J Comput Chem 2022; 43:728-739. [PMID: 35201629 DOI: 10.1002/jcc.26831] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/22/2021] [Revised: 01/04/2022] [Accepted: 02/09/2022] [Indexed: 12/12/2022]
Abstract
Drug discovery pipelines typically involve high-throughput screening of large numbers of compounds in search of potential drug candidates. As the chemical space of small organic molecules is huge, navigating it calls for fast and lightweight computational methods, thus promoting machine-learning approaches for processing huge pools of candidates. In this contribution, we present a graph-based deep neural network for prediction of protein-drug binding affinity and assess its predictive power under thorough testing conditions. Within the suggested approach, both protein and drug molecules are represented as graphs and passed to separate graph sub-networks, whose outputs are then concatenated and regressed towards a binding affinity. The neural network is trained on two binding affinity datasets: PDBbind and data imported from the RCSB Protein Data Bank. In order to explore the generalization capabilities of the model, we go beyond traditional random or leave-cluster-out techniques and demonstrate the need for more elaborate model performance assessment: six different strategies for test/train data partitioning (random, time- and property-arranged, protein- and ligand-clustered) with k-fold cross-validation are employed. Finally, we discuss the model performance in terms of a set of metrics for different split strategies and fold arrangements. Our code is available at https://github.com/SoftServeInc/affinity-by-GNN.
Affiliation(s)
- Tymofii Nikolaienko
- SoftServe, Inc., Lviv, Ukraine.,Faculty of Physics, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
| | - Oleksandr Gurbych
- Blackthorn AI Ltd., London, UK.,Department of Artificial Intelligence Systems, Lviv Polytechnic National University, Lviv, Ukraine
| | - Maksym Druchok
- SoftServe, Inc., Lviv, Ukraine.,Institute for Condensed Matter Physics, NAS of Ukraine, Lviv, Ukraine
| |
Collapse
|
41
|
Yang Z, Zhong W, Zhao L, Yu-Chian Chen C. MGraphDTA: deep multiscale graph neural network for explainable drug-target binding affinity prediction. Chem Sci 2022; 13:816-833. [PMID: 35173947 PMCID: PMC8768884 DOI: 10.1039/d1sc05180f] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Received: 09/18/2021] [Accepted: 12/17/2021] [Indexed: 12/22/2022] Open
Abstract
Predicting drug-target affinity (DTA) is beneficial for accelerating drug discovery. Graph neural networks (GNNs) have been widely used in DTA prediction. However, existing shallow GNNs are insufficient to capture the global structure of compounds. Moreover, the interpretability of graph-based DTA models relies heavily on the graph attention mechanism, which cannot reveal the global relationship between the atoms of a molecule. In this study, we proposed a deep multiscale graph neural network based on chemical intuition for DTA prediction (MGraphDTA). We introduced a dense connection into the GNN and built a super-deep GNN with 27 graph convolutional layers to capture the local and global structure of the compound simultaneously. We also developed a novel visual explanation method, gradient-weighted affinity activation mapping (Grad-AAM), to analyze a deep learning model from the chemical perspective. We evaluated our approach using seven benchmark datasets and compared the proposed method to the state-of-the-art deep learning (DL) models. MGraphDTA outperforms other DL-based approaches significantly on various datasets. Moreover, we show that Grad-AAM creates explanations that are consistent with pharmacologists, which may help us gain chemical insights directly from data beyond human perception. These advantages demonstrate that the proposed method improves the generalization and interpretation capability of DTA prediction modeling.
Affiliation(s)
- Ziduo Yang
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +862039332153
- Weihe Zhong
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +862039332153
- Lu Zhao
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +862039332153
  - Department of Clinical Laboratory, The Sixth Affiliated Hospital, Sun Yat-sen University Guangzhou 510655 China
- Calvin Yu-Chian Chen
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +862039332153
  - Department of Medical Research, China Medical University Hospital Taichung 40447 Taiwan
  - Department of Bioinformatics and Medical Engineering, Asia University Taichung 41354 Taiwan
|
42
|
Ikeda K, Doi T, Ikeda M, Tomii K. PreBINDS: An Interactive Web Tool to Create Appropriate Datasets for Predicting Compound-Protein Interactions. Front Mol Biosci 2021; 8:758480. [PMID: 34938773 PMCID: PMC8685504 DOI: 10.3389/fmolb.2021.758480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022] Open
Abstract
Given the abundant computational resources and the huge amount of data of compound-protein interactions (CPIs), constructing appropriate datasets for learning and evaluating prediction models for CPIs is not always easy. For this study, we have developed a web server to facilitate the development and evaluation of prediction models by providing an appropriate dataset according to the task. Our web server provides an environment and dataset that aid model developers and evaluators in obtaining a suitable dataset for both proteins and compounds, in addition to attributes necessary for deep learning. With the web server interface, users can customize the CPI dataset derived from ChEMBL by setting positive and negative thresholds to be adjusted according to the user's definitions. We have also implemented a function for graphic display of the distribution of activity values in the dataset as a histogram to set appropriate thresholds for positive and negative examples. These functions enable effective development and evaluation of models. Furthermore, users can prepare their task-specific datasets by selecting a set of target proteins based on various criteria such as Pfam families, ChEMBL's classification, and sequence similarities. The accuracy and efficiency of in silico screening and drug design using machine learning including deep learning can therefore be improved by facilitating access to an appropriate dataset prepared using our web server (https://binds.lifematics.work/).
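The thresholding step described in this abstract (separate cut-offs for positive and negative examples, chosen with the help of a histogram of activity values) can be sketched as follows. This is an illustrative sketch, not the web server's implementation: the function names, the record layout, and the example threshold values are all assumptions.

```python
def label_activities(records, positive_min, negative_max):
    """Split (compound, protein, activity) records into positives and negatives.

    Activities are assumed to be on a scale where larger means more active
    (e.g. a negative-log potency). Values strictly between the two thresholds
    are treated as ambiguous and dropped.
    """
    if negative_max > positive_min:
        raise ValueError("negative_max must not exceed positive_min")
    labeled = []
    for compound, protein, activity in records:
        if activity >= positive_min:
            labeled.append((compound, protein, 1))
        elif activity <= negative_max:
            labeled.append((compound, protein, 0))
    return labeled


def histogram(values, bin_width):
    """Count values per bin of width `bin_width`, keyed by the bin's lower edge."""
    counts = {}
    for v in values:
        edge = (v // bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical records; the thresholds 6.0 and 5.0 are example values only.
records = [("c1", "p1", 7.2), ("c2", "p1", 4.1), ("c3", "p2", 5.5), ("c4", "p2", 8.0)]
labeled = label_activities(records, positive_min=6.0, negative_max=5.0)
bins = histogram([r[2] for r in records], bin_width=1.0)
```

Leaving a gap between the two thresholds discards borderline measurements, which is one common way to reduce label noise; whether to do so depends on the task.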
Affiliation(s)
- Kazuyoshi Ikeda
  - Medicinal Chemistry Applied AI Unit, HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, Yokohama, Japan
  - Division of Physics for Life Functions, Keio University Faculty of Pharmacy, Tokyo, Japan
- Masami Ikeda
  - Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Kentaro Tomii
  - Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
  - AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
|
43
|
Cetin-Atalay R, Kahraman DC, Nalbat E, Rifaioglu AS, Atakan A, Donmez A, Atas H, Atalay MV, Acar AC, Doğan T. Data Centric Molecular Analysis and Evaluation of Hepatocellular Carcinoma Therapeutics Using Machine Intelligence-Based Tools. J Gastrointest Cancer 2021; 52:1266-1276. [PMID: 34910274 DOI: 10.1007/s12029-021-00768-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/13/2021] [Indexed: 10/19/2022]
Abstract
PURPOSE Computational approaches have been used at different stages of drug development with the purpose of decreasing the time and cost of conventional experimental procedures. Lately, techniques mainly developed and applied in the field of artificial intelligence (AI) have been transferred to different application domains such as biomedicine. METHODS In this study, we conducted an investigative analysis via data-driven evaluation of potential hepatocellular carcinoma (HCC) therapeutics in the context of AI-assisted drug discovery/repurposing. First, we discussed basic concepts, computational approaches, databases, modeling approaches, and featurization techniques in drug discovery/repurposing. In the analysis part, we automatically integrated HCC-related biological entities such as genes/proteins, pathways, phenotypes, drugs/compounds, and other diseases with similar implications, and represented these heterogeneous relationships via a knowledge graph using the CROssBAR system. RESULTS Following the system-level evaluation and selection of critical genes/proteins and pathways to target, our deep learning-based drug/compound-target protein interaction predictors DEEPScreen and MDeePred have been employed for predicting new bioactive drugs and compounds for these critical targets. Finally, we embedded ligands of selected HCC-associated proteins which had a significant enrichment with the CROssBAR system into a 2-D space to identify and repurpose small molecule inhibitors as potential drug candidates based on their molecular similarities to known HCC drugs. CONCLUSIONS We expect that this series of data-driven analyses can be used as a roadmap to propose early-stage potential inhibitors (from database-scale sets of compounds) for both HCC and other complex diseases, which may subsequently be analyzed with more targeted in silico and experimental approaches.
Affiliation(s)
- Rengul Cetin-Atalay
  - Section of Pulmonary and Critical Care Medicine, University of Chicago, Chicago, IL, 60637, USA
- Deniz Cansen Kahraman
  - Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara, 06800, Turkey
- Esra Nalbat
  - Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara, 06800, Turkey
- Ahmet Sureyya Rifaioglu
  - Department of Computer Engineering, Iskenderun Technical University, Iskenderun, Hatay, 31200, Turkey
  - Department of Computer Engineering, METU, Ankara, 06800, Turkey
- Ahmet Atakan
  - Department of Computer Engineering, METU, Ankara, 06800, Turkey
  - Department of Computer Engineering, EBYU, Ankara, 24002, Turkey
- Ataberk Donmez
  - Department of Computer Engineering, METU, Ankara, 06800, Turkey
  - Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Heval Atas
  - Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara, 06800, Turkey
- M Volkan Atalay
  - Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara, 06800, Turkey
  - Department of Computer Engineering, METU, Ankara, 06800, Turkey
- Aybar C Acar
  - Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara, 06800, Turkey
- Tunca Doğan
  - Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara, 06800, Turkey
  - Department of Computer Engineering, Hacettepe University, Ankara, 06800, Turkey
|
44
|
Multilevel Attention Models for Drug Target Binding Affinity Prediction. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10617-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
45
|
Wang Y, Wu S, Duan Y, Huang Y. A point cloud-based deep learning strategy for protein-ligand binding affinity prediction. Brief Bioinform 2021; 23:6440132. [PMID: 34849569 DOI: 10.1093/bib/bbab474] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/13/2021] [Revised: 09/21/2021] [Accepted: 10/15/2021] [Indexed: 01/14/2023] Open
Abstract
There is great interest in developing artificial intelligence-based protein-ligand binding affinity models due to their immense applications in drug discovery. In this paper, PointNet and PointTransformer, two pointwise multi-layer perceptrons, have been applied to protein-ligand binding affinity prediction for the first time. Three-dimensional point clouds could be rapidly generated from PDBbind-2016, with 3772 and 11 327 individual point clouds derived from the refined or/and general sets, respectively. These point clouds (the refined or the extended set) were used to train PointNet or PointTransformer, resulting in protein-ligand binding affinity prediction models with Pearson correlation coefficients R = 0.795 or 0.833 from the extended data set, respectively, based on the CASF-2016 benchmark test. The analysis of parameters suggests that the two deep learning models were capable of learning many interactions between proteins and their ligands, and some key atoms for the interactions could be visualized. The protein-ligand interaction features learned by PointTransformer could be further adapted for the XGBoost-based machine learning algorithm, resulting in prediction models with an average Rp of 0.827, which is on par with state-of-the-art machine learning models. These results suggest that the point clouds derived from PDBbind data sets are useful to evaluate the performance of 3D point clouds-centered deep learning algorithms, which could learn atomic features of protein-ligand interactions from natural evolution or medicinal chemistry and thus have wide applications in chemistry and biology.
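The Pearson correlation coefficient (R, or Rp) quoted in this abstract measures the linear agreement between predicted and measured affinities. A minimal sketch of the standard formula follows; the function name `pearson_r` and the toy numbers are illustrative assumptions and are unrelated to the CASF-2016 results.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and variances, each without the common 1/n factor,
    # which cancels in the ratio below.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)


# Toy values only: a perfectly linear and a perfectly anti-linear relationship.
r_up = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Note that R is insensitive to a constant offset or scale in the predictions, so it is usually reported alongside an error metric such as RMSE.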
Affiliation(s)
- Yeji Wang
  - Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China
- Shuo Wu
  - Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China
- Yanwen Duan
  - Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China
  - Hunan Engineering Research Center of Combinatorial Biosynthesis and Natural Product Drug Discover, Changsha, Hunan 410011, China
  - National Engineering Research Center of Combinatorial Biosynthesis for Drug Discovery, Changsha, Hunan 410011, China
- Yong Huang
  - Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China
  - National Engineering Research Center of Combinatorial Biosynthesis for Drug Discovery, Changsha, Hunan 410011, China
|
46
|
Tong X, Liu S, Gu J, Wu C, Liang Y, Shi X. Amino acid environment affinity model based on graph attention network. J Bioinform Comput Biol 2021; 20:2150032. [PMID: 34775920 DOI: 10.1142/s0219720021500323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Proteins are engines involved in almost all functions of life. They have specific spatial structures formed by twisting and folding of one or more polypeptide chains composed of amino acids. Protein sites are protein structure microenvironments that can be identified by three-dimensional locations and local neighborhoods in which the structure or function exists. Understanding the amino acid environment affinity is essential for additional protein structural or functional studies, such as mutation analysis and functional site detection. In this study, an amino acid environment affinity model based on the graph attention network was developed. Initially, we constructed a protein graph according to the distance between amino acid pairs. Then, we extracted a set of structural features for each node. Finally, the protein graph and the associated node feature set were fed into the graph attention network model to obtain the amino acid affinities. Numerical results show that our proposed method significantly outperforms a recent 3DCNN-based method by almost 30%.
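The first step in this abstract, building a protein graph from distances between amino acid pairs, can be sketched as a simple distance-cutoff contact graph. The abstract does not state which atoms or cutoff the authors used, so the one-point-per-residue representation, the 8.0 cutoff, and the name `build_residue_graph` are assumptions of this sketch.

```python
import math


def build_residue_graph(coords, cutoff):
    """Return an adjacency list linking residues no farther apart than `cutoff`.

    `coords[i]` is one representative 3D point per residue (assumed here to be
    something like the alpha-carbon position).
    """
    n = len(coords)
    adjacency = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # Undirected edge: record it in both directions.
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency


# Hypothetical coordinates: three residues in a row and one far away.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
graph = build_residue_graph(ca_coords, cutoff=8.0)
```

The resulting adjacency list, together with per-node feature vectors, is the kind of input a graph attention network consumes; the all-pairs loop is quadratic and would normally be replaced by a spatial index for large structures.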
Affiliation(s)
- Xueheng Tong
  - College of Computer Science and Technology, Jilin University, Qianjing Street 2699, Changchun, Jilin 130012, China
- Shuqi Liu
  - College of Computer Science and Technology, Jilin University, Qianjing Street 2699, Changchun, Jilin 130012, China
- Jiawei Gu
  - College of Computer Science and Technology, Jilin University, Qianjing Street 2699, Changchun, Jilin 130012, China
- Chunguo Wu
  - College of Computer Science and Technology, Jilin University, Qianjing Street 2699, Changchun, Jilin 130012, China
- Yanchun Liang
  - School of Computer Science, Zhuhai College of Science and Technology Zhuhai, Guangdong 519041, China
- Xiaohu Shi
  - College of Computer Science and Technology, Jilin University, Qianjing Street 2699, Changchun, Jilin 130012, China
  - School of Computer Science, Zhuhai College of Science and Technology Zhuhai, Guangdong 519041, China
|
47
|
Wei YP, Yao LY, Wu YY, Liu X, Peng LH, Tian YL, Ding JH, Li KH, He QG. Critical Review of Synthesis, Toxicology and Detection of Acyclovir. Molecules 2021; 26:6566. [PMID: 34770975 PMCID: PMC8587948 DOI: 10.3390/molecules26216566] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/26/2021] [Revised: 10/25/2021] [Accepted: 10/27/2021] [Indexed: 02/02/2023] Open
Abstract
Acyclovir (ACV) is an effective and selective antiviral drug, and the study of its toxicology and the use of appropriate detection techniques to control its toxicity at safe levels are extremely important for medicine efforts and human health. This review discusses the mechanism driving ACV’s ability to inhibit viral coding, starting from its development and pharmacology. A comprehensive summary of the existing preparation methods and synthetic materials, such as 5-aminoimidazole-4-carboxamide, guanine and its derivatives, and other purine derivatives, is presented to elucidate the preparation of ACV in detail. In addition, it presents valuable analytical procedures for the toxicological studies of ACV, which are essential for human use and dosing. Analytical methods, including spectrophotometry, high performance liquid chromatography (HPLC), liquid chromatography/tandem mass spectrometry (LC-MS/MS), electrochemical sensors, molecularly imprinted polymers (MIPs), and flow injection–chemiluminescence (FI-CL) are also highlighted. A brief description of the characteristics of each of these methods is also presented. Finally, insight is provided for the development of ACV to drive further innovation of ACV in pharmaceutical applications. This review provides a comprehensive summary of the past life and future challenges of ACV.
Affiliation(s)
- Yan-Ping Wei
  - School of Life Science and Chemistry, Hunan University of Technology, Zhuzhou 412007, China
  - Zhuzhou People’s Hospital, Zhuzhou 412001, China
  - Hunan Qianjin Xiangjiang Pharmaceutical Joint Stock Co., Ltd., Zhuzhou 412001, China
- Liang-Yuan Yao
  - Hunan Qianjin Xiangjiang Pharmaceutical Joint Stock Co., Ltd., Zhuzhou 412001, China
- Yi-Yong Wu
  - School of Life Science and Chemistry, Hunan University of Technology, Zhuzhou 412007, China
- Xia Liu
  - Zhuzhou People’s Hospital, Zhuzhou 412001, China
- Li-Hong Peng
  - School of Life Science and Chemistry, Hunan University of Technology, Zhuzhou 412007, China
- Ya-Ling Tian
  - School of Life Science and Chemistry, Hunan University of Technology, Zhuzhou 412007, China
- Jian-Hua Ding
  - Zhuzhou People’s Hospital, Zhuzhou 412001, China
- Kang-Hua Li
  - Zhuzhou People’s Hospital, Zhuzhou 412001, China
  - Correspondence: K.-H.L. and Q.-G.H.; Tel./Fax: +86-731-2218-3426 (Q.-G.H.)
- Quan-Guo He
  - School of Life Science and Chemistry, Hunan University of Technology, Zhuzhou 412007, China
  - Zhuzhou People’s Hospital, Zhuzhou 412001, China
  - Hunan Qianjin Xiangjiang Pharmaceutical Joint Stock Co., Ltd., Zhuzhou 412001, China
  - Correspondence: K.-H.L. and Q.-G.H.; Tel./Fax: +86-731-2218-3426 (Q.-G.H.)
|
48
|
Doğan T, Atas H, Joshi V, Atakan A, Rifaioglu A, Nalbat E, Nightingale A, Saidi R, Volynkin V, Zellner H, Cetin-Atalay R, Martin M, Atalay V. CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations. Nucleic Acids Res 2021; 49:e96. [PMID: 34181736 PMCID: PMC8450100 DOI: 10.1093/nar/gkab543] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/24/2020] [Revised: 04/11/2021] [Accepted: 06/10/2021] [Indexed: 12/11/2022] Open
Abstract
Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms, and developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and its application by constructing a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with the deep-learning-based prediction of relationships between numerous data entries, which is followed by the rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes-proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases.
Affiliation(s)
- Tunca Doğan
  - Department of Computer Engineering, Hacettepe University, Ankara 06800, Turkey
  - Institute of Informatics, Hacettepe University, Ankara 06800, Turkey
  - Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara 06800, Turkey
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
- Heval Atas
  - Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara 06800, Turkey
- Vishal Joshi
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
- Ahmet Atakan
  - Department of Computer Engineering, METU, Ankara 06800, Turkey
  - Department of Computer Engineering, EBYU, Erzincan 24002, Turkey
- Ahmet Sureyya Rifaioglu
  - Department of Computer Engineering, METU, Ankara 06800, Turkey
  - Department of Computer Engineering, İskenderun Technical University, Hatay 31200, Turkey
- Esra Nalbat
  - Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara 06800, Turkey
- Andrew Nightingale
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
- Rabie Saidi
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
- Vladimir Volynkin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
- Hermann Zellner
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
- Rengul Cetin-Atalay
  - Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara 06800, Turkey
  - Section of Pulmonary and Critical Care Medicine, University of Chicago, Chicago, IL 60637, USA
- Maria Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
- Volkan Atalay
  - Department of Computer Engineering, METU, Ankara 06800, Turkey
|
49
|
Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers 2021; 25:1315-1360. [PMID: 33844136 PMCID: PMC8040371 DOI: 10.1007/s11030-021-10217-3] [Citation(s) in RCA: 249] [Impact Index Per Article: 83.0] [Received: 01/29/2021] [Accepted: 03/22/2021] [Indexed: 02/06/2023]
Abstract
Drug designing and development is an important area of research for pharmaceutical companies and chemical scientists. However, low efficacy, off-target delivery, time consumption, and high cost impose a hurdle and challenges that impact drug design and discovery. Further, complex and big data from genomics, proteomics, microarray data, and clinical trials also impose an obstacle in the drug discovery pipeline. Artificial intelligence and machine learning technology play a crucial role in drug discovery and development. In other words, artificial neural networks and deep learning algorithms have modernized the area. Machine learning and deep learning algorithms have been implemented in several drug discovery processes such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modeling, quantitative structure-activity relationship, drug repositioning, polypharmacology, and physiochemical activity. Evidence from the past strengthens the implementation of artificial intelligence and deep learning in this field. Moreover, novel data mining, curation, and management techniques provided critical support to recently developed modeling algorithms. In summary, artificial intelligence and deep learning advancements provide an excellent opportunity for rational drug design and discovery process, which will eventually impact mankind. The primary concern associated with drug design and development is time consumption and production cost. Further, inefficiency, inaccurate target delivery, and inappropriate dosage are other hurdles that inhibit the process of drug delivery and development. With advancements in technology, computer-aided drug design integrating artificial intelligence algorithms can eliminate the challenges and hurdles of traditional drug design and development. 
Artificial intelligence is referred to as superset comprising machine learning, whereas machine learning comprises supervised learning, unsupervised learning, and reinforcement learning. Further, deep learning, a subset of machine learning, has been extensively implemented in drug design and development. The artificial neural network, deep neural network, support vector machines, classification and regression, generative adversarial networks, symbolic learning, and meta-learning are examples of the algorithms applied to the drug design and discovery process. Artificial intelligence has been applied to different areas of drug design and development process, such as from peptide synthesis to molecule design, virtual screening to molecular docking, quantitative structure-activity relationship to drug repositioning, protein misfolding to protein-protein interactions, and molecular pathway identification to polypharmacology. Artificial intelligence principles have been applied to the classification of active and inactive, monitoring drug release, pre-clinical and clinical development, primary and secondary drug screening, biomarker development, pharmaceutical manufacturing, bioactivity identification and physiochemical properties, prediction of toxicity, and identification of mode of action.
Affiliation(s)
- Rohan Gupta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Devesh Srivastava
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Mehar Sahu
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Swati Tiwari
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Rashmi K Ambasta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Pravir Kumar
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
|
50
|
Yang Z, Zhong W, Zhao L, Chen CYC. ML-DTI: Mutual Learning Mechanism for Interpretable Drug-Target Interaction Prediction. J Phys Chem Lett 2021; 12:4247-4261. [PMID: 33904745 DOI: 10.1021/acs.jpclett.1c00867] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Indexed: 05/11/2023]
Abstract
Deep learning (DL) provides opportunities for the identification of drug-target interactions (DTIs). The challenges of applying DL lie primarily in the lack of interpretability. Also, most of the existing DL-based methods formulate the drug and target encoder as two independent modules without considering the relationship between them. In this study, we propose a mutual learning mechanism to bridge the gap between the two encoders. We formulated the DTI problem from a global perspective by inserting mutual learning layers between the two encoders. The mutual learning layer was achieved by multihead attention and position-aware attention. The neural attention mechanism also provides effective visualization, which makes it easier to analyze a model. We evaluated our approach using three benchmark kinase data sets under different experimental settings and compared the proposed method to three baseline models. We found that the four methods yielded similar results in the random split setting (training and test sets share common drugs and targets), while the proposed method increases the predictive performance significantly in the orphan-target and orphan-drug split settings (training and test sets share only targets or drugs). The experimental results demonstrated that the proposed method improved the generalization and interpretation capability of DTI modeling.
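The orphan-target and orphan-drug settings described in this abstract differ from a random split in that the held-out targets (or drugs) never appear in training. A minimal sketch of such a split follows; the function name `orphan_split`, the tuple layout, and the toy interactions are assumptions for illustration and do not come from the paper.

```python
def orphan_split(pairs, held_out, position):
    """Split (drug, target, label) tuples on a set of held-out entities.

    position=0 holds out drugs (orphan-drug setting); position=1 holds out
    targets (orphan-target setting). Held-out entities appear only in test.
    """
    if position not in (0, 1):
        raise ValueError("position must be 0 (drug) or 1 (target)")
    train = [p for p in pairs if p[position] not in held_out]
    test = [p for p in pairs if p[position] in held_out]
    return train, test


# Hypothetical toy interactions: (drug, target, interacts?)
pairs = [("d1", "t1", 1), ("d2", "t1", 0), ("d1", "t2", 1), ("d3", "t3", 0)]
train, test = orphan_split(pairs, held_out={"t2"}, position=1)
```

In this toy orphan-target split, drug `d1` occurs on both sides while target `t2` occurs only in the test set, so the two sets share drugs but not the held-out target.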
Affiliation(s)
- Ziduo Yang
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Weihe Zhong
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Lu Zhao
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
  - Department of Clinical Laboratory, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou 510655, China
- Calvin Yu-Chian Chen
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
  - Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
  - Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
|