1
|
Zhang Y, Huang C, Wang Y, Li S, Sun S. CL-GNN: Contrastive Learning and Graph Neural Network for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2025; 65:1724-1735. [PMID: 39913849 DOI: 10.1021/acs.jcim.4c01290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2025]
Abstract
In the realm of drug discovery and design, the accurate prediction of protein-ligand binding affinity is of paramount importance as it underpins the functional interactions within biological systems. This study introduces a novel self-supervised learning (SSL) framework that combines contrastive learning and graph neural networks (CL-GNN) for predicting protein-ligand binding affinities, which is a critical aspect of drug discovery. Traditional methods for affinity prediction are expensive and time-consuming, prompting the development of more efficient computational approaches. CL-GNN utilizes a contrastive learning strategy, a form of SSL, to learn from a large data set of 371 458 unique unlabeled protein-ligand complexes. By employing graph neural networks and molecular graph enhancement techniques, the model effectively captures protein-ligand interactions in a self-supervised manner. The fine-tuned model demonstrates competitive performance, achieving high Pearson's correlation coefficients and low root-mean-square errors on benchmark data sets. The proposed method outperforms existing machine learning models, showcasing its potential for accelerating the drug development process. The method effectively quantifies the similarity between protein-ligand complex representations learned in the pretraining and downstream testing phases through cosine similarity assessment. This approach not only revealed potential connections between complexes in their binding properties but also provided new insights into the understanding of drug mechanisms of action. In addition, the transparency of the model is significantly improved by visualizing the importance of key protein residues and ligand atoms. This visualization tool provides insight into the model's predictive decision-making process, providing key biological insights for drug design and optimization.
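The abstract above names three quantities: Pearson's correlation coefficient and root-mean-square error for scoring affinity predictions, and cosine similarity for comparing learned complex representations. A minimal sketch of how each is commonly computed follows. The function names and the numeric values are illustrative assumptions and are not taken from the CL-GNN code or results.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured values."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical affinities (e.g. pKd values), invented for illustration only.
measured = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 5.5, 6.5, 7.5]

print(pearson_r(predicted, measured))
print(rmse(predicted, measured))
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

A constant offset between prediction and measurement leaves Pearson's r untouched but shows up in the RMSE, which is why benchmarks of this kind usually report both.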
Affiliation(s)
- Yunjiang Zhang
- Department of Chemical Engineering and Technology, College of Materials Science and Engineering, Beijing University of Technology, Beijing 100124, P. R. China
| | - Chenyu Huang
- Department of Chemical Engineering and Technology, College of Materials Science and Engineering, Beijing University of Technology, Beijing 100124, P. R. China
| | - Yaxin Wang
- Department of Chemical Engineering and Technology, College of Materials Science and Engineering, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shuyuan Li
- Department of Chemical Engineering and Technology, College of Materials Science and Engineering, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shaorui Sun
- Department of Chemical Engineering and Technology, College of Materials Science and Engineering, Beijing University of Technology, Beijing 100124, P. R. China
| |
|
2
|
Carpenter KA, Altman RB. Databases of ligand-binding pockets and protein-ligand interactions. Comput Struct Biotechnol J 2024; 23:1320-1338. [PMID: 38585646 PMCID: PMC10997877 DOI: 10.1016/j.csbj.2024.03.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/16/2024] [Accepted: 03/17/2024] [Indexed: 04/09/2024] Open
Abstract
Many research groups and institutions have created a variety of databases curating experimental and predicted data related to protein-ligand binding. The landscape of available databases is dynamic, with new databases emerging and established databases becoming defunct. Here, we review the current state of databases that contain binding pockets and protein-ligand binding interactions. We have compiled a list of such databases, fifty-three of which are currently available for use. We discuss variation in how binding pockets are defined and summarize pocket-finding methods. We organize the fifty-three databases into subgroups based on goals and contents, and describe standard use cases. We also illustrate that pockets within the same protein are characterized differently across different databases. Finally, we assess critical issues of sustainability, accessibility and redundancy.
Affiliation(s)
- Kristy A. Carpenter
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| | - Russ B. Altman
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Medicine, Stanford University, Stanford, CA 94305, USA
| |
|
3
|
Zhang J, Basu S, Zhang F, Kurgan L. MERIT: Accurate Prediction of Multi Ligand-binding Residues with Hybrid Deep Transformer Network, Evolutionary Couplings and Transfer Learning. J Mol Biol 2024:168872. [PMID: 40133785 DOI: 10.1016/j.jmb.2024.168872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 10/30/2024] [Accepted: 11/15/2024] [Indexed: 03/27/2025]
Abstract
Multi-ligand binding residues (MLBRs) are amino acids in protein sequences that interact with multiple different ligands that include proteins, peptides, nucleic acids, and a variety of small molecules. MLBRs are implicated in a number of cellular functions and targeted in a context of multiple human diseases. There are many sequence-based predictors of residues that interact with specific ligand types and they can be collectively used to identify MLBRs. However, there are no methods that directly predict MLBRs. To this end, we conceptualize, design, evaluate and release MERIT (Multi-binding rEsidues pRedIcTor). This tool relies on a custom-crafted deep neural network that implements a number of innovative features, such as a multi-layered/step architecture with transformer modules that we train using a custom-designed loss function, computation of evolutionary couplings, and application of transfer learning. These innovations boost predictive performance, which we demonstrate using an ablation analysis. In particular, they reduce the number of cross-predictions, defined as residues that interact with a single ligand type that are incorrectly predicted as MLBRs. We compare MERIT against a representative selection of current and popular ligand-specific predictors, meta-predictors that combine their results to identify MLBRs, and a baseline regression-based predictor. These tests reveal that MERIT provides accurate predictions and statistically outperforms these alternatives. Moreover, using two test datasets, one with MLBRs and another with only the single ligand binding residues, we show that MERIT consistently produces relatively low false positive rates, including low rates of cross-predictions. The web server and datasets from this study are freely available at http://biomine.cs.vcu.edu/servers/MERIT/.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China.
| | - Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Fuhao Zhang
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA.
| |
|
4
|
Wang J, Liu Y, Tian B. Protein-small molecule binding site prediction based on a pre-trained protein language model with contrastive learning. J Cheminform 2024; 16:125. [PMID: 39506806 PMCID: PMC11542454 DOI: 10.1186/s13321-024-00920-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 10/20/2024] [Indexed: 11/08/2024] Open
Abstract
Predicting protein-small molecule binding sites, the initial step in structure-guided drug design, remains challenging for proteins lacking experimentally derived ligand-bound structures. Here, we propose CLAPE-SMB, which integrates a pre-trained protein language model with contrastive learning to provide high accuracy predictions of small molecule binding sites that can accommodate proteins without a published crystal structure. We trained and tested CLAPE-SMB on the SJC dataset, a non-redundant dataset based on sc-PDB, JOINED, and COACH420, and achieved an MCC of 0.529. We also compiled the UniProtSMB dataset, which merges sites from similar proteins based on raw data from UniProtKB database, and achieved an MCC of 0.699 on the test set. In addition, CLAPE-SMB achieved an MCC of 0.815 on our intrinsically disordered protein (IDP) dataset that contains 336 non-redundant sequences. Case studies of DAPK1, RebH, and Nep1 support the potential of this binding site prediction tool to aid in drug design. The code and datasets are freely available at https://github.com/JueWangTHU/CLAPE-SMB . SCIENTIFIC CONTRIBUTION: CLAPE-SMB combines a pre-trained protein language model with contrastive learning to accurately predict protein-small molecule binding sites, especially for proteins without experimental structures, such as IDPs. Trained across various datasets, this model shows strong adaptability, making it a valuable tool for advancing drug design and understanding protein-small molecule interactions.
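The results above are reported as Matthews correlation coefficients (MCC), a metric suited to per-residue binding-site labels where non-binding residues heavily outnumber binding ones. The sketch below shows the standard formula computed from confusion-matrix counts. The function name and the example counts are assumptions for illustration and are unrelated to the CLAPE-SMB code or its datasets.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical per-residue counts, invented for illustration only.
print(mcc(tp=50, fp=0, tn=50, fn=0))    # every residue labelled correctly
print(mcc(tp=25, fp=25, tn=25, fn=25))  # no better than chance
print(mcc(tp=0, fp=50, tn=0, fn=50))    # every residue labelled wrongly
```

Unlike accuracy, MCC stays near zero for a predictor that labels every residue as non-binding, which is the trivial solution an imbalanced dataset otherwise rewards.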
Affiliation(s)
- Jue Wang
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, Beijing Frontier Research Center for Biological Structure, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
| | - Yufan Liu
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
| | - Boxue Tian
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, Beijing Frontier Research Center for Biological Structure, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China.
| |
|
5
|
Khaledi M, Khatami M, Hemmati J, Bakhti S, Hoseini SA, Ghahramanpour H. Role of Small Non-Coding RNA in Gram-Negative Bacteria: New Insights and Comprehensive Review of Mechanisms, Functions, and Potential Applications. Mol Biotechnol 2024:10.1007/s12033-024-01248-w. [PMID: 39153013 DOI: 10.1007/s12033-024-01248-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 08/02/2024] [Indexed: 08/19/2024]
Abstract
Small non-coding RNAs (sRNAs) are a key part of gene expression regulation in bacteria. Many physiologic activities, such as adaptation to environmental stresses, antibiotic resistance, quorum sensing, and modulation of the host immune response, are regulated directly or indirectly by sRNAs in Gram-negative bacteria. Therefore, sRNAs can be considered potentially useful therapeutic options. They have opened promising perspectives in the diagnosis of pathogens and the treatment of infections caused by antibiotic-resistant organisms. Identification of sRNAs can be carried out by sequence-based and expression-based methods. Despite the valuable progress of the last two decades and the discovery of new sRNAs, their exact role in biological pathways, especially in co-operation with other biomolecules involved in gene expression regulation such as RNA-binding proteins (RBPs), riboswitches, and other sRNAs, needs further investigation. Although numerous RNA databases are available, including 59 databases used by RNAcentral, a significant gap remains: there is no comprehensive and professional database that categorizes experimentally validated sRNAs in Gram-negative pathogens. Here, we review the present knowledge about the most recent and important sRNAs and their regulatory mechanisms, and the strengths and weaknesses of current methods of sRNA identification. We also try to demonstrate the potential applications of sRNAs and new insights for future studies.
Affiliation(s)
- Mansoor Khaledi
- Cellular and Molecular Research Center, Basic Health Sciences Institute, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Department of Microbiology and Immunology, School of Medicine, Shahrekord University of Medical Sciences, Shahrekord, Iran
| | - Mehrdad Khatami
- Department of Medical Biotechnology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
| | - Jaber Hemmati
- Department of Microbiology, Faculty of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Shahriar Bakhti
- Department of Microbiology, Faculty of Medicine, Shahed University, Tehran, Iran
| | | | - Hossein Ghahramanpour
- Department of Bacteriology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran.
| |
|
6
|
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several stages that are interconnected, and multiple iterations are usually required to bring a new drug to the market. Computational approaches have increasingly become an indispensable part of helping reduce the time and cost of the research and development of new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design, including a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking as well as virtual screening, and state-of-the-art deep learning models to predict calculated and experimental molecular properties based on molecular mechanics optimized geometries. Meanwhile, we discuss remaining challenges and promising directions for further development and use a retrospective example of FDA approved kinase inhibitor Erlotinib to demonstrate the use of these newly developed computational tools.
Affiliation(s)
- Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Eric Chen
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
7
|
Gagliardi L, Rocchia W. SiteFerret: Beyond Simple Pocket Identification in Proteins. J Chem Theory Comput 2023; 19:5242-5259. [PMID: 37470784 PMCID: PMC10413863 DOI: 10.1021/acs.jctc.2c01306] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/23/2022] [Indexed: 07/21/2023]
Abstract
We present a novel method for the automatic detection of pockets on protein molecular surfaces. The algorithm is based on an ad hoc hierarchical clustering of virtual probe spheres obtained from the geometrical primitives used by the NanoShaper software to build the solvent-excluded molecular surface. The final ranking of putative pockets is based on the Isolation Forest method, an unsupervised learning approach originally developed for anomaly detection. A detailed importance analysis of pocket features provides insight into which geometrical (clustering) and chemical (amino acidic composition) properties characterize a good binding site. The method also provides a segmentation of pockets into smaller subpockets. We prove that subpockets are a convenient representation to pinpoint the binding site with great precision. SiteFerret is outstanding in its versatility, accurately predicting a wide range of binding sites, from those binding small molecules to those binding peptides, including difficult shallow sites.
Affiliation(s)
| | - Walter Rocchia
- CONCEPT Lab, Istituto Italiano di Tecnologia, Via Melen - 83, B Block, 16152 Genova, Italy
| |
|
8
|
Petrovski ŽH, Hribar-Lee B, Bosnić Z. CAT-Site: Predicting Protein Binding Sites Using a Convolutional Neural Network. Pharmaceutics 2022; 15:pharmaceutics15010119. [PMID: 36678749 PMCID: PMC9862895 DOI: 10.3390/pharmaceutics15010119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/01/2022] [Revised: 12/18/2022] [Accepted: 12/22/2022] [Indexed: 01/01/2023] Open
Abstract
Identifying binding sites on the protein surface is an important part of computer-assisted drug design processes. Reliable prediction of binding sites not only assists with docking algorithms, but it can also explain the possible side-effects of a potential drug as well as its efficiency. In this work, we propose a novel workflow for predicting possible binding sites of a ligand on a protein surface. We use proteins from the PDBbind and sc-PDB databases, from which we combine available ligand information for similar proteins using all the possible ligands rather than only a special sub-selection to generalize the work of existing research. After performing protein clustering and merging of ligands of similar proteins, we use a three-dimensional convolutional neural network that takes into account the spatial structure of a protein. Lastly, we combine ligandability predictions for points on protein surfaces into joint binding sites. Analysis of our model's performance shows that its achieved sensitivity is 0.829, specificity is 0.98, and F1 score is 0.517, and that for 54% of larger and pharmacologically relevant binding sites, the distance between their real and predicted centers amounts to less than 4 Å.
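The reported combination of high sensitivity (0.829) and specificity (0.98) with a much lower F1 score (0.517) is what one would expect when non-binding surface points far outnumber binding ones: even a small false-positive rate then produces many false positives in absolute terms and pulls precision down. The sketch below illustrates this with hypothetical confusion-matrix counts chosen only so that they reproduce the three reported figures. They are not the paper's actual counts, and the function name is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true binding points recovered
    specificity = tn / (tn + fp)   # fraction of non-binding points rejected
    precision = tp / (tp + fp)     # fraction of predicted points that are real
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts (about 1.4% positives), NOT taken from the paper.
metrics = classification_metrics(tp=829, fp=1378, tn=67522, fn=171)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts only about 2% of negatives are misclassified, yet false positives still outnumber true positives, so precision falls below 0.4 and caps the F1 score.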
Affiliation(s)
- Žan Hafner Petrovski
- University of Ljubljana, Faculty of Computer and Information Science, SI-1000 Ljubljana, Slovenia
| | - Barbara Hribar-Lee
- University of Ljubljana, Faculty of Chemistry and Chemical Technology, SI-1000 Ljubljana, Slovenia
- Correspondence: (B.-H.L.); (Z.B.)
| | - Zoran Bosnić
- University of Ljubljana, Faculty of Computer and Information Science, SI-1000 Ljubljana, Slovenia
- Correspondence: (B.-H.L.); (Z.B.)
| |
|
9
|
Zacharioudakis E, Gavathiotis E. Targeting protein conformations with small molecules to control protein complexes. Trends Biochem Sci 2022; 47:1023-1037. [PMID: 35985943 PMCID: PMC9669135 DOI: 10.1016/j.tibs.2022.07.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/26/2022] [Revised: 06/23/2022] [Accepted: 07/11/2022] [Indexed: 12/24/2022]
Abstract
Dynamic protein complexes function in all cellular processes, from signaling to transcription, using distinct conformations that regulate their activity. Conformational switching of proteins can turn on or off their activity through protein-protein interactions, catalytic function, cellular localization, or membrane interaction. Recent advances in structural, computational, and chemical methodologies have enabled the discovery of small-molecule activators and inhibitors of conformationally dynamic proteins by using a more rational design than a serendipitous screening approach. Here, we discuss such recent examples, focusing on the mechanism of protein conformational switching and its regulation by small molecules. We emphasize the rational approaches to control protein oligomerization with small molecules that offer exciting opportunities for investigation of novel biological mechanisms and drug discovery.
Affiliation(s)
- Emmanouil Zacharioudakis
- Department of Biochemistry, Albert Einstein College of Medicine, Bronx, NY, USA; Department of Medicine, Albert Einstein College of Medicine, Bronx, NY, USA; Albert Einstein Cancer Center, Albert Einstein College of Medicine, Bronx, NY, USA; Wilf Family Cardiovascular Research Institute, Albert Einstein College of Medicine, Bronx, NY, USA; Institute for Aging Research, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Evripidis Gavathiotis
- Department of Biochemistry, Albert Einstein College of Medicine, Bronx, NY, USA; Department of Medicine, Albert Einstein College of Medicine, Bronx, NY, USA; Albert Einstein Cancer Center, Albert Einstein College of Medicine, Bronx, NY, USA; Wilf Family Cardiovascular Research Institute, Albert Einstein College of Medicine, Bronx, NY, USA; Institute for Aging Research, Albert Einstein College of Medicine, Bronx, NY, USA.
| |
|
10
|
Simončič M, Lukšič M, Druchok M. Machine learning assessment of the binding region as a tool for more efficient computational receptor-ligand docking. J Mol Liq 2022; 353:118759. [PMID: 35273421 PMCID: PMC8903148 DOI: 10.1016/j.molliq.2022.118759] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
Abstract
We present a combined computational approach to protein-ligand binding, which consists of two steps: (1) a deep neural network is used to locate a binding region on a target protein, and (2) molecular docking of a ligand is performed within the specified region to obtain the best pose using Autodock Vina. Our in-house designed neural network was trained using the PepBDB dataset. Although the training dataset consisted of protein-peptide complexes, we show that the approach is not limited to peptides, but also works remarkably well for a large class of non-peptide ligands. The results are compared with those in which the binding region (first step) was provided by Accluster. In cases where no prior experimental data on the binding region are available, our deep neural network provides a fast and effective alternative to classical software for its localization. Our code is available at https://github.com/mksmd/NNforDocking.
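The hand-off between the two steps described above, from a predicted binding region to a docking search space, can be sketched as follows. The example assumes the first step yields 3D coordinates for the predicted binding-region atoms and turns them into an axis-aligned box expressed with AutoDock Vina's `center_*` and `size_*` configuration keys. The function names, the 4 Å padding, and the coordinates are illustrative assumptions and do not come from the NNforDocking repository.

```python
def docking_box(coords, padding=4.0):
    """Axis-aligned box enclosing `coords`, enlarged by `padding` on each side.

    coords: iterable of (x, y, z) tuples in angstroms.
    Returns (center, size) as two 3-tuples.
    """
    xs, ys, zs = zip(*coords)
    center = tuple((max(axis) + min(axis)) / 2 for axis in (xs, ys, zs))
    size = tuple((max(axis) - min(axis)) + 2 * padding for axis in (xs, ys, zs))
    return center, size


def vina_config_lines(center, size):
    """Format the box as `key = value` lines for a Vina configuration file."""
    keys = ("center_x", "center_y", "center_z", "size_x", "size_y", "size_z")
    return [f"{key} = {value:.3f}" for key, value in zip(keys, center + size)]


# Hypothetical coordinates of predicted binding-region atoms.
predicted_region = [(0.0, 0.0, 0.0), (10.0, 4.0, 2.0), (5.0, 1.0, 1.0)]
center, size = docking_box(predicted_region)
print("\n".join(vina_config_lines(center, size)))
```

Restricting the search box to the predicted region, rather than enclosing the whole protein, is what makes the subsequent docking run cheaper. The padding leaves room for ligand atoms that extend past the predicted residues.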
Affiliation(s)
- Matjaž Simončič
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana, Slovenia
| | - Miha Lukšič
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana, Slovenia
| | - Maksym Druchok
- Institute for Condensed Matter Physics, 1 Svientsitskii Str., UA-79011 Lviv, Ukraine
- SoftServe Inc., 2d Sadova Str., UA-79021 Lviv, Ukraine
| |
|
11
|
Dhakal A, McKay C, Tanner JJ, Cheng J. Artificial intelligence in the prediction of protein-ligand interactions: recent advances and future directions. Brief Bioinform 2022; 23:bbab476. [PMID: 34849575 PMCID: PMC8690157 DOI: 10.1093/bib/bbab476] [Citation(s) in RCA: 104] [Impact Index Per Article: 34.7] [Received: 08/05/2021] [Revised: 09/28/2021] [Accepted: 10/15/2021] [Indexed: 12/13/2022] Open
Abstract
New drug production, from target identification to marketing approval, takes over 12 years and can cost around $2.6 billion. Furthermore, the COVID-19 pandemic has unveiled the urgent need for more powerful computational methods for drug discovery. Here, we review the computational approaches to predicting protein-ligand interactions in the context of drug discovery, focusing on methods using artificial intelligence (AI). We begin with a brief introduction to proteins (targets), ligands (e.g. drugs) and their interactions for nonexperts. Next, we review databases that are commonly used in the domain of protein-ligand interactions. Finally, we survey and analyze the machine learning (ML) approaches implemented to predict protein-ligand binding sites, ligand-binding affinity and binding pose (conformation) including both classical ML algorithms and recent deep learning methods. After exploring the correlation between these three aspects of protein-ligand interaction, it has been proposed that they should be studied in unison. We anticipate that our review will aid exploration and development of more accurate ML-based prediction strategies for studying protein-ligand interactions.
Affiliation(s)
- Ashwin Dhakal
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
| | - Cole McKay
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA
| | - John J Tanner
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA
- Department of Chemistry, University of Missouri, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
| |
|
12
|
Castro LHE, Sant'Anna CMR. Molecular Modeling Techniques Applied to the Design of Multitarget Drugs: Methods and Applications. Curr Top Med Chem 2021; 22:333-346. [PMID: 34844540 DOI: 10.2174/1568026621666211129140958] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/01/2021] [Revised: 10/23/2021] [Accepted: 10/28/2021] [Indexed: 11/22/2022]
Abstract
Multifactorial diseases, such as cancer and diabetes, present a challenge for the traditional "one-target, one-disease" paradigm due to their complex pathogenic mechanisms. Although a combination of drugs can be used, a multitarget drug may be a better choice in view of its efficacy, fewer adverse effects, and lower chance of resistance development. The computer-based design of these multitarget drugs can draw on the same techniques used for single-target drug design, but the difficulty of obtaining drugs capable of modulating two or more targets with similar efficacy imposes new challenges, whose solutions involve adapting known techniques and developing new ones, including machine-learning approaches. In this review, some structure-based drug design (SBDD) and ligand-based drug design (LBDD) techniques for multitarget drug design are discussed, together with some cases where the application of such techniques led to effective multitarget ligands.
Affiliation(s)
| | - Carlos Mauricio R Sant'Anna
- Programa de Pós-Graduação em Química, Instituto de Química, Universidade Federal Rural do Rio de Janeiro, Seropédica. Brazil
| |
|
13
|
CAVIAR: a method for automatic cavity detection, description and decomposition into subcavities. J Comput Aided Mol Des 2021; 35:737-750. [PMID: 34050420 DOI: 10.1007/s10822-021-00390-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/23/2021] [Accepted: 05/11/2021] [Indexed: 10/21/2022]
Abstract
The accurate description of protein binding sites is essential to the determination of similarity and the application of machine learning methods to relate the binding sites to observed functions. This work describes CAVIAR, a new open source tool for generating descriptors for binding sites, using protein structures in PDB and mmCIF format as well as trajectory frames from molecular dynamics simulations as input. The applicability of CAVIAR descriptors is showcased by computing machine learning predictions of binding site ligandability. The method can also automatically assign subcavities, even in the absence of a bound ligand. The defined subpockets mimic the empirical definitions used in medicinal chemistry projects. It is shown that the experimental binding affinity scales relatively well with the number of subcavities filled by the ligand, with compounds binding to more than three subcavities having nanomolar or better affinities to the target. The CAVIAR descriptors and methods can be used in any machine learning-based investigations of problems involving binding sites, from protein engineering to hit identification. The full software code is available on GitHub and a conda package is hosted on Anaconda cloud.
|
14
|
Mylonas SK, Axenopoulos A, Daras P. DeepSurf: A surface-based deep learning approach for the prediction of ligand binding sites on proteins. Bioinformatics 2021; 37:1681-1690. [PMID: 33471069 DOI: 10.1093/bioinformatics/btab009] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 04/17/2020] [Revised: 12/16/2020] [Accepted: 01/05/2021] [Indexed: 12/29/2022] Open
Abstract
MOTIVATION: The knowledge of potentially druggable binding sites on proteins is an important preliminary step towards the discovery of novel drugs. The computational prediction of such areas can be boosted by following the recent major advances in the deep learning field and by exploiting the increasing availability of proper data. RESULTS: In this paper, a novel computational method for the prediction of potential binding sites is proposed, called DeepSurf. DeepSurf combines a surface-based representation, where a number of 3D voxelized grids are placed on the protein's surface, with state-of-the-art deep learning architectures. After being trained on the large database of scPDB, DeepSurf demonstrates superior results on three diverse testing datasets, by surpassing all its main deep learning-based competitors, while attaining competitive performance to a set of traditional non-data-driven approaches. AVAILABILITY: The source code of the method along with trained models are freely available at https://github.com/stemylonas/DeepSurf.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Stelios K Mylonas
- Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, 57001, Greece
| | - Apostolos Axenopoulos
- Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, 57001, Greece
| | - Petros Daras
- Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, 57001, Greece
| |
|
15
|
Santana CA, Silveira SDA, Moraes JPA, Izidoro SC, de Melo-Minardi RC, Ribeiro AJM, Tyzack JD, Borkakoti N, Thornton JM. GRaSP: a graph-based residue neighborhood strategy to predict binding sites. Bioinformatics 2020; 36:i726-i734. [DOI: 10.1093/bioinformatics/btaa805] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Accepted: 09/08/2020] [Indexed: 01/22/2023] Open
Abstract
Motivation
The discovery of protein–ligand-binding sites is a major step for elucidating protein function and for investigating new functional roles. Detecting protein–ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites have been proposed, as they can be scalable, fast and low-cost.
Results
We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue-centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made comparable or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered as state-of-the-art. Also, our method achieved better results than the method from the CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable, as it took 10–20 s on average to predict the binding site for a protein complex, whereas the state-of-the-art residue-centric method takes 2–5 h on average.
Availability and implementation
The source code and datasets are available at https://github.com/charles-abreu/GRaSP.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Charles A Santana
- Department of Biochemistry and Immunology
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Sabrina de A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- João P A Moraes
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- Sandro C Izidoro
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- Raquel C de Melo-Minardi
- Department of Biochemistry and Immunology
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Neera Borkakoti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
16
|
Gervasoni S, Vistoli G, Talarico C, Manelfi C, Beccari AR, Studer G, Tauriello G, Waterhouse AM, Schwede T, Pedretti A. A Comprehensive Mapping of the Druggable Cavities within the SARS-CoV-2 Therapeutically Relevant Proteins by Combining Pocket and Docking Searches as Implemented in Pockets 2.0. Int J Mol Sci 2020; 21:5152. [PMID: 32708196 PMCID: PMC7403965 DOI: 10.3390/ijms21145152] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/15/2020] [Revised: 07/10/2020] [Accepted: 07/14/2020] [Indexed: 12/14/2022]
Abstract
(1) Background: Virtual screening studies on the therapeutically relevant proteins of the severe acute respiratory syndrome Coronavirus 2 (SARS-CoV-2) require a detailed characterization of their druggable binding sites, and, more generally, a convenient pocket mapping represents a key step for structure-based in silico studies; (2) Methods: Along with a careful literature search on SARS-CoV-2 protein targets, the study presents a novel strategy for pocket mapping based on the combination of pocket searches (as performed by the well-known FPocket tool) and docking searches (as performed by the PLANTS or AutoDock/Vina engines); this approach is implemented by the Pockets 2.0 plug-in for the VEGA ZZ suite of programs; (3) Results: The literature analysis allowed the identification of 16 promising binding cavities within the SARS-CoV-2 proteins, and the approach proposed here was able to recognize them with performance clearly better than that of pocket detection alone; and (4) Conclusions: Although the presented strategy requires more extensive validation, it proved successful in precisely characterizing a set of SARS-CoV-2 druggable binding pockets, including both orthosteric and allosteric sites, which are clearly amenable to virtual screening campaigns and drug repurposing studies. All results generated by the study and the Pockets 2.0 plug-in are available for download.
Affiliation(s)
- Silvia Gervasoni
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Mangiagalli, 25, I-20133 Milano, Italy
- Giulio Vistoli
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Mangiagalli, 25, I-20133 Milano, Italy
- Carmine Talarico
- Dompé Farmaceutici SpA, Via Campo di Pile, I-67100 L’Aquila, Italy
- Candida Manelfi
- Dompé Farmaceutici SpA, Via Campo di Pile, I-67100 L’Aquila, Italy
- Andrea R. Beccari
- Dompé Farmaceutici SpA, Via Campo di Pile, I-67100 L’Aquila, Italy
- Gabriel Studer
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Gerardo Tauriello
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Andrew Mark Waterhouse
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Torsten Schwede
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland
- Alessandro Pedretti
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Mangiagalli, 25, I-20133 Milano, Italy
- Correspondence: Tel.: +39-02-503-19332
|