1
Eguida M, Rognan D. Estimating the Similarity between Protein Pockets. Int J Mol Sci 2022; 23:12462. [PMID: 36293316 PMCID: PMC9604425 DOI: 10.3390/ijms232012462] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/06/2022] [Revised: 10/15/2022] [Accepted: 10/16/2022] [Indexed: 10/28/2023]
Abstract
With the exponential increase in publicly available protein structures, the comparison of protein binding sites naturally emerged as a scientific topic to explain observations or generate hypotheses for ligand design, notably to predict ligand selectivity for on- and off-targets, explain polypharmacology, and design target-focused libraries. The current review summarizes the state-of-the-art computational methods applied to pocket detection and comparison as well as structural druggability estimates. The major strengths and weaknesses of current pocket descriptors, alignment methods, and similarity search algorithms are presented. Lastly, an exhaustive survey of both retrospective and prospective applications in diverse medicinal chemistry scenarios illustrates the capability of the existing methods and the hurdles that still need to be overcome for more accurate predictions.
Affiliation(s)
- Didier Rognan
- Laboratoire d’Innovation Thérapeutique, UMR7200 CNRS-Université de Strasbourg, 67400 Illkirch, France
2
Konc J, Lešnik S, Škrlj B, Janežič D. ProBiS-Dock Database: A Web Server and Interactive Web Repository of Small Ligand-Protein Binding Sites for Drug Design. J Chem Inf Model 2021; 61:4097-4107. [PMID: 34319727 DOI: 10.1021/acs.jcim.1c00454] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 01/12/2023]
Abstract
We have developed a new system, ProBiS-Dock, which can be used to determine the different types of protein binding sites for small ligands. The binding sites identified this way are then used to construct a new binding site database, the ProBiS-Dock Database, that allows for the ranking of binding sites according to their utility for drug development. The newly constructed database currently has more than 1.4 million binding sites and offers the possibility to investigate potential drug targets originating from different biological species. The interactive ProBiS-Dock Database, a web server and repository that consists of all small-molecule ligand binding sites in all of the protein structures in the Protein Data Bank, is freely available at http://probis-dock-database.insilab.org. The ProBiS-Dock Database will be regularly updated to keep pace with the growth of the Protein Data Bank, and our anticipation is that it will be useful in drug discovery.
Affiliation(s)
- Janez Konc
- Theory Department, National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
- Samo Lešnik
- Theory Department, National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
- Blaž Škrlj
- Theory Department, National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia; Jozef Stefan International Postgraduate School, Jamova cesta 39, SI-1000 Ljubljana, Slovenia
- Dušanka Janežič
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška ulica 8, SI-6000 Koper, Slovenia
3
Krivák R, Hoksza D. P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. J Cheminform 2018; 10:39. [PMID: 30109435 PMCID: PMC6091426 DOI: 10.1186/s13321-018-0285-8] [Citation(s) in RCA: 226] [Impact Index Per Article: 32.3] [Received: 11/12/2017] [Accepted: 06/29/2018] [Indexed: 01/29/2023]
Abstract
Background Ligand binding site prediction from protein structure has many applications related to elucidation of protein function and structure-based drug discovery. It often represents only one step of many in complex computational drug design efforts. Although many methods have been published to date, only a few of them are suitable for use in automated pipelines or for processing large datasets. These use cases require stability and speed, which disqualifies many of the recently introduced tools that are either template based or available only as web servers. Results We present P2Rank, a stand-alone template-free tool for prediction of ligand binding sites based on machine learning. It is based on prediction of ligandability of local chemical neighbourhoods that are centered on points placed on the solvent accessible surface of a protein. We show that P2Rank outperforms several existing tools, which include two widely used stand-alone tools (Fpocket, SiteHound), a comprehensive consensus based tool (MetaPocket 2.0), and a recent deep learning based method (DeepSite). P2Rank is among the fastest available tools (requires under 1 s for prediction on one protein), with the additional advantage of a multi-threaded implementation. Conclusions P2Rank is a new open source software package for ligand binding site prediction from protein structure. It is available as a user-friendly stand-alone command line program and a Java library. P2Rank has a lightweight installation and does not depend on other bioinformatics tools or large structural or sequence databases. Thanks to its speed and ability to make fully automated predictions, it is particularly well suited for processing large datasets or as a component of scalable structural bioinformatics pipelines.
Affiliation(s)
- Radoslav Krivák
- Department of Software Engineering, Charles University, Prague, Czech Republic.
- David Hoksza
- Department of Software Engineering, Charles University, Prague, Czech Republic.
4
Abstract
Ligandability is a prerequisite for druggability and is a much easier concept to understand, model and predict because it does not depend on the complex pharmacodynamic and pharmacokinetic mechanisms in the human body. In this review, we consider a metric for quantifying ligandability from experimental data. We discuss ligandability in terms of the balance between effort and reward. The metric is evaluated for a standard set of well-studied drug targets - some traditionally considered to be ligandable and some regarded as difficult. We suggest that this metric should be used to systematically improve computational predictions of ligandability, which can then be applied to novel drug targets to predict their tractability.
5
Vukovic S, Brennan PE, Huggins DJ. Exploring the role of water in molecular recognition: predicting protein ligandability using a combinatorial search of surface hydration sites. J Phys Condens Matter 2016; 28:344007. [PMID: 27367338 DOI: 10.1088/0953-8984/28/34/344007] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 06/06/2023]
Abstract
The interaction between any two biological molecules must compete with their interaction with water molecules. This makes water the most important molecule in medicine, as it controls the interactions of every therapeutic with its target. A small molecule binding to a protein is able to recognize a unique binding site on a protein by displacing bound water molecules from specific hydration sites. Quantifying the interactions of these water molecules allows us to estimate the potential of the protein to bind a small molecule. This is referred to as ligandability. In this study, we describe a method to predict ligandability by performing a search of all possible combinations of hydration sites on protein surfaces. We predict ligandability as the summed binding free energy for each of the constituent hydration sites, computed using inhomogeneous fluid solvation theory. We compared the predicted ligandability with the maximum observed binding affinity for 20 proteins in the human bromodomain family. Based on this comparison, it was determined that effective inhibitors have been developed for the majority of bromodomains, in the range from 10 to 100 nM. However, we predict that more potent inhibitors can be developed for the bromodomains BPTF and BRD7 with relative ease, but that further efforts to develop inhibitors for ATAD2 will be extremely challenging. We have also made predictions for the 14 bromodomains with no reported small molecule Kd values by isothermal titration calorimetry. The calculations predict that PBRM1(1) will be a challenging target, while others such as TAF1L(2), PBRM1(4) and TAF1(2), should be highly ligandable. As an outcome of this work, we assembled a database of experimental maximal Kd that can serve as a community resource assisting medicinal chemistry efforts focused on BRDs. Effective prediction of ligandability would be a very useful tool in the drug discovery process.
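The combinatorial search summarized in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `best_site_combination`, the fixed combination size `k`, the compactness cutoff `max_span`, and the sign convention (more negative means more favorable) are assumptions made for illustration, and the per-site free energies are toy numbers rather than inhomogeneous fluid solvation theory results.

```python
from itertools import combinations
from math import dist

def best_site_combination(sites, k, max_span):
    """Return (summed_dG, indices) of the most favorable compact set of k sites.

    sites    : list of ((x, y, z), dG) tuples, one per hydration site (toy data)
    k        : number of sites per combination (assumed, not from the source)
    max_span : largest allowed distance between any two sites in a combination
    """
    best = None
    for combo in combinations(range(len(sites)), k):
        # Skip combinations whose sites are too far apart to form one pocket.
        if any(dist(sites[i][0], sites[j][0]) > max_span
               for i, j in combinations(combo, 2)):
            continue
        total = sum(sites[i][1] for i in combo)  # summed per-site free energy
        if best is None or total < best[0]:
            best = (total, combo)
    return best

toy_sites = [((0, 0, 0), -1.0), ((1, 0, 0), -2.0),
             ((10, 0, 0), -5.0), ((1, 1, 0), -0.5)]
print(best_site_combination(toy_sites, k=2, max_span=3.0))  # (-3.0, (0, 1))
```

The isolated site at x = 10 has the most favorable single value but is rejected by the compactness filter, which is the point of searching over combinations rather than ranking sites one by one.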
Affiliation(s)
- Sinisa Vukovic
- Department of Physics, Cavendish Laboratory, University of Cambridge, 19 JJ Thomson Avenue, Cambridge, CB3 0HE, UK
6
Tsujikawa H, Sato K, Wei C, Saad G, Sumikoshi K, Nakamura S, Terada T, Shimizu K. Development of a protein-ligand-binding site prediction method based on interaction energy and sequence conservation. J Struct Funct Genomics 2016; 17:39-49. [PMID: 27400687 PMCID: PMC5002282 DOI: 10.1007/s10969-016-9204-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/23/2016] [Accepted: 06/20/2016] [Indexed: 11/30/2022]
Abstract
We present a new method for predicting protein–ligand-binding sites based on protein three-dimensional structure and amino acid conservation. This method involves calculation of the van der Waals interaction energy between a protein and many probes placed on the protein surface and subsequent clustering of the probes with low interaction energies to identify the most energetically favorable locus. In addition, it uses amino acid conservation among homologous proteins. Ligand-binding sites were predicted by combining the interaction energy and the amino acid conservation score. The performance of our prediction method was evaluated using a non-redundant dataset of 348 ligand-bound and ligand-unbound protein structure pairs, constructed by filtering entries in a ligand-binding site structure database, LigASite. Ligand-bound structure prediction (bound prediction) indicated that 74.0 % of predicted ligand-binding sites overlapped with real ligand-binding sites by over 25 % of their volume. Ligand-unbound structure prediction (unbound prediction) indicated that 73.9 % of predicted ligand-binding residues overlapped with real ligand-binding residues. The amino acid conservation score improved the average prediction accuracy by 17.0 and 17.6 points for the bound and unbound predictions, respectively. These results demonstrate the effectiveness of the combined use of the interaction energy and amino acid conservation in the ligand-binding site prediction.
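The probe-energy and clustering steps described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the published method: the 12-6 Lennard-Jones form, the `epsilon` and `sigma` values, both cutoffs, and all function names are hypothetical, and the amino acid conservation term is left out because the abstract does not say how the two scores are combined.

```python
from math import dist

def lj_energy(probe, atoms, epsilon=0.2, sigma=3.5):
    """12-6 Lennard-Jones energy of one probe against all protein atoms.

    Assumes the probe does not coincide with any atom (r > 0).
    """
    energy = 0.0
    for atom in atoms:
        sr6 = (sigma / dist(probe, atom)) ** 6
        energy += 4 * epsilon * (sr6 * sr6 - sr6)
    return energy

def cluster_probes(probes, cutoff):
    """Single-linkage clustering: probes within `cutoff` share a cluster."""
    clusters = []
    for p in probes:
        linked = [c for c in clusters if any(dist(p, q) <= cutoff for q in c)]
        # Drop the clusters that p connects, then add their union with p.
        clusters = [c for c in clusters if not any(c is l for l in linked)]
        clusters.append([p] + [q for c in linked for q in c])
    return clusters

def favorable_clusters(probes, atoms, energy_cutoff, link_cutoff):
    """Keep low-energy probes, cluster them, and sort by summed energy."""
    low = [p for p in probes if lj_energy(p, atoms) <= energy_cutoff]
    clusters = cluster_probes(low, link_cutoff)
    return sorted(clusters, key=lambda c: sum(lj_energy(p, atoms) for p in c))
```

The first cluster returned is the most energetically favorable locus under these toy parameters; a conservation score could then be added per cluster, with the weighting treated as a tunable assumption.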
Affiliation(s)
- Hiroto Tsujikawa
- Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ku, Tokyo, 113-8657, Japan
- Kenta Sato
- Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ku, Tokyo, 113-8657, Japan
- Cao Wei
- Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ku, Tokyo, 113-8657, Japan
- Gul Saad
- Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ku, Tokyo, 113-8657, Japan
- Kazuya Sumikoshi
- Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ku, Tokyo, 113-8657, Japan
- Shugo Nakamura
- Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ku, Tokyo, 113-8657, Japan
- Tohru Terada
- Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ku, Tokyo, 113-8657, Japan
- Kentaro Shimizu
- Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ku, Tokyo, 113-8657, Japan.
7
Grove LE, Vajda S, Kozakov D. Computational Methods to Support Fragment-based Drug Discovery. Fragment-based Drug Discovery Lessons and Outlook 2016. [DOI: 10.1002/9783527683604.ch09] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/10/2022]
8
Krivák R, Hoksza D. Improving protein-ligand binding site prediction accuracy by classification of inner pocket points using local features. J Cheminform 2015; 7:12. [PMID: 25932051 PMCID: PMC4414931 DOI: 10.1186/s13321-015-0059-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 09/29/2014] [Accepted: 02/24/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Protein-ligand binding site prediction from a 3D protein structure plays a pivotal role in rational drug design and can be helpful in drug side-effects prediction or elucidation of protein function. Embedded within the binding site detection problem is the problem of pocket ranking - how to score and sort candidate pockets so that the best scored predictions correspond to true ligand binding sites. Although there exist multiple pocket detection algorithms, they mostly employ a fairly simple ranking function leading to sub-optimal prediction results. RESULTS We have developed a new pocket scoring approach (named PRANK) that prioritizes putative pockets according to their probability to bind a ligand. The method first carefully selects pocket points and labels them by physico-chemical characteristics of their local neighborhood. A Random Forests classifier is subsequently applied to assign a ligandability score to each of the selected pocket points. The ligandability scores are finally merged into the resulting pocket score to be used for prioritization of the putative pockets. Using multiple datasets, the experimental results demonstrate that the application of our method as a post-processing step greatly increases the quality of the prediction of Fpocket and ConCavity, two state of the art protein-ligand binding site prediction algorithms. CONCLUSIONS The positive experimental results show that our method can be used to improve the success rate, validity and applicability of existing protein-ligand binding site prediction tools. The method was implemented as a stand-alone program that currently contains support for Fpocket and ConCavity out of the box, but is easily extendible to support other tools. PRANK is made freely available at http://siret.ms.mff.cuni.cz/prank.
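The final step of this abstract, merging per-point ligandability scores into one pocket score and re-ranking, can be sketched in Python. The abstract does not state the merging rule, so the sum of squared point scores used here is an assumption, as are the function names and the toy scores; in the described method the point scores would come from the trained classifier.

```python
def pocket_score(point_scores):
    """Merge per-point ligandability scores into one pocket score.

    Assumed rule: sum of squares, which rewards a few confident points
    more than many weak ones. The source does not specify the rule.
    """
    return sum(s * s for s in point_scores)

def rank_pockets(pockets):
    """Return pocket names sorted from highest to lowest merged score.

    pockets : dict mapping a pocket name to its list of point scores in [0, 1]
    """
    return sorted(pockets, key=lambda name: pocket_score(pockets[name]),
                  reverse=True)

# Toy scores standing in for classifier output on two candidate pockets.
candidates = {"pocket_A": [0.9, 0.8], "pocket_B": [0.3, 0.3, 0.3, 0.3]}
print(rank_pockets(candidates))  # ['pocket_A', 'pocket_B']
```

Used as a post-processing step, such a function only reorders pockets that a detector has already found; it cannot recover a binding site the detector missed.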
Affiliation(s)
- Radoslav Krivák
- Department of Software Engineering, Charles University in Prague, Prague, Czech Republic
- David Hoksza
- Department of Software Engineering, Charles University in Prague, Prague, Czech Republic
9
Flexibility and small pockets at protein-protein interfaces: New insights into druggability. Prog Biophys Mol Biol 2015; 119:2-9. [PMID: 25662442 PMCID: PMC4726663 DOI: 10.1016/j.pbiomolbio.2015.01.009] [Citation(s) in RCA: 109] [Impact Index Per Article: 10.9] [Received: 12/05/2014] [Revised: 01/06/2015] [Accepted: 01/28/2015] [Indexed: 01/04/2023]
Abstract
The transient assembly of multiprotein complexes mediates many aspects of cell regulation and signalling in living organisms. Modulation of the formation of these complexes through targeting protein-protein interfaces can offer greater selectivity than the inhibition of protein kinases, proteases or other post-translational regulatory enzymes using substrate, co-factor or transition state mimetics. However, capitalising on protein-protein interaction interfaces as drug targets has been hindered by the nature of interfaces that tend to offer binding sites lacking the well-defined large cavities of classical drug targets. In this review we posit that interfaces formed by concerted folding and binding (disorder-to-order transitions on binding) of one partner and other examples of interfaces where a protein partner is bound through a continuous epitope from a surface-exposed helix, flexible loop or chain extension may be more tractable for the development of "orthosteric", competitive chemical modulators; these interfaces tend to offer small-volume but deep pockets and/or larger grooves that may be bound tightly by small chemical entities. We discuss examples of such protein-protein interaction interfaces for which successful chemical modulators are being developed.
10
Abstract
Ligand binding is required for many proteins to function properly. A large number of bioinformatics tools have been developed to predict ligand binding sites as a first step in understanding a protein's function or to facilitate docking computations in virtual screening based drug design. The prediction usually requires only the three-dimensional structure (experimentally determined or computationally modeled) of the target protein to be searched for ligand binding site(s), and Web servers have been built, allowing the free and simple use of prediction tools. In this chapter, we review the underlying concepts of the methods used by various tools, and discuss their different features and the related issues of ligand binding site prediction. Some cautionary notes about the use of these tools are also provided.
Affiliation(s)
- Zhong-Ru Xie
- Institute of Biomedical Sciences, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei, 115, Taiwan
11
P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features. Algorithms for Computational Biology 2015. [DOI: 10.1007/978-3-319-21233-3_4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
12
Dukka BK. Structure-based Methods for Computational Protein Functional Site Prediction. Comput Struct Biotechnol J 2013; 8:e201308005. [PMID: 24688745 PMCID: PMC3962076 DOI: 10.5936/csbj.201308005] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 08/01/2013] [Revised: 11/07/2013] [Accepted: 11/11/2013] [Indexed: 11/22/2022]
Abstract
Due to the advent of high throughput sequencing techniques and structural genomic projects, the number of gene and protein sequences has been ever increasing. Computational methods to annotate these genes and proteins are even more indispensable. Proteins are important macromolecules and study of the function of proteins is an important problem in structural bioinformatics. This paper discusses a number of methods to predict protein functional sites, especially focusing on protein ligand binding site prediction. Initially, a short overview is presented on recent advances in methods for selection of homologous sequences. Furthermore, a few recent structure-based approaches and sequence-and-structure based approaches for protein functional sites are discussed in detail.
Affiliation(s)
- B Kc Dukka
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, 27411, USA
13
Bianchi V, Mangone I, Ferrè F, Helmer-Citterich M, Ausiello G. webPDBinder: a server for the identification of ligand binding sites on protein structures. Nucleic Acids Res 2013; 41:W308-13. [PMID: 23737450 PMCID: PMC3692056 DOI: 10.1093/nar/gkt457] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
The webPDBinder (http://pdbinder.bio.uniroma2.it/PDBinder) is a web server for the identification of small ligand-binding sites in a protein structure. webPDBinder searches a protein structure against a library of known binding sites and a collection of control non-binding pockets. The number of similarities identified with the residues in the two sets is then used to derive a propensity value for each residue of the query protein associated with the likelihood that the residue is part of a ligand binding site. The predicted binding residues can be further refined using conservation scores derived from the multiple alignment of the PFAM protein family. webPDBinder correctly identifies residues belonging to the binding site in 77% of the cases and is able to identify binding pockets starting from holo or apo structures with comparable performances. This is important for all the real world cases where the query protein has been crystallized without a ligand and where it is also difficult to obtain clear similarities with bound pockets from holo pocket libraries. The input is either a PDB code or a user-submitted structure. The output is a list of predicted binding pocket residues with propensity and conservation values both in text and graphical format.
Affiliation(s)
- Valerio Bianchi
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
14
Wang X, Mi G, Wang C, Zhang Y, Li J, Guo Y, Pu X, Li M. Prediction of flavin mono-nucleotide binding sites using modified PSSM profile and ensemble support vector machine. Comput Biol Med 2012; 42:1053-9. [PMID: 22985817 DOI: 10.1016/j.compbiomed.2012.08.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 03/18/2012] [Revised: 07/12/2012] [Accepted: 08/13/2012] [Indexed: 11/25/2022]
Abstract
Flavin mono-nucleotide (FMN) is closely involved in many biological processes. In this study, a computational method was proposed to identify FMN binding sites based on amino acid sequences of proteins only. A modified Position Specific Score Matrix was used to characterize the local environmental sequence information, and a visible improvement of performance was obtained. Also, the ensemble SVM was applied to solve the imbalanced data problem. Additionally, an independent dataset was built to evaluate the practical performance of the method, and a satisfactory accuracy of 87.87% was achieved. This demonstrates that the method is effective in predicting FMN-binding sites.
Affiliation(s)
- Xia Wang
- College of Chemistry, Sichuan University, Chengdu 610064, PR China
15
Biophysical and computational fragment-based approaches to targeting protein-protein interactions: applications in structure-guided drug discovery. Q Rev Biophys 2012; 45:383-426. [PMID: 22971516 DOI: 10.1017/s0033583512000108] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Indexed: 12/17/2022]
Abstract
Drug discovery has classically targeted the active sites of enzymes or ligand-binding sites of receptors and ion channels. In an attempt to improve selectivity of drug candidates, modulation of protein-protein interfaces (PPIs) of multiprotein complexes that mediate conformation or colocation of components of cell-regulatory pathways has become a focus of interest. However, PPIs in multiprotein systems continue to pose significant challenges, as they are generally large, flat and poor in distinguishing features, making the design of small molecule antagonists a difficult task. Nevertheless, encouragement has come from the recognition that a few amino acids - so-called hotspots - may contribute the majority of interaction free energy. The challenges posed by protein-protein interactions have led to a wellspring of creative approaches, including proteomimetics, stapled α-helical peptides and a plethora of antibody inspired molecular designs. Here, we review a more generic approach: fragment-based drug discovery. Fragments allow novel areas of chemical space to be explored more efficiently, but the initial hits have low affinity. This means that they will not normally disrupt PPIs, unless they are tethered, an approach that has been pioneered by Wells and co-workers. An alternative fragment-based approach is to stabilise the uncomplexed components of the multiprotein system in solution and employ conventional fragment-based screening. Here, we describe the current knowledge of the structures and properties of protein-protein interactions and the small molecules that can modulate them. We then describe the use of sensitive biophysical methods - nuclear magnetic resonance, X-ray crystallography, surface plasmon resonance, differential scanning fluorimetry or isothermal calorimetry - to screen and validate fragment binding. Fragment hits can subsequently be evolved into larger molecules with higher affinity and potency. These may provide new leads for drug candidates that target protein-protein interactions and have therapeutic value.
16
Ghersi D, Sanchez R. Automated identification of binding sites for phosphorylated ligands in protein structures. Proteins 2012; 80:2347-58. [PMID: 22619105 DOI: 10.1002/prot.24117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/30/2011] [Revised: 04/02/2012] [Accepted: 05/02/2012] [Indexed: 11/08/2022]
Abstract
Phosphorylation is a crucial step in many cellular processes, ranging from metabolic reactions involved in energy transformation to signaling cascades. In many instances, protein domains specifically recognize the phosphogroup. Knowledge of the binding site provides insights into the interaction, and it can also be exploited for therapeutic purposes. Previous studies have shown that proteins interacting with phosphogroups are highly heterogeneous, and no single property can be used to reliably identify the binding site. Here we present an energy-based computational procedure that exploits the protein three-dimensional structure to identify binding sites involved in the recognition of phosphogroups. The procedure is validated on three datasets containing more than 200 proteins binding to ATP, phosphopeptides, and phosphosugars. A comparison against three other generic binding site identification approaches shows higher accuracy values for our method, with a correct identification rate in the 80-90% range for the top three predicted sites. Addition of conservation information further improves the performance. The method presented here can be used as a first step in functional annotation or to guide mutagenesis experiments and further studies such as molecular docking.
Affiliation(s)
- Dario Ghersi
- Department of Structural and Chemical Biology, Mount Sinai School of Medicine, New York, New York, USA
17
Xie ZR, Hwang MJ. Ligand-binding site prediction using ligand-interacting and binding site-enriched protein triangles. Bioinformatics 2012; 28:1579-85. [PMID: 22495747 DOI: 10.1093/bioinformatics/bts182] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION Knowledge about the site at which a ligand binds provides an important clue for predicting the function of a protein and is also often a prerequisite for performing docking computations in virtual drug design and screening. We have previously shown that certain ligand-interacting triangles of protein atoms, called protein triangles, tend to occur more frequently at ligand-binding sites than at other parts of the protein. RESULTS In this work, we describe a new ligand-binding site prediction method that was developed based on binding site-enriched protein triangles. The new method was tested on 2 benchmark datasets and on 19 targets from two recent community-based studies of such predictions, and excellent results were obtained. Where comparisons were made, the success rates for the new method for the first predicted site were significantly better than methods that are not a meta-predictor. Further examination showed that, for most of the unsuccessful predictions, the pocket of the ligand-binding site was identified, but not the site itself, whereas for some others, the failure was not due to the method itself but due to the use of an incorrect biological unit in the structure examined, although using correct biological units would not necessarily improve the prediction success rates. These results suggest that the new method is a valuable new addition to a suite of existing structure-based bioinformatics tools for studies of molecular recognition and related functions of proteins in post-genomics research. AVAILABILITY The executable binaries and a web server for our method are available from http://sourceforge.net/projects/msdock/ and http://lise.ibms.sinica.edu.tw, respectively, free for academic users.
Affiliation(s)
- Zhong-Ru Xie
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan
18
Bianchi V, Gherardini PF, Helmer-Citterich M, Ausiello G. Identification of binding pockets in protein structures using a knowledge-based potential derived from local structural similarities. BMC Bioinformatics 2012; 13 Suppl 4:S17. [PMID: 22536963 PMCID: PMC3434446 DOI: 10.1186/1471-2105-13-s4-s17] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 02/04/2023]
Abstract
Background The identification of ligand binding sites is a key task in the annotation of proteins with known structure but uncharacterized function. Here we describe a knowledge-based method exploiting the observation that unrelated binding sites share small structural motifs that bind the same chemical fragments irrespective of the nature of the ligand as a whole. Results PDBinder compares a query protein against a library of binding and non-binding protein surface regions derived from the PDB. The results of the comparison are used to derive a propensity value for each residue which is correlated with the likelihood that the residue is part of a ligand binding site. The method was applied to two different problems: i) the prediction of ligand binding residues and ii) the identification of which surface cleft harbours the binding site. In both cases PDBinder performed consistently better than existing methods. PDBinder has been trained on a non-redundant set of 1356 high-quality protein-ligand complexes and tested on a set of 239 holo and apo complex pairs. We obtained an MCC of 0.313 on the holo set with a PPV of 0.413 while on the apo set we achieved an MCC of 0.271 and a PPV of 0.372. Conclusions We show that PDBinder performs better than existing methods. The good performance on the unbound proteins is extremely important for real-world applications where the location of the binding site is unknown. Moreover, since our approach is orthogonal to those used in other programs, the PDBinder propensity value can be integrated in other algorithms further increasing the final performance.
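The MCC and PPV figures quoted in this abstract are standard confusion-matrix metrics. The sketch below shows their textbook definitions for per-residue predictions; the function names and the convention of returning 0.0 when a denominator is zero are choices made here, not details taken from the paper.

```python
from math import sqrt

def ppv(tp, fp):
    """Positive predictive value: fraction of predicted binding residues
    that are true binding residues."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient, ranging from -1 to +1.

    Returns 0.0 when any marginal count is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Because binding residues are a small minority of all residues, MCC is more informative than plain accuracy here: a predictor that labels every residue as non-binding gets high accuracy but an MCC of 0.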
Affiliation(s)
- Valerio Bianchi
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, Rome 00133, Italy
|
19
|
Surade S, Blundell T. Structural Biology and Drug Discovery of Difficult Targets: The Limits of Ligandability. Chem Biol 2012; 19:42-50. [DOI: 10.1016/j.chembiol.2011.12.013] [Citation(s) in RCA: 156] [Impact Index Per Article: 12.0] [Received: 11/08/2011] [Revised: 11/08/2011] [Accepted: 12/09/2011] [Indexed: 02/05/2023]
|
20
|
Singh T, Biswas D, Jayaram B. AADS--an automated active site identification, docking, and scoring protocol for protein targets based on physicochemical descriptors. J Chem Inf Model 2011; 51:2515-27. [PMID: 21877713 DOI: 10.1021/ci200193z] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.7] [Indexed: 11/28/2022]
Abstract
We report here a robust automated active site detection, docking, and scoring (AADS) protocol for proteins with known structures. The active site finder identifies all cavities in a protein and scores them based on the physicochemical properties of functional groups lining the cavities in the protein. The accuracy realized on 620 proteins with sizes ranging from 100 to 600 amino acids with known drug active sites is 100% when the top ten cavity points are considered. These top ten cavity points are then submitted for automated docking of an input ligand/candidate molecule. The docking protocol uses an all-atom, energy-based Monte Carlo method. Eight low-energy docked structures corresponding to different locations and orientations of the candidate molecule are stored at each cavity point, giving 80 docked structures overall, which are then ranked using an effective free energy function, and the top five structures are selected. The predicted structure and energetics of the complexes agree quite well with experiment when tested on a data set of 170 protein-ligand complexes with known structures and binding affinities. The AADS methodology is implemented on an 80-processor cluster and presented as a freely accessible, easy-to-use tool at http://www.scfbio-iitd.res.in/dock/ActiveSite_new.jsp.
Affiliation(s)
- Tanya Singh
- Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi 110016, India
|
21
|
Ghersi D, Sanchez R. Beyond structural genomics: computational approaches for the identification of ligand binding sites in protein structures. J Struct Funct Genomics 2011; 12:109-17. [PMID: 21537951 PMCID: PMC3127736 DOI: 10.1007/s10969-011-9110-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 12/02/2010] [Accepted: 04/20/2011] [Indexed: 10/18/2022]
Abstract
Structural genomics projects have revealed structures for a large number of proteins of unknown function. Understanding the interactions between these proteins and their ligands would provide an initial step in their functional characterization. Binding site identification methods are a fast and cost-effective way to facilitate the characterization of functionally important protein regions. In this review we describe our recently developed methods for binding site identification in the context of existing methods. The advantage of energy-based approaches is emphasized, since they provide flexibility in the identification and characterization of different types of binding sites.
Affiliation(s)
- Dario Ghersi
- Department of Structural and Chemical Biology, Mount Sinai School of Medicine, 1425 Madison Avenue, New York, NY 10029, USA
|
22
|
Zhao J, Dundas J, Kachalo S, Ouyang Z, Liang J. Accuracy of functional surfaces on comparatively modeled protein structures. J Struct Funct Genomics 2011; 12:97-107. [PMID: 21541664 PMCID: PMC3415962 DOI: 10.1007/s10969-011-9109-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 11/24/2010] [Accepted: 04/20/2011] [Indexed: 12/18/2022]
Abstract
Identification and characterization of protein functional surfaces are important for predicting protein function, understanding enzyme mechanism, and docking small compounds to proteins. As the rapid speed of accumulation of protein sequence information far exceeds that of structures, constructing accurate models of protein functional surfaces and identifying their key elements become increasingly important. A promising approach is to build comparative models from sequences using known structural templates such as those obtained from structural genome projects. Here we assess how well this approach works in modeling binding surfaces. By systematically building three-dimensional comparative models of proteins using MODELLER, we determine how well functional surfaces can be accurately reproduced. We use an alpha-shape-based pocket algorithm to compute all pockets on the modeled structures, and conduct a large-scale computation of similarity measurements (pocket RMSD and fraction of functional atoms captured) for 26,590 modeled enzyme protein structures. Overall, we find that when the sequence fragment of the binding surfaces has more than 45% identity to that of the template protein, the modeled surfaces have on average an RMSD of 0.5 Å, and contain 48% or more of the binding surface atoms, with nearly all of the important atoms in the signatures of binding pockets captured.
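The two similarity measurements named in this abstract, pocket RMSD and fraction of functional atoms captured, can be illustrated with a short sketch. It assumes the modeled and reference pockets are already superimposed and that atoms are paired one-to-one; the abstract does not describe the authors' exact protocol, and the function names, coordinates, and atom labels below are hypothetical.

```python
import math


def pocket_rmsd(model_coords, reference_coords):
    """Root-mean-square deviation over paired (x, y, z) atom coordinates.

    Assumes both lists are in the same atom order and the two structures
    have already been superimposed (no fitting is performed here).
    """
    if not model_coords or len(model_coords) != len(reference_coords):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model_coords, reference_coords)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model_coords))


def fraction_captured(model_atoms, reference_atoms):
    """Fraction of reference pocket atoms that also appear in the modeled pocket."""
    reference = set(reference_atoms)
    if not reference:
        raise ValueError("reference pocket has no atoms")
    return len(reference & set(model_atoms)) / len(reference)


# Hypothetical coordinates in angstroms.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.5), (1.0, 0.0, -0.5)]
print(pocket_rmsd(model, reference))

# Hypothetical atom identifiers (residue:atom).
print(fraction_captured({"H57:NE2", "D102:OD1", "G193:N"},
                        {"H57:NE2", "D102:OD1", "S195:OG", "G193:O"}))
```

In practice the superposition step (for example a least-squares fit) would precede the RMSD calculation; it is omitted here to keep the sketch short.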
Affiliation(s)
- Jieling Zhao
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Joe Dundas
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Sema Kachalo
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Zheng Ouyang
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Jie Liang
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
|
23
|
Morita M, Terada T, Nakamura S, Shimizu K. BUDDY-system: A web site for constructing a dataset of protein pairs between ligand-bound and unbound states. BMC Res Notes 2011; 4:143. [PMID: 21600047 PMCID: PMC3124414 DOI: 10.1186/1756-0500-4-143] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/21/2010] [Accepted: 05/22/2011] [Indexed: 11/10/2022] Open
Abstract
Background: Elucidating molecular recognition by proteins, such as in enzyme-substrate and receptor-ligand interactions, is a key to understanding biological phenomena. To delineate these protein interactions, it is important to perform structural bioinformatics studies relevant to molecular recognition. Such studies require a dataset of protein structure pairs between ligand-bound and unbound states. In many studies, the same well-designed and high-quality dataset has been used repeatedly, which has spurred the development of subsequent relevant research. Using previously constructed datasets, researchers are able to fairly compare obtained results with those of other studies; in addition, much effort and time is saved. Therefore, it is important to construct a refined dataset that will appeal to many researchers. However, constructing such datasets is not a trivial task. Findings: We have developed the BUDDY-system, a web site designed to support the building of a dataset comprising pairs of protein structures between ligand-bound and unbound states, which are widely used in various areas associated with molecular recognition. In addition to constructing a dataset, the BUDDY-system also allows the user to search for ligand-bound protein structures by their unbound state or by their ligand, and to search for ligands by a particular receptor protein. Conclusions: The BUDDY-system receives input from the user as a single entry or a dataset consisting of a list of ligand-bound state protein structures, unbound state protein structures, or ligands and returns to the user a list of protein structure pairs between the ligand-bound and the corresponding unbound states. This web site is designed for researchers who are involved not only in structural bioinformatics but also in experimental studies. The BUDDY-system is freely available on the web.
Affiliation(s)
- Mizuki Morita
- Department of Fundamental Research, National Institute of Biomedical Innovation (NIBIO), 7-6-8 Saito Asagi, Ibaraki, Osaka 567-0085, Japan.
|
24
|
Mehio W, Kemp GJ, Taylor P, Walkinshaw MD. Identification of protein binding surfaces using surface triplet propensities. Bioinformatics 2010; 26:2549-55. [DOI: 10.1093/bioinformatics/btq490] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022] Open
|
25
|
Vorobjev YN. Blind docking method combining search of low-resolution binding sites with ligand pose refinement by molecular dynamics-based global optimization. J Comput Chem 2010; 31:1080-92. [PMID: 19821514 DOI: 10.1002/jcc.21394] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/28/2022]
Abstract
This study describes the development of a new blind hierarchical docking method, bhDock, its implementation, and accuracy assessment. The bhDock method uses a two-step algorithm. First, a comprehensive set of low-resolution binding sites is determined by analyzing the entire protein surface and ranked by a simple score function. Second, the ligand position is determined via a molecular dynamics-based method of global optimization starting from a small set of high-ranked low-resolution binding sites. The refinement of the ligand binding pose starts from uniformly distributed multiple initial ligand orientations and uses simulated annealing molecular dynamics coupled with guided force-field deformation of protein-ligand interactions to find the global minimum. Assessment of the bhDock method on a set of 37 protein-ligand complexes has shown a prediction success rate of 78%, which is better than the rate reported for the most cited docking methods, such as AutoDock, DOCK, GOLD, and FlexX, on the same set of complexes.
Affiliation(s)
- Yury N Vorobjev
- Institute of Chemical Biology and Fundamental Medicine of the Siberian Branch of the Russian Academy of Science, Novosibirsk, Russia.
|
26
|
Kasahara K, Kinoshita K, Takagi T. Ligand-binding site prediction of proteins based on known fragment-fragment interactions. Bioinformatics 2010; 26:1493-9. [PMID: 20472546 PMCID: PMC2881410 DOI: 10.1093/bioinformatics/btq232] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/08/2023]
Abstract
Motivation: The identification of putative ligand-binding sites on proteins is important for the prediction of protein function. Knowledge-based approaches using structure databases have become interesting, because of the recent increase in structural information. Approaches using binding motif information are particularly effective. However, they can only be applied to well-known ligands that frequently appear in the structure databases. Results: We have developed a new method for predicting the binding sites of chemically diverse ligands, by using information about the interactions between fragments. The selection of the fragment size is important. If the fragments are too small, then the patterns derived from the binding motifs cannot be used, since they are many-body interactions, while using larger fragments limits the application to well-known ligands. In our method, we used the main and side chains for proteins, and three successive atoms for ligands, as fragments. After superposition of the fragments, our method builds the conformations of ligands and predicts the binding sites. As a result, our method could accurately predict the binding sites of chemically diverse ligands, even though the Protein Data Bank currently contains a large number of nucleotides. Moreover, a further evaluation for the unbound forms of proteins revealed that our building up procedure was robust to conformational changes induced by ligand binding. Availability: Our method, named ‘BUMBLE’, is available at http://bumble.hgc.jp/. Contact: kasahara@cb.k.u-tokyo.ac.jp. Supplementary information: Supplementary Material is available at Bioinformatics online.
Affiliation(s)
- Kota Kasahara
- Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan.
|
27
|
Henrich S, Salo-Ahen OMH, Huang B, Rippmann FF, Cruciani G, Wade RC. Computational approaches to identifying and characterizing protein binding sites for ligand design. J Mol Recognit 2010; 23:209-19. [PMID: 19746440 DOI: 10.1002/jmr.984] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.3] [Indexed: 01/07/2023]
Abstract
Given the three-dimensional structure of a protein, how can one find the sites where other molecules might bind to it? Do these sites have the properties necessary for high affinity binding? Is this protein a suitable target for drug design? Here, we discuss recent developments in computational methods to address these and related questions. Geometric methods to identify pockets on protein surfaces have been developed over many years but, with new algorithms, their performance is still improving. Simulation methods show promise in accounting for protein conformational variability to identify transient pockets but lack the ease of use of many of the (rigid) shape-based tools. Sequence and structure comparison approaches are benefiting from the constantly increasing size of sequence and structure databases. Energetic methods can aid identification and characterization of binding pockets, and have undergone recent improvements in the treatment of solvation and hydrophobicity. The "druggability" of a binding site is still difficult to predict with an automated procedure. The methodologies available for this purpose range from simple shape and hydrophobicity scores to computationally demanding free energy simulations.
Affiliation(s)
- Stefan Henrich
- Molecular and Cellular Modeling Group, EML Research, Schloss-Wolfsbrunnenweg 33, 69118 Heidelberg, Germany
|
28
|
Prediction of calcium-binding sites by combining loop-modeling with machine learning. BMC Struct Biol 2009; 9:72. [PMID: 20003365 PMCID: PMC2808310 DOI: 10.1186/1472-6807-9-72] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 06/16/2009] [Accepted: 12/11/2009] [Indexed: 01/23/2023]
Abstract
Background: Protein ligand-binding sites in the apo state exhibit structural flexibility. This flexibility often frustrates methods for structure-based recognition of these sites because it leads to the absence of electron density for these critical regions, particularly when they are in surface loops. Methods for recognizing functional sites in these missing loops would be useful for recovering additional functional information. Results: We report a hybrid approach for recognizing calcium-binding sites in disordered regions. Our approach combines loop modeling with a machine learning method (FEATURE) for structure-based site recognition. For validation, we compared the performance of our method on known calcium-binding sites for which there are both holo and apo structures. When loops in the apo structures are rebuilt using modeling methods, FEATURE identifies 14 out of 20 crystallographically proven calcium-binding sites. It only recognizes 7 out of 20 calcium-binding sites in the initial apo crystal structures. We applied our method to unstructured loops in proteins from SCOP families known to bind calcium in order to discover potential cryptic calcium binding sites. We built 2745 missing loops and evaluated them for potential calcium binding. We made 102 predictions of calcium-binding sites. Ten predictions are consistent with independent experimental verifications. We found indirect experimental evidence for 14 other predictions. The remaining 78 predictions are novel predictions, some with intriguing potential biological significance. In particular, we see an enrichment of beta-sheet folds with predicted calcium binding sites in the connecting loops on the surface that may be important for calcium-mediated function switches. Conclusion: Protein crystal structures are a potentially rich source of functional information. When loops are missing in these structures, we may be losing important information about binding sites and active sites. We have shown that limited loop modeling (e.g. loops less than 17 residues) combined with pattern matching algorithms can recover functions and propose putative conformations associated with these functions.
|
29
|
Oda A, Yamaotsu N, Hirono S. Evaluation of the searching abilities of HBOP and HBSITE for binding pocket detection. J Comput Chem 2009; 30:2728-37. [DOI: 10.1002/jcc.21299] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
|
30
|
Kawabata T. Detection of multiscale pockets on protein surfaces using mathematical morphology. Proteins 2009; 78:1195-211. [DOI: 10.1002/prot.22639] [Citation(s) in RCA: 151] [Impact Index Per Article: 9.4] [Indexed: 11/10/2022]
|
31
|
Oh M, Joo K, Lee J. Protein-binding site prediction based on three-dimensional protein modeling. Proteins 2009; 77 Suppl 9:152-6. [DOI: 10.1002/prot.22572] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022]
|
32
|
Huang B. MetaPocket: A Meta Approach to Improve Protein Ligand Binding Site Prediction. OMICS 2009; 13:325-30. [DOI: 10.1089/omi.2009.0045] [Citation(s) in RCA: 290] [Impact Index Per Article: 18.1] [Indexed: 11/13/2022]
Affiliation(s)
- Bingding Huang
- EML Research gGmbH, Schloss-Wolfsbrunnenweg 33, 69118, Heidelberg, Germany
|