51
Abstract
Motivation: Calmodulin (CaM) is a ubiquitously conserved protein that acts as a calcium sensor and interacts with a large number of proteins. Detection of CaM binding proteins and their interaction sites experimentally requires a significant effort, so accurate methods for their prediction are important. Results: We present a novel algorithm (MI-1 SVM) for binding site prediction and evaluate its performance on a set of CaM-binding proteins extracted from the Calmodulin Target Database. Our approach directly models the problem of binding site prediction as a large-margin classification problem, and is able to take into account uncertainty in binding site location. We show that the proposed algorithm performs better than the standard SVM formulation, and illustrate its ability to recover known CaM binding motifs. A highly accurate cascaded classification approach using the proposed binding site prediction method to predict CaM binding proteins in Arabidopsis thaliana is also presented. Availability: Matlab code for training MI-1 SVM and the cascaded classification approach is available on request. Contact: fayyazafsar@gmail.com or asa@cs.colostate.edu
52
Kysilka J, Vondrášek J. Towards a better understanding of the specificity of protein-protein interaction. J Mol Recognit 2013; 25:604-15. [PMID: 23108620 DOI: 10.1002/jmr.2219] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/25/2022]
Abstract
In order to predict the interaction interface for proteins, it is crucial to identify the characteristic features controlling the interaction process. We present an analysis of 69 crystal structures of dimeric protein complexes that provides a basis for a reasonable description of the phenomenon. Interaction interfaces of two proteins were localized at the amino acid level and described in terms of their chemical composition, binding preferences, and residue interaction energies, computed with the Amber empirical force field. The characteristic properties of the interaction interface were compared against a set of corresponding intramolecular binding parameters for amino acids in proteins. It was found that geometrically distinct clusters of large hydrophobic amino acids (leucine, valine, isoleucine, and phenylalanine), as well as polar tyrosines and charged arginines, are signatures of the protein-protein interaction interface. To some extent, we can generalize that protein-protein interaction (seen through interaction between amino acids) is very similar to the intramolecular arrangement of amino acids, although intermolecular pairs generally have lower interaction energies with their neighbors. Interfaces therefore possess a high degree of complementarity, which also suggests high selectivity of the process. Our results can improve interface prediction algorithms and our understanding of protein-protein recognition.
Affiliation(s)
- Jiří Kysilka
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic
53
Boyen P, Neven F, van Dyck D, Valentim FL, van Dijk ADJ. Mining minimal motif pair sets maximally covering interactions in a protein-protein interaction network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:73-86. [PMID: 23702545 DOI: 10.1109/tcbb.2012.165] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/02/2023]
Abstract
Correlated motif covering (CMC) is the problem of finding a set of motif pairs, i.e., pairs of patterns, in the sequences of proteins from a protein-protein interaction network (PPI-network) that describe the interactions in the network as concisely as possible. In other words, a perfect solution for CMC would be a minimal set of motif pairs that describes the interaction behavior perfectly in the sense that two proteins from the network interact if and only if their sequences match a motif pair in the minimal set. In this paper, we introduce and formally define CMC and show that it is closely related to the red-blue set cover (RBSC) problem and its weighted version (WRBSC), both well-known NP-hard problems for which there exist several algorithms with known approximation factor guarantees. We prove the hardness of approximation of CMC by providing an approximation factor preserving reduction from RBSC to CMC. We show the existence of a theoretical approximation algorithm for CMC by providing an approximation factor preserving reduction from CMC to WRBSC. We adapt the latter algorithm into a functional heuristic for CMC, called CMC-approx, and experimentally assess its performance and biological relevance. The implementation in Java can be found at http://bioinformatics.uhasselt.be.
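The "if and only if" coverage criterion in this abstract can be illustrated with a minimal sketch. It assumes that motifs are written as regular expressions and that the network is a set of unordered pairs of distinct proteins; the function names, toy sequences, and motif pair below are hypothetical, and this is not the CMC-approx heuristic itself, only a check of whether a given motif pair set describes a network exactly.

```python
import re
from itertools import combinations

def matches(motif, sequence):
    """Return True if the motif (a regular expression) occurs in the sequence."""
    return re.search(motif, sequence) is not None

def predicted_interactions(sequences, motif_pairs):
    """Return the protein pairs whose sequences match some motif pair.

    A pair (a, b) is predicted when one protein matches the first motif
    and the other matches the second, in either order.
    """
    predicted = set()
    for a, b in combinations(sorted(sequences), 2):
        for m1, m2 in motif_pairs:
            forward = matches(m1, sequences[a]) and matches(m2, sequences[b])
            reverse = matches(m2, sequences[a]) and matches(m1, sequences[b])
            if forward or reverse:
                predicted.add(frozenset((a, b)))
                break
    return predicted

def is_perfect_cover(sequences, interactions, motif_pairs):
    """True when predicted and observed interactions coincide exactly."""
    observed = {frozenset(pair) for pair in interactions}
    return predicted_interactions(sequences, motif_pairs) == observed

# Hypothetical toy network: three proteins, one observed interaction.
sequences = {"P1": "MKLVA", "P2": "GGWYR", "P3": "AAAAA"}
interactions = [("P1", "P2")]
motif_pairs = [("KL.A", "W.R")]
covered = is_perfect_cover(sequences, interactions, motif_pairs)
```

Self-interactions are ignored here for simplicity; a fuller treatment would also have to search for the smallest such motif pair set, which is the hard part of the problem.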
Affiliation(s)
- Peter Boyen
- Hasselt University and Transnational University of Limburg, Agoralaan, Diepenbeek, Belgium.
54
Structural and functional analysis of multi-interface domains. PLoS One 2012; 7:e50821. [PMID: 23272073 PMCID: PMC3522720 DOI: 10.1371/journal.pone.0050821] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/15/2012] [Accepted: 10/29/2012] [Indexed: 02/03/2023] Open
Abstract
A multi-interface domain is a domain that can form multiple distinct binding sites to contact many other domains, forming a hub in domain-domain interaction networks. The functions served by the multiple interfaces are usually different, but there is no strict bijection between the functions and interfaces because some subsets of the interfaces serve the same function. This work applies graph theory and algorithms to discover fingerprints for the multiple interfaces of a domain and to establish associations between the interfaces and functions, based on a huge set of multi-interface proteins from PDB. We found that about 40% of proteins have the multi-interface property; however, the multi-interface domains involved account for only a tiny fraction (1.8%) of the total number of domains. The interfaces of these domains are distinguishable in terms of their fingerprints, indicating the functional specificity of the multiple interfaces in a domain. Furthermore, we observed that both cooperative and distinctive structural patterns, which will be useful for protein engineering, exist in the multiple interfaces of a domain.
55
Jaeger IS, Kretzschmar I, Körner J, Weiser AA, Mahrenholz CC, Potty A, Kourentzi K, Willson RC, Volkmer R, Preissner R. Mapping discontinuous protein-binding sites via structure-based peptide libraries: combining in silico and in vitro approaches. J Mol Recognit 2012; 26:23-31. [DOI: 10.1002/jmr.2237] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/03/2012] [Revised: 08/23/2012] [Accepted: 08/24/2012] [Indexed: 11/09/2022]
Affiliation(s)
- Ines S. Jaeger
- Institute for Physiology, Structural Bioinformatics Group; Charité-Universitätsmedizin Berlin; Lindenberger Weg 80; 13125; Berlin; Germany
- Ines Kretzschmar
- Institut für Medizinische Immunologie, Molecular Libraries and Recognition Group; Charité-Universitätsmedizin Berlin; Hessische Strasse 3-4; 10115; Berlin; Germany
- Jana Körner
- Leibniz-Institut für Molekulare Pharmakologie im Forschungsverbund Berlin e.V. (FMP); R.-Rössle-Strasse 10; 13125; Berlin; Germany
- Carsten C. Mahrenholz
- Institut für Medizinische Immunologie, Molecular Libraries and Recognition Group; Charité-Universitätsmedizin Berlin; Hessische Strasse 3-4; 10115; Berlin; Germany
- Katerina Kourentzi
- University of Houston; Department of Chemical and Biomolecular Engineering; Houston; TX; 77204-4004; USA
- Richard C. Willson
- University of Houston; Department of Chemical and Biomolecular Engineering; Houston; TX; 77204-4004; USA
- Rudolf Volkmer
- Institut für Medizinische Immunologie, Molecular Libraries and Recognition Group; Charité-Universitätsmedizin Berlin; Hessische Strasse 3-4; 10115; Berlin; Germany
- Robert Preissner
- Institute for Physiology, Structural Bioinformatics Group; Charité-Universitätsmedizin Berlin; Lindenberger Weg 80; 13125; Berlin; Germany
56
Zawaira A, Shibayama Y. A simple recipe for the non-expert bioinformaticist for building experimentally-testable hypotheses for proteins with no known homologs. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2012; 13:185-200. [PMID: 22956349 DOI: 10.1007/s10969-012-9141-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2012] [Accepted: 08/08/2012] [Indexed: 06/01/2023]
Abstract
The study of the protein-protein interactions (PPIs) of unique ORFs is a strategy for deciphering the biological roles of unique ORFs of interest. For uniform reference, we define unique ORFs as those for which no matching protein is found after PDB-BLAST search with default parameters. The uniqueness of the ORFs generally precludes the straightforward use of structure-based approaches in the design of experiments to explore PPIs. Many open-source bioinformatics tools, from the commonly used to the relatively esoteric, have been built and validated to perform analyses and/or predictions of various sorts on proteins. How can these available tools be combined into a protocol that helps the non-expert bioinformaticist researcher to design experiments to explore the PPIs of their unique ORF? Here we define a pragmatic protocol based on accessibility of software to achieve this, and we make it concrete by applying it to two proteins, ImuB and ImuA' from Mycobacterium tuberculosis. The protocol is pragmatic in that decisions are made largely based on the availability of easy-to-use freeware. We define the following basic and user-friendly software pathway to build testable PPI hypotheses for a query protein sequence: PSI-PRED → MUSTER → metaPPISP → ASAView and ConSurf. Where possible, other analytical and/or predictive tools may be included. Our protocol combines the software predictions and analyses with general bioinformatics principles to arrive at consensus, prioritised and testable PPI hypotheses.
Affiliation(s)
- Alexander Zawaira
- Gene Expression and Biophysics Group, Synthetic Biology, ERA, CSIR Biosciences, Brummeria, Pretoria, South Africa.
57
Qin S, Zhou HX. PI2PE: A Suite of Web Servers for Predictions Ranging From Protein Structure to Binding Kinetics. Biophys Rev 2012; 5:41-46. [PMID: 23526172 DOI: 10.1007/s12551-012-0086-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/26/2023] Open
Abstract
PI2PE (http://pipe.sc.fsu.edu) is a suite of four web servers for predicting a variety of folding- and binding-related properties of proteins. These include the solvent accessibility of amino acids upon protein folding, the amino acids forming the interfaces of protein-protein and protein-nucleic acid complexes, and the binding rate constants of these complexes. Three of the servers debuted in 2007, and have garnered ~2,500 unique users and finished over 30,000 jobs. The functionalities of these servers have now been enhanced, and a new server for predicting binding rate constants has been added. Together, these web servers form a pipeline from protein sequence to tertiary structure, then to quaternary structure, and finally to binding kinetics.
Affiliation(s)
- Sanbo Qin
- Department of Physics and Institute of Molecular Biophysics, Florida State University, Tallahassee, Florida 32306, USA
58
Chen P, Wong L, Li J. Detection of outlier residues for improving interface prediction in protein heterocomplexes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:1155-1165. [PMID: 22529331 DOI: 10.1109/tcbb.2012.58] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 05/31/2023]
Abstract
Sequence-based understanding and identification of protein binding interfaces is a challenging research topic due to the complexity of protein systems and the imbalanced distribution between interface and noninterface residues. This paper presents an outlier detection idea to address the redundancy problem in protein interaction data. The cleaned training data are then used for improving the prediction performance. We use three novel measures to describe the extent to which a residue is considered an outlier in comparison to the other residues: the distance of a residue instance from the center instance of all residue instances of the same class label (Dist), the probability of the class label of the residue instance (PCL), and the importance of within-class and between-class (IWB) residue instances. Outlier scores are computed by integrating the three factors; instances with a sufficiently large score are treated as outliers and removed. The data sets without outliers are taken as input for a support vector machine (SVM) ensemble. The proposed SVM ensemble trained on input data without outliers performs better than that trained with outliers. Our method is also more accurate than many published methods on benchmark data sets. From our empirical studies, we found that some outlier interface residues are genuinely close to noninterface regions, and some outlier noninterface residues are close to interface regions.
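As a rough illustration of the first of the three measures (Dist), the sketch below scores each instance by its Euclidean distance to the center of its own class and drops instances that lie far from that center. The Euclidean metric, the `factor` threshold relative to the class mean distance, and the toy feature vectors are all assumptions made for illustration; the abstract does not specify them, and the PCL and IWB measures are not modeled here.

```python
import math

def class_center(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def dist_scores(instances):
    """Distance of each (vector, label) instance to the center of its class."""
    by_label = {}
    for vector, label in instances:
        by_label.setdefault(label, []).append(vector)
    centers = {label: class_center(vs) for label, vs in by_label.items()}
    return [math.dist(vector, centers[label]) for vector, label in instances]

def remove_outliers(instances, factor=1.5):
    """Keep instances whose distance is at most factor times the class mean distance."""
    scores = dist_scores(instances)
    totals = {}
    for (_, label), score in zip(instances, scores):
        total, count = totals.get(label, (0.0, 0))
        totals[label] = (total + score, count + 1)
    kept = []
    for (vector, label), score in zip(instances, scores):
        total, count = totals[label]
        if score <= factor * (total / count):
            kept.append((vector, label))
    return kept

# Hypothetical 2D features: label 1 = interface, label 0 = noninterface.
instances = [
    ((0.0, 0.0), 1), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 1),
    ((10.0, 10.0), 1),  # far from the other interface instances
    ((5.0, 5.0), 0), ((5.0, 6.0), 0),
]
cleaned = remove_outliers(instances)
```

The cleaned list would then serve as training input for a classifier; in the paper that classifier is an SVM ensemble, which is outside the scope of this sketch.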
Affiliation(s)
- Peng Chen
- Institute of Intelligent Machines, Chinese Academy of Sciences, PO Box 1130, Hefei 230031, China.
59
De Ingeniis J, Kazanov MD, Shatalin K, Gelfand MS, Osterman AL, Sorci L. Glutamine versus ammonia utilization in the NAD synthetase family. PLoS One 2012; 7:e39115. [PMID: 22720044 PMCID: PMC3376133 DOI: 10.1371/journal.pone.0039115] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 04/11/2012] [Accepted: 05/16/2012] [Indexed: 11/18/2022] Open
Abstract
NAD is a ubiquitous and essential metabolic redox cofactor which also functions as a substrate in certain regulatory pathways. The last step of NAD synthesis is the ATP-dependent amidation of deamido-NAD by NAD synthetase (NADS). Members of the NADS family are present in nearly all species across the three kingdoms of Life. In eukaryotic NADS, the core synthetase domain is fused with a nitrilase-like glutaminase domain supplying ammonia for the reaction. This two-domain NADS arrangement enabling the utilization of glutamine as nitrogen donor is also present in various bacterial lineages. However, many other bacterial members of the NADS family do not contain a glutaminase domain, and they can utilize only ammonia (but not glutamine) in vitro. A single-domain NADS is also characteristic of nearly all Archaea, and its dependence on ammonia was demonstrated here for the representative enzyme from Methanocaldococcus jannaschii. However, a question about the actual in vivo nitrogen donor for single-domain members of the NADS family remained open: Is it glutamine hydrolyzed by a committed (but yet unknown) glutaminase subunit, as in most ATP-dependent amidotransferases, or free ammonia as in glutamine synthetase? Here we addressed this dilemma by combining evolutionary analysis of the NADS family with experimental characterization of two representative bacterial systems: a two-subunit NADS from Thermus thermophilus and a single-domain NADS from Salmonella typhimurium, providing evidence that ammonia (and not glutamine) is the physiological substrate of a typical single-domain NADS. The latter represents the most likely ancestral form of NADS. The ability to utilize glutamine appears to have evolved via recruitment of a glutaminase subunit followed by domain fusion in an early branch of Bacteria. Further evolution of the NADS family included lineage-specific loss of one of the two alternative forms and horizontal gene transfer events.
Lastly, we identified NADS structural elements associated with glutamine-utilizing capabilities.
Affiliation(s)
- Jessica De Ingeniis
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Marat D. Kazanov
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Konstantin Shatalin
- Department of Biochemistry, New York University School of Medicine, New York, United States of America
- Mikhail S. Gelfand
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- Andrei L. Osterman
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- * E-mail: (LS); (ALO)
- Leonardo Sorci
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Department of Clinical Sciences, Section of Biochemistry, Polytechnic University of Marche, Ancona, Italy
- * E-mail: (LS); (ALO)
60
Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS. Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces. PLoS One 2012; 7:e37706. [PMID: 22701576 PMCID: PMC3368894 DOI: 10.1371/journal.pone.0037706] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 12/09/2011] [Accepted: 04/23/2012] [Indexed: 11/18/2022] Open
Abstract
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence.
The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
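For readers unfamiliar with the residue-based measures quoted in this abstract, the following sketch shows the standard definitions of accuracy, precision, sensitivity, specificity, and the Matthews correlation coefficient in terms of confusion-matrix counts. The counts used in the example are hypothetical and are not taken from the paper; the helper names are likewise illustrative.

```python
import math

def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn count true interface residues predicted as interface/noninterface;
    tn/fp count true noninterface residues predicted as noninterface/interface.
    """
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": safe_div(tp, tp + fp),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }

# Hypothetical counts for 300 surface residues, 100 of them in the interface.
metrics = binary_metrics(tp=60, fp=40, tn=160, fn=40)
```

Because interface residues are usually the minority class, accuracy alone can look favorable for a predictor that rarely predicts the interface, which is why the Matthews correlation coefficient is commonly reported alongside it.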
Affiliation(s)
- Ching-Tai Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Bioinformatics and Systems Biology, National Chiao-Tung University, Hsinchu, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jhih-Wei Jian
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jeng-Yih Chang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Ei-Wen Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jun-Bo Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan
- Shinn-Ying Ho
- Institute of Bioinformatics and Systems Biology, National Chiao-Tung University, Hsinchu, Taiwan
- Wen-Lian Hsu
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- * E-mail: (AY); (WH)
- An-Suei Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- * E-mail: (AY); (WH)
61
Talavera D, Williams SG, Norris MG, Robertson DL, Lovell SC. Evolvability of Yeast Protein–Protein Interaction Interfaces. J Mol Biol 2012; 419:387-96. [DOI: 10.1016/j.jmb.2012.03.021] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/07/2011] [Revised: 03/24/2012] [Accepted: 03/27/2012] [Indexed: 01/27/2023]
62
Structural characterization of the PliG lysozyme inhibitor family. J Struct Biol 2012; 180:235-42. [PMID: 22634186 DOI: 10.1016/j.jsb.2012.05.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 03/26/2012] [Revised: 05/08/2012] [Accepted: 05/15/2012] [Indexed: 11/22/2022]
Abstract
Several Gram-negative bacteria protect themselves against the lytic action of host lysozymes by producing specific proteinaceous inhibitors. So far, four different families of lysozyme inhibitors have been identified including Ivy (Inhibitor of vertebrate lysozyme), MliC/PliC (Membrane associated/periplasmic inhibitor of C-type lysozyme), PliI and PliG (periplasmic inhibitors of I- and G-type lysozymes, respectively). Here we provide the first crystallographic description of the PliG family. Crystal structures were obtained for the PliG homologues from Escherichia coli, Salmonella enterica serotype Typhimurium and Aeromonas hydrophila. These structures show that the fold of the PliG family is very distinct from that of all other families of lysozyme inhibitors. Small-angle X-ray scattering studies reveal that PliG is monomeric in solution as opposed to the dimeric PliC and PliI. The PliG family shares a highly conserved SG(x)xY sequence motif with the MliC/PliC and PliI families where it was shown to reside on a loop that blocks the active site of lysozyme leading to inhibition. Surprisingly, we found that in PliG this motif is not well exposed and not involved in the inhibitory action. Instead, we could identify a distinct cluster of surface residues that are conserved across the PliG family and are essential for efficient G-type lysozyme inhibition, as evidenced by mutagenesis studies.
63
Qin S, Zhou HX. Structural models of protein-DNA complexes based on interface prediction and docking. Curr Protein Pept Sci 2012; 12:531-9. [PMID: 21787304 DOI: 10.2174/138920311796957694] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/01/2011] [Revised: 04/01/2011] [Accepted: 05/04/2011] [Indexed: 11/22/2022]
Abstract
Protein-DNA interactions are the physical basis of gene expression and DNA modification. Structural models that reveal these interactions are essential for understanding them. As only a limited number of structures for protein-DNA complexes have been determined by experimental methods, computational methods provide a potential way to fill the need. We have developed the DISPLAR method to predict DNA binding sites on proteins. Predicted binding sites have been used to assist the building of structural models by docking, either by guiding the docking or by selecting near-native candidates from the docked poses. Here we applied the DISPLAR method to predict the DNA binding sites for 20 DNA-binding proteins, which have had their DNA binding sites characterized by NMR chemical shift perturbation. For two of these proteins, the structures of their complexes with DNA have also been determined. With the help of the DISPLAR predictions, we built structural models for these two complexes. Evaluations of both the DNA binding sites for 20 proteins and the structural models of the two protein-DNA complexes against experimental results demonstrate the significant promise of our model-building approach.
Affiliation(s)
- Sanbo Qin
- Department of Physics and Institute of Molecular Biophysics, Florida State University, Tallahassee, FL 32306, USA
64
Arbitrary protein-protein docking targets biologically relevant interfaces. BMC BIOPHYSICS 2012; 5:7. [PMID: 22559010 PMCID: PMC3441232 DOI: 10.1186/2046-1682-5-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 01/06/2012] [Accepted: 04/11/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Protein-protein recognition is of fundamental importance in the vast majority of biological processes. However, it has already been demonstrated that it is very hard to distinguish true complexes from false complexes in so-called cross-docking experiments, where binary protein complexes are separated and the isolated proteins are all docked against each other and scored. Does this result, at least in part, reflect a physical reality? False complexes could reflect possible nonspecific or weak associations. RESULTS In this paper, we investigate the twilight zone of protein-protein interactions, building on an interesting outcome of cross-docking experiments: false complexes seem to favor residues from the true interaction site, suggesting that randomly chosen partners dock in a non-random fashion on protein surfaces. Here, we carry out arbitrary docking of a non-redundant data set of 198 proteins, with more than 300 randomly chosen "probe" proteins. We investigate the tendency of arbitrary partners to aggregate at localized regions of the protein surfaces, the shape and compositional bias of the generated interfaces, and the potential of this property to predict biologically relevant binding sites. We show that the non-random localization of arbitrary partners after protein-protein docking is a generic feature of protein structures. The interfaces generated in this way are not systematically planar or curved, but tend to be closer than average to the center of the proteins. These results can be used to predict biological interfaces with an AUC value up to 0.69 alone, and 0.72 when used in combination with evolutionary information. An appropriate choice of random partners and number of docking models make this method computationally practical. It is also noted that nonspecific interfaces can point to alternate interaction sites in the case of proteins with multiple interfaces. 
We illustrate the usefulness of arbitrary docking using PEBP (Phosphatidylethanolamine binding protein), a kinase inhibitor with multiple partners. CONCLUSIONS An approach using arbitrary docking, and based solely on physical properties, can successfully identify biologically pertinent protein interfaces.
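The idea of scoring surface residues by how often arbitrary partners dock onto them, and then evaluating that score by AUC, can be sketched as follows. The sketch assumes each docking model is reduced to the set of receptor residues found in the generated interface, and it uses a simple rank-based AUC (the probability that a true interface residue outscores a non-interface residue). The function names, residue numbers, and docking results are hypothetical and do not reproduce the scoring used in the paper.

```python
def interface_propensity(docked_interfaces, residues):
    """Fraction of docking models in which each residue is in the interface."""
    n_models = len(docked_interfaces)
    return {
        residue: sum(residue in interface for interface in docked_interfaces) / n_models
        for residue in residues
    }

def auc(scores, labels):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical receptor with six surface residues and four arbitrary docking models.
residues = [1, 2, 3, 4, 5, 6]
docked_interfaces = [{1, 2, 4}, {2, 3, 4}, {2, 4, 5}, {1, 2, 6}]
true_interface = {1, 2, 3}

propensity = interface_propensity(docked_interfaces, residues)
scores = [propensity[r] for r in residues]
labels = [r in true_interface for r in residues]
auc_value = auc(scores, labels)
```

An AUC of 0.5 would mean the propensity carries no information about the true interface, so values such as those reported in the abstract indicate a modest but real signal.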
65
The distribution of ligand-binding pockets around protein-protein interfaces suggests a general mechanism for pocket formation. Proc Natl Acad Sci U S A 2012; 109:3784-9. [PMID: 22355140 DOI: 10.1073/pnas.1117768109] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.8] [Indexed: 11/18/2022] Open
Abstract
Protein-protein and protein-ligand interactions are ubiquitous in a biological cell. Here, we report a comprehensive study of the distribution of protein-ligand interaction sites, namely ligand-binding pockets, around protein-protein interfaces where protein-protein interactions occur. We inspected a representative set of 1,611 protein-protein complexes and identified pockets with a potential for binding small molecule ligands. The majority of these pockets are within a 6 Å distance from protein interfaces. Accordingly, in about half of ligand-bound protein-protein complexes, amino acids from both sides of a protein interface are involved in direct contacts with at least one ligand. Statistically, ligands are closer to a protein-protein interface than a random surface patch of the same solvent accessible surface area. Similar results are obtained in an analysis of the ligand distribution around domain-domain interfaces of 1,416 nonredundant, two-domain protein structures. Furthermore, pockets of a size comparable to those observed in experimental structures are present in artificially generated protein complexes, suggesting that the prominent appearance of pockets around protein interfaces is mainly a structural consequence of protein packing and thus is an intrinsic geometric feature of protein structure. Nature may take advantage of such a structural feature by selecting and further optimizing for biological function. We propose that packing near protein-protein or domain-domain interfaces is a major route to the formation of ligand-binding pockets.
66
Sinha R, Kundrotas PJ, Vakser IA. Protein docking by the interface structure similarity: how much structure is needed? PLoS One 2012; 7:e31349. [PMID: 22348074 PMCID: PMC3278447 DOI: 10.1371/journal.pone.0031349] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 11/15/2011] [Accepted: 01/08/2012] [Indexed: 11/19/2022] Open
Abstract
The increasing availability of co-crystallized protein-protein complexes provides an opportunity to use template-based modeling for protein-protein docking. Structure alignment techniques are useful in detection of remote target-template similarities. The size of the structure involved in the alignment is important for modeling success. This paper describes a systematic large-scale study to find the optimal definition/size of the interfaces for structure alignment-based docking applications. The results showed that structural areas corresponding to cutoff values <12 Å across the interface inadequately represent structural details of the interfaces. With the increase of the cutoff beyond 12 Å, the success rate for the benchmark set of 99 protein complexes did not increase significantly for higher-accuracy models, and decreased for lower-accuracy models. The 12 Å cutoff was optimal in our interface alignment-based docking, and is a likely best choice for large-scale (e.g., on the scale of the entire genome) applications to protein interaction networks. The results provide guidelines for docking approaches, including high-throughput applications to modeled structures.
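A distance-cutoff interface definition of the kind discussed above can be sketched in a few lines. The sketch assumes one representative coordinate per residue (for example, the Cα atom) and counts a residue as part of the interface when it lies within the cutoff of any residue on the partner chain; the abstract does not state which atoms or distance definition the authors used, so this choice, the function name, and the toy coordinates are assumptions.

```python
import math

def interface_residues(chain_a, chain_b, cutoff=12.0):
    """Return the residues of each chain lying within cutoff (Å) of the other chain.

    chain_a and chain_b map residue identifiers to (x, y, z) coordinates.
    """
    near_a, near_b = set(), set()
    for res_a, coord_a in chain_a.items():
        for res_b, coord_b in chain_b.items():
            if math.dist(coord_a, coord_b) <= cutoff:
                near_a.add(res_a)
                near_b.add(res_b)
    return near_a, near_b

# Hypothetical coordinates in Å for two short chains.
chain_a = {"A1": (0.0, 0.0, 0.0), "A2": (20.0, 0.0, 0.0)}
chain_b = {"B1": (5.0, 0.0, 0.0), "B2": (40.0, 0.0, 0.0)}

interface_12 = interface_residues(chain_a, chain_b, cutoff=12.0)
interface_16 = interface_residues(chain_a, chain_b, cutoff=16.0)
```

Raising the cutoff enlarges the structural fragment passed to the alignment step, which is the trade-off the study quantifies: too small a fragment loses structural detail, while a larger one adds residues that are less specific to the interface.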
Affiliation(s)
- Rohita Sinha
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, United States of America
- Petras J. Kundrotas
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, United States of America
- Ilya A. Vakser
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, United States of America
- Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, United States of America
67
Li B, Kihara D. Protein docking prediction using predicted protein-protein interface. BMC Bioinformatics 2012; 13:7. [PMID: 22233443 PMCID: PMC3287255 DOI: 10.1186/1471-2105-13-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 08/26/2011] [Accepted: 01/10/2012] [Indexed: 11/10/2022]
Abstract
Background Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. Results We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pair wise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. Conclusion We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.
Affiliation(s)
- Bin Li
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
68
Schneider S, Zacharias M. Scoring optimisation of unbound protein-protein docking including protein binding site predictions. J Mol Recognit 2011; 25:15-23. [DOI: 10.1002/jmr.1165] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Affiliation(s)
- Sebastian Schneider
- Physik-Department T38; Technische Universität München; James Franck Str. 1; 85748; Garching; Germany
- Martin Zacharias
- Physik-Department T38; Technische Universität München; James Franck Str. 1; 85748; Garching; Germany
69
Zhou HX. Intrinsic disorder: signaling via highly specific but short-lived association. Trends Biochem Sci 2011; 37:43-8. [PMID: 22154231 DOI: 10.1016/j.tibs.2011.11.002] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.3] [Received: 09/22/2011] [Revised: 11/01/2011] [Accepted: 11/04/2011] [Indexed: 01/22/2023]
Abstract
Association between signaling proteins and their cellular targets is generally thought to be highly specific (implicating a high association constant, K(a)) and, at the same time, transient or short-lived (corresponding to a high dissociation rate constant, k(d)). However, a combination of high K(a) and high k(d) would lead to a high association rate constant (k(a) = K(a)k(d)), which poses a problem because there is a limit to which k(a) can be increased, set by the diffusional approach to form the complex. In this Opinion article, I propose that having the signaling protein disordered before binding to the target provides a way out of this quandary. The intrinsic disorder of the signaling protein would decrease K(a) without sacrificing the specificity of the complex, and thus would allow k(d) to be increased to a range appropriate for signaling.
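The kinetic argument can be made concrete with a small Python sketch of the relation k(a) = K(a)k(d) stated in the abstract. All numerical values below, including the diffusion-limit ceiling, are hypothetical order-of-magnitude figures chosen for illustration and are not taken from the article.

```python
def association_rate(K_a: float, k_d: float) -> float:
    """Association rate constant k_a = K_a * k_d.

    K_a is the association constant (M^-1) and k_d the dissociation
    rate constant (s^-1), so k_a comes out in M^-1 s^-1.
    """
    return K_a * k_d


# Assumed ceiling on k_a set by diffusion (illustrative value only).
DIFFUSION_LIMIT = 1e9  # M^-1 s^-1

# Hypothetical case 1: very high affinity combined with fast dissociation.
high_affinity = association_rate(K_a=1e9, k_d=1e3)

# Hypothetical case 2: lower K_a (as proposed for a disordered signaling
# protein) with the same fast dissociation rate.
lower_affinity = association_rate(K_a=1e6, k_d=1e3)

print(high_affinity > DIFFUSION_LIMIT)    # required k_a exceeds the assumed ceiling
print(lower_affinity <= DIFFUSION_LIMIT)  # required k_a is within the assumed ceiling
```

Under these assumed numbers, the first combination would demand an association rate a thousand times above the ceiling, whereas lowering K(a) brings the required k(a) back into an attainable range while keeping k(d) high.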
Affiliation(s)
- Huan-Xiang Zhou
- Department of Physics and Institute of Molecular Biophysics, Florida State University, Tallahassee, FL 32306, USA.
70
Zellner H, Staudigel M, Trenner T, Bittkowski M, Wolowski V, Icking C, Merkl R. Prescont: Predicting protein-protein interfaces utilizing four residue properties. Proteins 2011; 80:154-68. [DOI: 10.1002/prot.23172] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 07/11/2011] [Revised: 08/18/2011] [Accepted: 08/29/2011] [Indexed: 12/26/2022]
71
Qiu Z, Wang X. Prediction of protein-protein interaction sites using patch-based residue characterization. J Theor Biol 2011; 293:143-50. [PMID: 22037062 DOI: 10.1016/j.jtbi.2011.10.021] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 06/25/2011] [Revised: 09/13/2011] [Accepted: 10/15/2011] [Indexed: 10/15/2022]
Abstract
Identifying protein-protein interaction sites provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Using a patch-based model for residue characterization, we trained random forest classifiers for residue-based interface prediction, which was followed by a clustering procedure to produce patches for patch-based interface prediction. For residue-based interface prediction, our method achieves a specificity of 0.7 and a sensitivity of 0.78. For patch-based interface prediction, a success rate of 0.80 is achieved. Using the same datasets, we also compare our method with several published methods. The results show that our method is a successful predictor for residue-based and patch-based interface prediction.
Affiliation(s)
- Zhijun Qiu
- The State Key Laboratory of Structural Analysis of Industrial Equipment, Dalian University of Technology, 2 Ling-Gong Road, Dalian 116024, China
72
Characterization of protein-protein interaction interfaces from a single species. PLoS One 2011; 6:e21053. [PMID: 21738603 PMCID: PMC3124478 DOI: 10.1371/journal.pone.0021053] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 02/09/2011] [Accepted: 05/18/2011] [Indexed: 01/07/2023]
Abstract
Most proteins attain their biological functions through specific interactions with other proteins. Thus, the study of protein-protein interactions and the interfaces that mediate these interactions is of prime importance for the understanding of biological function. In particular the precise determinants of binding specificity and their contributions to binding energy within protein interfaces are not well understood. In order to better understand these determinants an appropriate description of the interaction surface is needed. Available data from the yeast Saccharomyces cerevisiae allow us to focus on a single species and to use all the available structures, correcting for redundancy, instead of using structural representatives. This allows us to control for potentially confounding factors that may affect sequence propensities. We find a significant contribution of main-chain atoms to protein-protein interactions. These include interactions both with other main-chain and side-chain atoms on the interacting chain. We find that the type of interaction depends on both amino acid and secondary structure type involved in the contact. For example, residues in α-helices and large amino acids are the most likely to be involved in interactions through their side-chain atoms. We find an intriguing homogeneity when calculating the average solvation energy of different areas of the protein surface. Unexpectedly, homo- and hetero-complexes have quite similar results for all analyses. Our findings demonstrate that the manner in which protein-protein interactions are formed is determined by the residue type and the secondary structure found in the interface. However the homogeneity of the desolvation energy despite heterogeneity of interface properties suggests a complex relationship between interface composition and binding energy.
73
Xue LC, Dobbs D, Honavar V. HomPPI: a class of sequence homology based protein-protein interface prediction methods. BMC Bioinformatics 2011; 12:244. [PMID: 21682895 PMCID: PMC3213298 DOI: 10.1186/1471-2105-12-244] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 05/25/2010] [Accepted: 06/17/2011] [Indexed: 01/12/2023]
Abstract
BACKGROUND Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. RESULTS We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins.
The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. CONCLUSIONS Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners.
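The general idea of homology-based interface inference can be sketched in a few lines of Python: interface labels known for a homolog are carried over to the query through a pairwise alignment. This is a simplified illustration only and is not the HomPPI algorithm, which additionally applies sequence similarity criteria to choose homologs; the function name, the alignment strings, and the interface positions are hypothetical.

```python
def transfer_interface(query_aln: str, homolog_aln: str, homolog_interface: set) -> set:
    """Map interface positions from a homolog onto a query via a pairwise alignment.

    query_aln and homolog_aln are equal-length aligned sequences using '-' for gaps.
    homolog_interface holds 0-based positions in the ungapped homolog sequence.
    Returns 0-based positions in the ungapped query sequence.
    """
    if len(query_aln) != len(homolog_aln):
        raise ValueError("aligned sequences must have equal length")

    q_pos = h_pos = 0
    predicted = set()
    for q, h in zip(query_aln, homolog_aln):
        # Only columns where both sequences have a residue can transfer a label.
        if q != "-" and h != "-" and h_pos in homolog_interface:
            predicted.add(q_pos)
        if q != "-":
            q_pos += 1
        if h != "-":
            h_pos += 1
    return predicted


# Hypothetical alignment: ungapped homolog is "MTALV", ungapped query is "MKTLV".
predicted = transfer_interface("MKT-LV", "M-TALV", homolog_interface={1, 2, 3})
print(sorted(predicted))
```

In this toy case homolog positions 1 and 3 align to query residues and are transferred, while homolog position 2 falls opposite a gap in the query and is dropped.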
Affiliation(s)
- Li C Xue
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA.
74
Wass MN, David A, Sternberg MJE. Challenges for the prediction of macromolecular interactions. Curr Opin Struct Biol 2011; 21:382-90. [DOI: 10.1016/j.sbi.2011.03.013] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Received: 12/20/2010] [Revised: 03/04/2011] [Accepted: 03/24/2011] [Indexed: 12/14/2022]
75
Zhang C, Lai L. SDOCK: a global protein-protein docking program using stepwise force-field potentials. J Comput Chem 2011; 32:2598-612. [PMID: 21618559 DOI: 10.1002/jcc.21839] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 01/27/2011] [Revised: 03/24/2011] [Accepted: 04/16/2011] [Indexed: 11/10/2022]
Abstract
The fast Fourier transform (FFT) method limits the forms of scoring functions in global protein-protein docking. On the other hand, force field potentials can effectively describe the energy hypersurface of biological macromolecules. In this study, we developed a new protein-protein docking program, SDOCK, that incorporates van der Waals attractive potential, geometric collision, screened electrostatic potential, and Lazaridis-Karplus desolvation energy into the scoring function in the global searching process. Stepwise potentials were generated from the corresponding continuous forms to treat structural flexibility. After optimization of the atom solvation parameters and the weights of different potential terms based on a new docking test set that contains 142 cases with small or moderate conformational changes upon binding, SDOCK slightly outperformed the well-known FFT-based global docking program ZDOCK3.0. Among the 142 cases tested, 52.8% gave at least one near-native solution in the top 100 solutions. SDOCK was also tested on six blind testing cases in Critical Assessment of Predicted Interactions rounds 13 to 18. In all six cases, the near-native solutions could be found within the top 350 solutions. Because the SDOCK approach performs global docking based on force-field potentials, one of its advantages is that it provides global binding free energy surface profiles for further analysis. The efficiency of the program is also comparable with that of other FFT-based protein-protein docking programs. SDOCK is available for noncommercial applications at http://mdl.ipc.pku.edu.cn/cgi-bin/down.cgi.
Affiliation(s)
- Changsheng Zhang
- Beijing National Laboratory for Molecular Sciences, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, College of Chemistry and Molecular engineering, Peking University, Beijing, China
76
Zhang QC, Deng L, Fisher M, Guan J, Honig B, Petrey D. PredUs: a web server for predicting protein interfaces using structural neighbors. Nucleic Acids Res 2011; 39:W283-7. [PMID: 21609948 PMCID: PMC3125747 DOI: 10.1093/nar/gkr311] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.1] [Indexed: 11/13/2022]
Abstract
We describe PredUs, an interactive web server for the prediction of protein-protein interfaces. Potential interfacial residues for a query protein are identified by 'mapping' contacts from known interfaces of the query protein's structural neighbors to surface residues of the query. We calculate a score for each residue to be interfacial with a support vector machine. Results can be visualized in a molecular viewer and a number of interactive features allow users to tailor a prediction to a particular hypothesis. The PredUs server is available at: http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PredUs.
Affiliation(s)
- Qiangfeng Cliff Zhang
- Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Howard Hughes Medical Institute, Columbia University, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, USA
77
Laage D, Stirnemann G, Sterpone F, Rey R, Hynes JT. Reorientation and Allied Dynamics in Water and Aqueous Solutions. Annu Rev Phys Chem 2011; 62:395-416. [DOI: 10.1146/annurev.physchem.012809.103503] [Citation(s) in RCA: 271] [Impact Index Per Article: 19.4] [Indexed: 11/09/2022]
Affiliation(s)
- Damien Laage
- Department of Chemistry, Ecole Normale Supérieure, UMR ENS-CNRS-UPMC 8640, 75005 Paris, France;
- Guillaume Stirnemann
- Department of Chemistry, Ecole Normale Supérieure, UMR ENS-CNRS-UPMC 8640, 75005 Paris, France;
- Fabio Sterpone
- Department of Chemistry, Ecole Normale Supérieure, UMR ENS-CNRS-UPMC 8640, 75005 Paris, France;
- Rossend Rey
- Departament de Física i Enginyeria Nuclear, Universitat Politècnica de Catalunya, Barcelona 08034, Spain;
- James T. Hynes
- Department of Chemistry, Ecole Normale Supérieure, UMR ENS-CNRS-UPMC 8640, 75005 Paris, France;
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309-0215;
78
Fernández-Recio J. Prediction of protein binding sites and hot spots. Wiley Interdiscip Rev Comput Mol Sci 2011. [DOI: 10.1002/wcms.45] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 11/10/2022]
79
de Vries SJ, Bonvin AMJJ. CPORT: a consensus interface predictor and its performance in prediction-driven docking with HADDOCK. PLoS One 2011; 6:e17695. [PMID: 21464987 PMCID: PMC3064578 DOI: 10.1371/journal.pone.0017695] [Citation(s) in RCA: 245] [Impact Index Per Article: 17.5] [Received: 11/15/2010] [Accepted: 02/08/2011] [Indexed: 11/19/2022]
Abstract
Background Macromolecular complexes are the molecular machines of the cell. Knowledge at the atomic level is essential to understand and influence their function. However, their number is huge and a significant fraction is extremely difficult to study using classical structural methods such as NMR and X-ray crystallography. Therefore, the importance of large-scale computational approaches in structural biology is evident. This study combines two of these computational approaches, interface prediction and docking, to obtain atomic-level structures of protein-protein complexes, starting from their unbound components. Methodology/Principal Findings Here we combine six interface prediction web servers into a consensus method called CPORT (Consensus Prediction Of interface Residues in Transient complexes). We show that CPORT gives more stable and reliable predictions than each of the individual predictors on its own. A protocol was developed to integrate CPORT predictions into our data-driven docking program HADDOCK. For cases where experimental information is limited, this prediction-driven docking protocol presents an alternative to ab initio docking, the docking of complexes without the use of any information. Prediction-driven docking was performed on a large and diverse set of protein-protein complexes in a blind manner. Our results indicate that the performance of the HADDOCK-CPORT combination is competitive with ZDOCK-ZRANK, a state-of-the-art ab initio docking/scoring combination. Finally, the original interface predictions could be further improved by interface post-prediction (contact analysis of the docking solutions). Conclusions/Significance The current study shows that blind, prediction-driven docking using CPORT and HADDOCK is competitive with ab initio docking methods. 
This is encouraging since prediction-driven docking represents the absolute bottom line for data-driven docking: any additional biological knowledge will greatly improve the results obtained by prediction-driven docking alone. Finally, the fact that original interface predictions could be further improved by interface post-prediction suggests that prediction-driven docking has not yet been pushed to the limit. A web server for CPORT is freely available at http://haddock.chem.uu.nl/services/CPORT.
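To illustrate what combining several interface predictors into a consensus can look like, here is a minimal Python sketch based on simple vote counting. The voting rule, the function name, and the residue numbers are assumptions made for this example; the abstract does not describe how CPORT actually weighs its six servers, so this should not be read as the CPORT scheme.

```python
from collections import Counter


def consensus_interface(predictions, min_votes):
    """Return sorted residue ids predicted as interface by at least `min_votes` predictors.

    `predictions` is a list of iterables, one per predictor, each holding the
    residue ids that predictor labels as interface.
    """
    votes = Counter(res for pred in predictions for res in set(pred))
    return sorted(res for res, n in votes.items() if n >= min_votes)


# Hypothetical output of three predictors for the same protein.
predictor_outputs = [
    {10, 11, 12},
    {11, 12, 13},
    {12, 40},
]

print(consensus_interface(predictor_outputs, min_votes=2))
```

Raising `min_votes` trades coverage for agreement: requiring two of the three hypothetical predictors keeps residues 11 and 12, while requiring all three keeps only residue 12.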
Affiliation(s)
- Sjoerd J de Vries
- Faculty of Science, Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, The Netherlands.
80
Monji H, Koizumi S, Ozaki T, Ohkawa T. Interaction site prediction by structural similarity to neighboring clusters in protein-protein interaction networks. BMC Bioinformatics 2011; 12 Suppl 1:S39. [PMID: 21342570 PMCID: PMC3044295 DOI: 10.1186/1471-2105-12-s1-s39] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND Recently, revealing the function of proteins with protein-protein interaction (PPI) networks has come to be regarded as one of the important issues in bioinformatics. With the development of experimental methods such as the yeast two-hybrid method, the amount of protein interaction data has been increasing rapidly. Many databases dealing comprehensively with these data have been constructed and applied to the analysis of PPI networks. However, little research has explored the prediction of interaction sites using PPI networks and 3D protein structures in a complementary way. RESULTS We propose a method of predicting interaction sites in proteins with unknown function by using both PPI networks and protein structures. For a protein with unknown function as a target, several clusters are extracted from the neighboring proteins based on their structural similarity. Then, interaction sites are predicted by extracting similar sites from the group consisting of a protein cluster and the target protein. Moreover, the proposed method can improve the prediction accuracy by introducing a repetitive prediction process. CONCLUSIONS The proposed method was applied to a small-scale dataset, and its effectiveness was confirmed. The challenge will now be to apply the method to large-scale datasets.
Affiliation(s)
- Hiroyuki Monji
- Graduate School of System Informatics, Kobe University, Rokkodai, Nada, Kobe 657-8501, Japan.
81
Abstract
Peptide-protein interactions are prevalent in the living cell and form a key component of the overall protein-protein interaction network. These interactions are drawing increasing interest due to their part in signaling and regulation, and are thus attractive targets for computational structural modeling. Here we report an overview of current techniques for the high resolution modeling of peptide-protein complexes. We dissect this complicated challenge into several smaller subproblems, namely: modeling the receptor protein, predicting the peptide binding site, sampling an initial peptide backbone conformation and the final refinement of the peptide within the receptor binding site. For each of these conceptual stages, we present available tools, approaches, and their reported performance. We summarize with an illustrative example of this process, highlighting the success and current challenges still facing the automated blind modeling of peptide-protein interactions. We believe that the upcoming years will see considerable progress in our ability to create accurate models of peptide-protein interactions, with applications in binding-specificity prediction, rational design of peptide-mediated interactions and the usage of peptides as therapeutic agents.
Affiliation(s)
- Nir London
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Hadassah Medical School, The Hebrew University, Jerusalem, Israel
82
Lateral acquisition of genes is affected by the friendliness of their products. Proc Natl Acad Sci U S A 2010; 108:343-8. [PMID: 21149709 DOI: 10.1073/pnas.1009775108] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
A major factor in the evolution of microbial genomes is the lateral acquisition of genes that evolved under the functional constraints of other species. Integration of foreign genes into a genome that has different components and circuits poses an evolutionary challenge. Moreover, genes belonging to complex modules in the pretransfer species are unlikely to maintain their functionality when transferred alone to new species. Thus, it is widely accepted that lateral gene transfer favors proteins with only a few protein-protein interactions. The propensity of proteins to participate in protein-protein interactions can be assessed using computational methods that identify putative interaction sites on the protein. Here we report that laterally acquired proteins contain significantly more putative interaction sites than native proteins. Thus, genes encoding proteins with multiple protein-protein interactions may in fact be more prone to transfer than genes with fewer interactions. We suggest that these proteins have a greater chance of forming new interactions in new species, thus integrating into existing modules. These results reveal basic principles for the incorporation of novel genes into existing systems.
83
Davis FP. Proteome-wide prediction of overlapping small molecule and protein binding sites using structure. Mol Biosyst 2010; 7:545-57. [PMID: 21103609 DOI: 10.1039/c0mb00200c] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
Small molecules that modulate protein-protein interactions are of great interest for chemical biology and therapeutics. Here I present a structure-based approach to predict 'bi-functional' sites able to bind both small molecule ligands and proteins, in proteins of unknown structure. First, I develop a homology-based annotation method that transfers binding sites of known three-dimensional structure onto protein sequences, predicting residues in ligand and protein binding sites with estimated true positive rates of 98% and 88%, respectively, at 1% false positive rates. Applying this method to the human proteome predicts 8463 proteins with bi-functional residues and correctly recovers the targets of known interaction modulators. Proteins with significantly (p < 0.01) more bi-functional residues than expected were found to be enriched in regulatory and depleted in metabolism functions. Finally, I demonstrate the utility of the method by describing examples of predicted overlap and evidence of their biological and therapeutic relevance. The results suggest that combining the structures of known binding sites with established fold detection algorithms can predict regions of protein-protein interfaces that are amenable to small molecule modulation. Open-source software and the results for several complete proteomes are available at http://pibase.janelia.org/homolobind.
Affiliation(s)
- Fred P Davis
- Howard Hughes Medical Institute, Janelia Farm Research Campus, 19700 Helix Dr, Ashburn, VA 20147, USA.
84
Gong X, Liu B, Chang S, Li C, Chen W, Wang C. A holistic molecular docking approach for predicting protein-protein complex structure. Sci China Life Sci 2010; 53:1152-61. [PMID: 21104376 DOI: 10.1007/s11427-010-4050-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/25/2009] [Accepted: 09/22/2009] [Indexed: 10/18/2022]
Abstract
A holistic protein-protein molecular docking approach, HoDock, was established, composed of such steps as binding site prediction, initial complex structure sampling, refined complex structure sampling, structure clustering, scoring and final structure selection. This article explains the detailed steps and applications for CAPRI Target 39. The CAPRI result showed that three predicted binding site residues, A191HIS, B512ARG and B531ARG, were correct, and there were five submitted structures with a high fraction of correct receptor-ligand interface residues, indicating that this docking approach may improve prediction accuracy for protein-protein complex structures.
Affiliation(s)
- XinQi Gong
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, China
85
Launay G, Simonson T. A large decoy set of protein-protein complexes produced by flexible docking. J Comput Chem 2010; 32:106-20. [DOI: 10.1002/jcc.21604] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
86
Lensink MF, Wodak SJ. Blind predictions of protein interfaces by docking calculations in CAPRI. Proteins 2010; 78:3085-95. [DOI: 10.1002/prot.22850] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.9] [Indexed: 12/30/2022]
87
Nekrasov AN, Zinchenko AA. Structural Features of the Interfaces in Enzyme-Inhibitor Complexes. J Biomol Struct Dyn 2010; 28:85-96. [DOI: 10.1080/07391102.2010.10507345] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 10/28/2022]
88
Murakami Y, Mizuguchi K. Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites. Bioinformatics 2010; 26:1841-8. [PMID: 20529890 DOI: 10.1093/bioinformatics/btq302] [Citation(s) in RCA: 161] [Impact Index Per Article: 10.7] [Indexed: 11/14/2022]
Abstract
MOTIVATION The limited availability of protein structures often restricts the functional annotation of proteins and the identification of their protein-protein interaction sites. Computational methods to identify interaction sites from protein sequences alone are, therefore, required for unraveling the functions of many proteins. This article describes a new method (PSIVER) to predict interaction sites, i.e. residues binding to other proteins, in protein sequences. Only sequence features (position-specific scoring matrix and predicted accessibility) are used for training a Naïve Bayes classifier (NBC), and conditional probabilities of each sequence feature are estimated using a kernel density estimation method (KDE). RESULTS The leave-one out cross-validation of PSIVER achieved a Matthews correlation coefficient (MCC) of 0.151, an F-measure of 35.3%, a precision of 30.6% and a recall of 41.6% on a non-redundant set of 186 protein sequences extracted from 105 heterodimers in the Protein Data Bank (consisting of 36 219 residues, of which 15.2% were known interface residues). Even though the dataset used for training was highly imbalanced, a randomization test demonstrated that the proposed method managed to avoid overfitting. PSIVER was also tested on 72 sequences not used in training (consisting of 18 140 residues, of which 10.6% were known interface residues), and achieved an MCC of 0.135, an F-measure of 31.5%, a precision of 25.0% and a recall of 46.5%, outperforming other publicly available servers tested on the same dataset. PSIVER enables experimental biologists to identify potential interface residues in unknown proteins from sequence information alone, and to mutate those residues selectively in order to unravel protein functions. AVAILABILITY Freely available on the web at http://tardis.nibio.go.jp/PSIVER/
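For readers comparing the reported metrics, the short Python sketch below computes an F-measure as the harmonic mean of precision and recall and applies it to the cross-validation precision (30.6%) and recall (41.6%) quoted above. The function name is an assumption of this example, and the abstract does not state how the authors aggregated their figures, so a value computed this way need not match every reported number exactly.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Cross-validation precision and recall as quoted in the abstract.
cv_f1 = f_measure(precision=0.306, recall=0.416)
print(f"{cv_f1:.3f}")  # prints 0.353
```

The result agrees with the 35.3% F-measure reported for the leave-one-out cross-validation.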
|
89
|
Abstract
With the advent of Systems Biology, the prediction of whether two proteins form a complex has become a problem of increased importance. A variety of experimental techniques have been applied to the problem, but three-dimensional structural information has not been widely exploited. Here we explore the range of applicability of such information by analyzing the extent to which the location of binding sites on protein surfaces is conserved among structural neighbors. We find, as expected, that interface conservation is most significant among proteins that have a clear evolutionary relationship, but that there is a significant level of conservation even among remote structural neighbors. This finding is consistent with recent evidence that information available from structural neighbors, independent of classification, should be exploited in the search for functional insights. The value of such structural information is highlighted through the development of a new protein interface prediction method, PredUs, that identifies what residues on protein surfaces are likely to participate in complexes with other proteins. The performance of PredUs, as measured through comparisons with other methods, suggests that relationships across protein structure space can be successfully exploited in the prediction of protein-protein interactions.
|
90
|
Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol 2009; 5:e1000585. [PMID: 19997483 PMCID: PMC2777313 DOI: 10.1371/journal.pcbi.1000585] [Citation(s) in RCA: 302] [Impact Index Per Article: 18.9] [Received: 05/11/2009] [Accepted: 10/30/2009] [Indexed: 11/20/2022] Open
Abstract
Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (http://compbio.cs.princeton.edu/concavity/). Protein molecules are ubiquitous in the cell; they perform thousands of functions crucial for life. Proteins accomplish nearly all of these functions by interacting with other molecules. These interactions are mediated by specific amino acid positions in the proteins. Knowledge of these “functional sites” is crucial for understanding the molecular mechanisms by which proteins carry out their functions; however, functional sites have not been identified in the vast majority of proteins. 
Here, we present ConCavity, a computational method that predicts small molecule binding sites in proteins by combining analysis of evolutionary sequence conservation and protein 3D structure. ConCavity provides significant improvement over previous approaches, especially on large, multi-chain proteins. In contrast to earlier methods which only predict entire binding sites, ConCavity makes specific predictions of positions in space that are likely to overlap ligand atoms and of residues that are likely to contact bound ligands. These predictions can be used to aid computational function prediction, to guide experimental protein analysis, and to focus computationally intensive techniques used in drug discovery.
Affiliation(s)
- John A. Capra
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Roman A. Laskowski
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Janet M. Thornton
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
  - * E-mail: (MS); (TAF)
- Thomas A. Funkhouser
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
  - * E-mail: (MS); (TAF)
|
91
|
Liu B, Wang X, Lin L, Tang B, Dong Q, Wang X. Prediction of protein binding sites in protein structures using hidden Markov support vector machine. BMC Bioinformatics 2009; 10:381. [PMID: 19925685 PMCID: PMC2785799 DOI: 10.1186/1471-2105-10-381] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 03/08/2009] [Accepted: 11/20/2009] [Indexed: 01/08/2023] Open
Abstract
Background: Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been mainly based on widely known machine learning techniques, such as artificial neural networks, support vector machines and conditional random fields. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance. Results: In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods. Conclusion: The improved prediction performance and computational efficiency of the method based on the hidden Markov support vector machine can be attributed to the following three factors. Firstly, the relation between labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous to this field. Thirdly, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples when the cutting-plane algorithm is used.
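Treating binding site prediction as sequential labelling means the label of each residue is decoded jointly with its neighbours. The sketch below shows only that decoding step (Viterbi), assuming per-residue label scores and label-to-label transition scores are already available. In a hidden Markov SVM those scores would come from max-margin training; here they are hypothetical hand-picked numbers.

```python
def viterbi(emission_scores, transition_scores):
    """Return the highest-scoring label sequence.

    emission_scores[t][k]   : score of label k at residue t
    transition_scores[j][k] : score of label j being followed by label k
    """
    n_labels = len(emission_scores[0])
    best = list(emission_scores[0])  # best path score ending in each label
    backpointers = []
    for scores in emission_scores[1:]:
        new_best, pointers = [], []
        for k in range(n_labels):
            prev = max(
                range(n_labels),
                key=lambda j: best[j] + transition_scores[j][k],
            )
            new_best.append(best[prev] + transition_scores[prev][k] + scores[k])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    label = max(range(n_labels), key=lambda k: best[k])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    return path[::-1]


# Labels: 0 = non-binding, 1 = binding (toy scores for five residues).
emissions = [[1.0, 0.0], [0.4, 0.6], [0.0, 1.0], [0.55, 0.45], [0.0, 1.0]]
transitions = [[0.5, 0.0], [0.0, 0.5]]  # reward keeping the same label
independent = [max((0, 1), key=lambda k: e[k]) for e in emissions]
joint = viterbi(emissions, transitions)
print(independent, joint)
```

In this toy case the fourth residue slightly favours the non-binding label on its own, but joint decoding labels it as binding because both neighbours are binding.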
Affiliation(s)
- Bin Liu
- Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, PR China.
|
92
|
Liang S, Zheng D, Zhang C, Zacharias M. Prediction of antigenic epitopes on protein surfaces by consensus scoring. BMC Bioinformatics 2009; 10:302. [PMID: 19772615 PMCID: PMC2761409 DOI: 10.1186/1471-2105-10-302] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Received: 04/07/2009] [Accepted: 09/22/2009] [Indexed: 12/05/2022] Open
Abstract
Background: Prediction of antigenic epitopes on protein surfaces is important for vaccine design. Most existing epitope prediction methods focus on protein sequences to predict continuous epitopes that are linear in sequence. Only a few structure-based epitope prediction algorithms are available and they have not yet shown satisfactory performance. Results: We present a new antigen Epitope Prediction method, which uses ConsEnsus Scoring (EPCES) from six different scoring functions: residue epitope propensity, conservation score, side-chain energy score, contact number, surface planarity score, and secondary structure composition. Applied to unbound antigen structures from an independent test set, EPCES was able to predict antigenic epitopes with 47.8% sensitivity, 69.5% specificity and an AUC value of 0.632. The performance of the method is statistically similar to other published methods. The AUC value of EPCES is slightly higher, by about 0.034, than the best results of existing algorithms. Conclusion: Our work shows that consensus scoring of multiple features performs better than any single term. The successful prediction is also due to the new score of residue epitope propensity based on atomic solvent accessibility.
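The abstract does not state how the six terms are combined, so the following is only a sketch of one common form of consensus scoring: each term is z-score normalized across residues and the normalized values are averaged. The function names, the two example terms, and the scores are hypothetical.

```python
import statistics


def zscores(values):
    """Standardize a list of scores to zero mean and unit variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def consensus_score(score_table):
    """Average the z-scored terms residue by residue.

    score_table maps a term name to a list of per-residue scores,
    with all lists in the same residue order.
    """
    normalized = [zscores(scores) for scores in score_table.values()]
    return [sum(column) / len(normalized) for column in zip(*normalized)]


# Toy scores for three surface residues and two of the six terms.
table = {
    "epitope_propensity": [0.9, 0.1, 0.5],
    "conservation": [0.8, 0.2, 0.4],
}
scores = consensus_score(table)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Normalizing first keeps a term with a large numeric range from dominating the average.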
Affiliation(s)
- Shide Liang
- School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, D-28759 Bremen, Germany
|
93
|
Giard J, Ambroise J, Gala JL, Macq B. Regression applied to protein binding site prediction and comparison with classification. BMC Bioinformatics 2009; 10:276. [PMID: 19728868 PMCID: PMC2749839 DOI: 10.1186/1471-2105-10-276] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/26/2009] [Accepted: 09/03/2009] [Indexed: 11/13/2022] Open
Abstract
Background: The structural genomics centers provide hundreds of protein structures of unknown function. Therefore, developing methods enabling the determination of a protein function automatically is imperative. The determination of a protein function can be achieved by studying the network of its physical interactions. In this context, identifying a potential binding site between proteins is of primary interest. In the literature, methods for predicting a potential binding site location are generally based on classification tools. The aim of this paper is to show that regression tools are more efficient than classification tools for patch-based binding site predictors. For this purpose, we developed a patch-based binding site localization method usable with either regression or classification tools. Results: We compared the predictive performance of regression tools with that of machine learning classifiers. Using leave-one-out cross-validation, we showed that regression tools provide better predictions than classification ones. Among regression tools, Multilayer Perceptron ranked highest in the quality of predictions. We also compared the predictive performance of our patch-based method using Multilayer Perceptron with the performance of three other methods usable through a web server. Our method performed similarly to the other methods. Conclusion: Regression is more efficient than classification when applied to our binding site localization method. When it is possible, using regression instead of classification for other existing binding site predictors will probably improve results. Furthermore, the method presented in this work is flexible because the size of the predicted binding site is adjustable. This adaptability is useful when either false positive or false negative rates have to be limited.
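The point about an adjustable binding site size can be made concrete with a small sketch, assuming a regressor has already assigned each surface patch a continuous score. The patch identifiers, scores, and true site below are invented for illustration and do not come from the paper.

```python
def select_binding_site(patch_scores, size):
    """Return the ids of the `size` highest-scoring patches."""
    ranked = sorted(patch_scores, key=patch_scores.get, reverse=True)
    return set(ranked[:size])


def precision_recall(predicted, true_site):
    """Precision and recall of a predicted patch set against the true site."""
    hits = len(predicted & true_site)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(true_site) if true_site else 0.0
    return precision, recall


# Hypothetical regression outputs for five surface patches.
patch_scores = {"p1": 0.92, "p2": 0.81, "p3": 0.40, "p4": 0.35, "p5": 0.10}
true_site = {"p1", "p2", "p4"}

small = select_binding_site(patch_scores, 2)  # fewer false positives
large = select_binding_site(patch_scores, 4)  # fewer false negatives
print(precision_recall(small, true_site), precision_recall(large, true_site))
```

A small predicted site favours precision, while a larger one favours recall, which is the trade-off the abstract alludes to.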
Affiliation(s)
- Joachim Giard
- Communications and Remote Sensing Laboratory, Université Catholique de Louvain, Place du Levant 2, 1348 Louvain-la-Neuve, Belgium.
|
94
|
Exploiting three kinds of interface propensities to identify protein binding sites. Comput Biol Chem 2009; 33:303-11. [DOI: 10.1016/j.compbiolchem.2009.07.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 09/01/2008] [Revised: 06/22/2009] [Accepted: 07/01/2009] [Indexed: 11/21/2022]
|
95
|
Improved Prediction of Protein Binding Sites from Sequences Using Genetic Algorithm. Protein J 2009; 28:273-80. [DOI: 10.1007/s10930-009-9192-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 10/20/2022]
|
96
|
Wang C, Cheng J, Su S. Prediction of interacting protein pairs from sequence using a Bayesian method. Protein J 2009; 28:111-5. [PMID: 19194789 DOI: 10.1007/s10930-009-9170-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
With the development of bioinformatics, more and more protein sequence information has become available. Meanwhile, the number of known protein-protein interactions (PPIs) is still very limited. In this article, we propose a new method for predicting interacting protein pairs using a Bayesian method based on a new feature representation. We trained our model using data on 6,459 PPI pairs from the yeast Saccharomyces cerevisiae core subset. Using six species of DIP database, our model demonstrates an average prediction accuracy of 93.67%. The result showed that our method is superior to other methods in both computing time and prediction accuracy.
Affiliation(s)
- Chishe Wang
- Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, AnHui University, 230039, Hefei, China.
|
97
|
Liang S, Meroueh SO, Wang G, Qiu C, Zhou Y. Consensus scoring for enriching near-native structures from protein-protein docking decoys. Proteins 2009; 75:397-403. [PMID: 18831053 PMCID: PMC2656599 DOI: 10.1002/prot.22252] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/12/2022]
Abstract
The identification of near-native protein-protein complexes among a set of decoys remains highly challenging. A strategy for improving the success rate of near-native detection is to enrich near-native docking decoys in a small number of top-ranked decoys. Recently, we found that a combination of three scoring functions (energy, conservation, and interface propensity) can predict the location of binding interface regions with reasonable accuracy. Here, these three scoring functions are modified and combined into a consensus scoring function called ENDES for enriching near-native docking decoys. We found that all individual scores result in enrichment for the majority of the 28 targets in the ZDOCK2.3 decoy set and the 22 targets in Benchmark 2.0. Among the three scores, the interface propensity score yields the highest enrichment in both sets of protein complexes. When these scores are combined into the ENDES consensus score, a significant increase in enrichment of near-native structures is found. For example, when 2000 docking decoys are reduced to 200 decoys by ENDES, the fraction of near-native structures in docking decoys increases by a factor of about six on average. ENDES was implemented into a computer program that is available for download at http://sparks.informatics.iupui.edu.
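The enrichment quoted above is a ratio of two fractions: the share of near-native decoys among the retained top-ranked decoys divided by their share in the full set. The sketch below assumes that a higher score means a better decoy, and the counts (20 near-native decoys in 2000, 12 of them in the top 200) are hypothetical numbers chosen to give a factor of six.

```python
def enrichment_factor(scores, is_near_native, top_n):
    """Fraction of near-native decoys in the top_n, relative to the whole set."""
    ranked = sorted(
        zip(scores, is_near_native), key=lambda pair: pair[0], reverse=True
    )
    frac_top = sum(flag for _, flag in ranked[:top_n]) / top_n
    frac_all = sum(is_near_native) / len(is_near_native)
    return frac_top / frac_all


# 2000 toy decoys, already listed from best to worst score.
scores = [2000 - i for i in range(2000)]
flags = [False] * 2000
for i in range(12):  # 12 near-native decoys inside the top 200
    flags[i * 10] = True
for i in range(8):  # 8 near-native decoys further down the list
    flags[500 + i * 100] = True

factor = enrichment_factor(scores, flags, top_n=200)
print(round(factor, 2))
```

Here 12/200 = 6% of the retained decoys are near-native against 20/2000 = 1% overall.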
Affiliation(s)
- Shide Liang
  - Indiana University School of Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, USA
  - Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, USA
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, P. R. China
- Samy O. Meroueh
  - Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, USA
- Guangce Wang
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, P. R. China
- Chao Qiu
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, P. R. China
- Yaoqi Zhou
  - Indiana University School of Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, USA
  - Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, USA
|
98
|
Ezkurdia I, Bartoli L, Fariselli P, Casadio R, Valencia A, Tress ML. Progress and challenges in predicting protein-protein interaction sites. Brief Bioinform 2009; 10:233-46. [PMID: 19346321 DOI: 10.1093/bib/bbp021] [Citation(s) in RCA: 113] [Impact Index Per Article: 7.1] [Indexed: 11/14/2022] Open
Abstract
The identification of protein-protein interaction sites is an essential intermediate step for mutant design and the prediction of protein networks. In recent years a significant number of methods have been developed to predict these interface residues and here we review the current status of the field. Progress in this area requires a clear view of the methodology applied, the data sets used for training and testing the systems, and the evaluation procedures. We have analysed the impact of a representative set of features and algorithms and highlighted the problems inherent in generating reliable protein data sets and in the posterior analysis of the results. Although it is clear that there have been some improvements in methods for predicting interacting sites, several major bottlenecks remain. Proteins in complexes are still under-represented in the structural databases and in particular many proteins involved in transient complexes are still to be crystallized. We provide suggestions for effective feature selection, and make it clear that community standards for testing, training and performance measures are necessary for progress in the field.
Affiliation(s)
- Iakes Ezkurdia
- Centro Nacional de Biotechnolgia, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
99
|
Tuncbag N, Kar G, Keskin O, Gursoy A, Nussinov R. A survey of available tools and web servers for analysis of protein-protein interactions and interfaces. Brief Bioinform 2009; 10:217-32. [PMID: 19240123 DOI: 10.1093/bib/bbp001] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Indexed: 01/10/2023] Open
Abstract
The unanimous agreement that cellular processes are (largely) governed by interactions between proteins has led to enormous community efforts culminating in overwhelming information relating to these proteins; to the regulation of their interactions, to the way in which they interact and to the function which is determined by these interactions. These data have been organized in databases and servers. However, to make these really useful, it is essential not only to be aware of these, but in particular to have a working knowledge of which tools to use for a given problem; what are the tool advantages and drawbacks; and no less important how to combine these for a particular goal since usually it is not one tool, but some combination of tool-modules that is needed. This is the goal of this review.
Affiliation(s)
- Nurcan Tuncbag
- Computational Sciences and Engineering Program at Koc University, Istanbul, Turkey
|
100
|
Park SH, Reyes JA, Gilbert DR, Kim JW, Kim S. Prediction of protein-protein interaction types using association rule based classification. BMC Bioinformatics 2009; 10:36. [PMID: 19173748 PMCID: PMC2667511 DOI: 10.1186/1471-2105-10-36] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 05/19/2008] [Accepted: 01/28/2009] [Indexed: 11/10/2022] Open
Abstract
Background: Protein-protein interactions (PPI) can be classified according to their characteristics into, for example, obligate or transient interactions. The identification and characterization of these PPI types may help in the functional annotation of new protein complexes and in the prediction of protein interaction partners by knowledge-driven approaches. Results: This work addresses pattern discovery in the interaction sites of four different interaction types in order to characterize them, and uses the discovered patterns to predict PPI types with Association Rule Based Classification (ARBC), which includes association rule generation and posterior classification. We incorporated domain information from protein complexes in SCOP proteins and identified 354 domain-interaction sites. 14 interface properties were calculated from amino acid and secondary structure composition and then used to generate a set of association rules characterizing these domain-interaction sites employing the APRIORI algorithm. Our results regarding the classification of PPI types based on a set of discovered association rules show that the discriminative ability of association rules can significantly affect the prediction power of classification models. We also showed that the accuracy of the classification can be improved through the use of structural domain information and also the use of secondary structure content. Conclusion: The advantage of our approach is that we can extract biologically significant information from the interpretation of the discovered association rules in terms of understandability and interpretability of rules. A web application based on our method can be found at
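Association rules of the kind generated by APRIORI are usually judged by two quantities, support and confidence. The sketch below computes both for a toy set of interaction sites, each described by a few discretized properties plus its interaction type. The property names and the four example sites are hypothetical and are not the 14 interface properties used in the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Each set is one interaction site: discretized properties plus its PPI type.
sites = [
    {"helix_rich", "hydrophobic", "obligate"},
    {"helix_rich", "hydrophobic", "obligate"},
    {"helix_rich", "polar", "transient"},
    {"sheet_rich", "polar", "transient"},
]

rule_support = support(sites, {"helix_rich", "hydrophobic", "obligate"})
rule_conf = confidence(sites, {"helix_rich", "hydrophobic"}, {"obligate"})
weak_conf = confidence(sites, {"helix_rich"}, {"obligate"})
print(rule_support, rule_conf, round(weak_conf, 3))
```

A rule-based classifier would then prefer rules with high confidence, such as the two-property rule here, over the weaker single-property rule.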
Affiliation(s)
- Sung Hee Park
- Department of Bioinformatics & Life Science, Soongsil University, Seoul, Korea.
|