151
|
Mészáros B, Simon I, Dosztányi Z. The expanding view of protein-protein interactions: complexes involving intrinsically disordered proteins. Phys Biol 2011; 8:035003. [PMID: 21572179 DOI: 10.1088/1478-3975/8/3/035003] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Indexed: 12/26/2022]
Abstract
A frequently neglected aspect of protein-protein interactions is flexibility. Small-scale fluctuations are present even in globular proteins, and alternative conformations can have a significant influence on the binding process. However, flexibility becomes highly prominent in complexes involving intrinsically disordered proteins. The importance of disordered regions in protein interactions has been recognized only relatively recently. In this survey we examine the basic properties of the complexes of disordered and ordered proteins from three different directions. The comparison of the interface properties shows that although disordered proteins can also adopt well-defined conformations in their bound form, their inherently dynamic nature is cast into their complexes. Furthermore, an overview of prediction methods indicates that disordered proteins as well as their binding regions can be recognized from the amino acid sequence by capturing the basic biophysical properties of these segments. Finally, we propose the generalization of the 'energy landscape model' for the description of complex formation that can help to put the various types of protein associations on a common ground.
Affiliation(s)
- Bálint Mészáros
- Institute of Enzymology, Hungarian Academy of Sciences, PO Box 7, H-1518 Budapest, Hungary
|
152
|
Fernández-Recio J. Prediction of protein binding sites and hot spots. Wiley Interdiscip Rev Comput Mol Sci 2011. [DOI: 10.1002/wcms.45] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 11/10/2022]
|
153
|
Chennamsetty N, Voynov V, Kayser V, Helk B, Trout BL. Prediction of protein binding regions. Proteins 2010; 79:888-97. [DOI: 10.1002/prot.22926] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/14/2010] [Revised: 09/23/2010] [Accepted: 10/13/2010] [Indexed: 11/07/2022]
|
154
|
Lateral acquisition of genes is affected by the friendliness of their products. Proc Natl Acad Sci U S A 2010; 108:343-8. [PMID: 21149709 DOI: 10.1073/pnas.1009775108] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
A major factor in the evolution of microbial genomes is the lateral acquisition of genes that evolved under the functional constraints of other species. Integration of foreign genes into a genome that has different components and circuits poses an evolutionary challenge. Moreover, genes belonging to complex modules in the pretransfer species are unlikely to maintain their functionality when transferred alone to new species. Thus, it is widely accepted that lateral gene transfer favors proteins with only a few protein-protein interactions. The propensity of proteins to participate in protein-protein interactions can be assessed using computational methods that identify putative interaction sites on the protein. Here we report that laterally acquired proteins contain significantly more putative interaction sites than native proteins. Thus, genes encoding proteins with multiple protein-protein interactions may in fact be more prone to transfer than genes with fewer interactions. We suggest that these proteins have a greater chance of forming new interactions in new species, thus integrating into existing modules. These results reveal basic principles for the incorporation of novel genes into existing systems.
|
155
|
Chen P, Li J. Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information. BMC Bioinformatics 2010; 11:402. [PMID: 20667087 PMCID: PMC2921408 DOI: 10.1186/1471-2105-11-402] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 03/25/2010] [Accepted: 07/28/2010] [Indexed: 01/09/2023]
Abstract
BACKGROUND Protein-protein interactions play essential roles in protein function determination and drug design. Numerous methods have been proposed to recognize their interaction sites; however, only a small proportion of protein complexes have been successfully resolved due to the high cost. Therefore, it is important to improve the performance of predicting protein interaction sites from primary sequence alone. RESULTS We propose constructing an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. A support vector machine (SVM) ensemble is then developed, in which the SVMs are trained on different pairs of positive (interface sites) and negative (non-interface sites) subsets. The subsets, which have roughly the same sizes, are grouped in order of the change in accessible surface area before and after complexation. A self-organizing map (SOM) technique is applied to group similar input vectors, making the identification of interface residues more accurate. An ensemble of ten SVMs achieves an MCC improvement of around 8% and an F1 improvement of around 9% over an ensemble of three SVMs. As expected, SVM ensembles consistently perform better than individual SVMs. In addition, the model based on the integrative profiles outperforms models based on the sequence profile or the hydropathy scale alone. As our method uses a small number of features to encode the input vectors, our model is simpler, faster and more accurate than the existing methods. CONCLUSIONS The integrative profile combining hydrophobic and evolutionary information contributes most to the protein-protein interaction prediction. Results show that the evolutionary context of a residue with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves the prediction performance. AVAILABILITY Datasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm.
Affiliation(s)
- Peng Chen
- Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, 639798 Singapore
|
156
|
Murakami Y, Mizuguchi K. Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites. Bioinformatics 2010; 26:1841-8. [PMID: 20529890 DOI: 10.1093/bioinformatics/btq302] [Citation(s) in RCA: 161] [Impact Index Per Article: 10.7] [Indexed: 11/14/2022]
Abstract
MOTIVATION The limited availability of protein structures often restricts the functional annotation of proteins and the identification of their protein-protein interaction sites. Computational methods to identify interaction sites from protein sequences alone are, therefore, required for unraveling the functions of many proteins. This article describes a new method (PSIVER) to predict interaction sites, i.e. residues binding to other proteins, in protein sequences. Only sequence features (position-specific scoring matrix and predicted accessibility) are used for training a Naïve Bayes classifier (NBC), and conditional probabilities of each sequence feature are estimated using a kernel density estimation method (KDE). RESULTS The leave-one out cross-validation of PSIVER achieved a Matthews correlation coefficient (MCC) of 0.151, an F-measure of 35.3%, a precision of 30.6% and a recall of 41.6% on a non-redundant set of 186 protein sequences extracted from 105 heterodimers in the Protein Data Bank (consisting of 36 219 residues, of which 15.2% were known interface residues). Even though the dataset used for training was highly imbalanced, a randomization test demonstrated that the proposed method managed to avoid overfitting. PSIVER was also tested on 72 sequences not used in training (consisting of 18 140 residues, of which 10.6% were known interface residues), and achieved an MCC of 0.135, an F-measure of 31.5%, a precision of 25.0% and a recall of 46.5%, outperforming other publicly available servers tested on the same dataset. PSIVER enables experimental biologists to identify potential interface residues in unknown proteins from sequence information alone, and to mutate those residues selectively in order to unravel protein functions. AVAILABILITY Freely available on the web at http://tardis.nibio.go.jp/PSIVER/
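As a quick consistency check on the leave-one-out figures quoted in the abstract above, the F-measure can be recomputed as the harmonic mean of precision and recall. This is a minimal sketch, not part of PSIVER: the function name `f_measure` is hypothetical, and it assumes the reported F-measure is the standard F1 derived from the quoted precision and recall.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); inputs and output in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Leave-one-out cross-validation values quoted in the abstract (percent).
loo_f1 = f_measure(30.6, 41.6)
print(round(loo_f1, 1))  # prints 35.3
```

The recomputed value matches the 35.3% reported for the cross-validation set.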
|
157
|
Ozbek P, Soner S, Erman B, Haliloglu T. DNABINDPROT: fluctuation-based predictor of DNA-binding residues within a network of interacting residues. Nucleic Acids Res 2010; 38:W417-23. [PMID: 20478828 PMCID: PMC2896127 DOI: 10.1093/nar/gkq396] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Indexed: 01/03/2023]
Abstract
DNABINDPROT is designed to predict DNA-binding residues, based on the fluctuations of residues in high-frequency modes by the Gaussian network model. The residue pairs that display high mean-square distance fluctuations are analyzed with respect to DNA binding, which are then filtered with their evolutionary conservation profiles and ranked according to their DNA-binding propensities. If the analyses are based on the exact outcome of fluctuations in the highest mode, using a conservation threshold of 5, the results have a sensitivity, specificity, precision and accuracy of 9.3%, 90.5%, 18.1% and 78.6%, respectively, on a dataset of 36 unbound–bound protein structure pairs. These values increase up to 24.3%, 93.4%, 45.3% and 83.3% for the respective cases, when the neighboring two residues are considered. The relatively low sensitivity appears with the identified residues being selective and susceptible more for the binding core residues rather than all DNA-binding residues. The predicted residues that are not tagged as DNA-binding residues are those whose fluctuations are coupled with DNA-binding sites. They are in close proximity as well as plausible for other functional residues, such as ligand and protein–protein interaction sites. DNABINDPROT is free and open to all users without login requirement available at: http://www.prc.boun.edu.tr/appserv/prc/dnabindprot/.
Affiliation(s)
- Pemra Ozbek
- Department of Chemical Engineering and Polymer Research Center, Bogazici University, Bebek, 34342 Istanbul
|
158
|
Gromiha MM, Yokota K, Fukui K. Energy based approach for understanding the recognition mechanism in protein-protein complexes. Mol Biosyst 2010; 5:1779-86. [PMID: 19593470 DOI: 10.1039/b904161n] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 01/06/2023]
Abstract
Protein-protein interactions play an essential role in the regulation of various cellular processes. Understanding the recognition mechanism of protein-protein complexes is a challenging task in molecular and computational biology. In this work, we have developed an energy-based approach for identifying the binding sites and the residues important for binding in protein-protein complexes. The new approach differs from traditional distance-based contact criteria, in which repulsive interactions are counted as binding sites and all contacts within a specific cutoff are treated in the same way. We found that residues and residue pairs with charged and aromatic side chains are important for binding. These residues tend to form cation-π, electrostatic and aromatic interactions. Our observations were checked against the experimental binding specificity of protein-protein complexes and agree well with experiment. Based on these results we propose a novel mechanism for the recognition of protein-protein complexes: the charged and aromatic residues in receptor and ligand initiate recognition by making suitable interactions between them; the neighboring hydrophobic residues, together with hydrogen-bonding polar residues, help stabilize the complex. Further, the propensity of residues in the binding sites of receptors and ligands, atomic contributions and the influence on secondary structure will be discussed.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan.
|
159
|
|
160
|
Gromiha MM, Yokota K, Fukui K. Sequence and structural analysis of binding site residues in protein-protein complexes. Int J Biol Macromol 2009; 46:187-92. [PMID: 20026105 DOI: 10.1016/j.ijbiomac.2009.11.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 09/30/2009] [Revised: 11/23/2009] [Accepted: 11/24/2009] [Indexed: 12/24/2022]
Abstract
The binding sites in protein-protein complexes have been identified with different methods including atomic contacts, reduction in solvent accessibility and interaction energy between the interacting partners. In our earlier work, we developed energy-based criteria for identifying the binding sites in protein-protein complexes, which showed that the interacting residues differ from those obtained with distance-based methods. In this work, we analyzed the binding site residues based on sequence and structural properties, such as neighboring residues, secondary structure, solvent accessibility, conservation of residues, medium and long-range contacts and surrounding hydrophobicity. Our results showed that the neighboring residues of binding sites in proteins and ligands are different from each other although the interacting pairs of residues have a common behavior. The analysis of surrounding hydrophobicity reveals that the binding residues are less hydrophobic than non-binding sites, which suggests that the hydrophobic core is important for folding and stability whereas the surface-seeking residues play a critical role in binding. This tendency has been verified with the number of contacts in binding sites. In addition, the binding site residues are highly conserved compared with non-binding residues. We suggest that the incorporation of sequence and structure-based features may improve the prediction accuracy of binding sites in protein-protein complexes.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan.
|
161
|
Dosztanyi Z, Meszaros B, Simon I. Bioinformatical approaches to characterize intrinsically disordered/unstructured proteins. Brief Bioinform 2009; 11:225-43. [DOI: 10.1093/bib/bbp061] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.8] [Indexed: 11/12/2022]
|
162
|
Liu B, Wang X, Lin L, Tang B, Dong Q, Wang X. Prediction of protein binding sites in protein structures using hidden Markov support vector machine. BMC Bioinformatics 2009; 10:381. [PMID: 19925685 PMCID: PMC2785799 DOI: 10.1186/1471-2105-10-381] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 03/08/2009] [Accepted: 11/20/2009] [Indexed: 01/08/2023]
Abstract
Background Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been mainly based on widely known machine learning techniques, such as artificial neural networks, support vector machines, conditional random field, etc. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance. Results In this study, we introduce a novel machine learning model hidden Markov support vector machine for protein binding site prediction. The model treats the protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including protein sequence profile and residue accessible surface area, are used to train hidden Markov support vector machine. When tested on six data sets, the method based on hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random field. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods. Conclusion The improved prediction performance and computational efficiency of the method based on hidden Markov support vector machine can be attributed to the following three factors. Firstly, the relation between labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous to this field. Thirdly, the complexity of the training step for hidden Markov support vector machine is linear with the number of training samples by using the cutting-plane algorithm.
Affiliation(s)
- Bin Liu
- Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, PR China.
|
163
|
Using Support Vector Machine Combined with Post-processing Procedure to Improve Prediction of Interface Residues in Transient Complexes. Protein J 2009; 28:369-74. [DOI: 10.1007/s10930-009-9203-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
|
164
|
Hu J, Yan C. A tool for calculating binding-site residues on proteins from PDB structures. BMC Struct Biol 2009; 9:52. [PMID: 19650927 PMCID: PMC2728722 DOI: 10.1186/1472-6807-9-52] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/01/2009] [Accepted: 08/03/2009] [Indexed: 11/24/2022]
Abstract
Background In research on protein functional sites, researchers often need to identify binding-site residues on a protein. A commonly used strategy is to find a complex structure from the Protein Data Bank (PDB) that consists of the protein of interest and its interacting partner(s) and calculate binding-site residues based on the complex structure. However, since a protein may participate in multiple interactions, the binding-site residues calculated from one complex structure usually do not reveal all binding sites on a protein. Researchers therefore have to find all PDB complexes that contain the protein of interest and combine the binding-site information gleaned from them. This process is very time-consuming. In particular, combining binding-site information obtained from different PDB structures requires tedious work to align protein sequences. The process becomes overwhelmingly difficult when researchers have a large set of proteins to analyze, which is usually the case in practice. Results In this study, we have developed a tool for calculating binding-site residues on proteins, TCBRP. For an input protein, TCBRP can quickly find all binding-site residues on the protein by automatically combining the information obtained from all PDB structures that contain the protein of interest. Additionally, TCBRP presents the binding-site residues in different categories according to the interaction type. TCBRP also allows researchers to set the definition of binding-site residues. Conclusion The developed tool is very useful for research on protein binding site analysis and prediction.
Affiliation(s)
- Jing Hu
- Department of Computer Science, Utah State University, Logan, UT, USA.
|
165
|
Ezkurdia I, Bartoli L, Fariselli P, Casadio R, Valencia A, Tress ML. Progress and challenges in predicting protein-protein interaction sites. Brief Bioinform 2009; 10:233-46. [PMID: 19346321 DOI: 10.1093/bib/bbp021] [Citation(s) in RCA: 113] [Impact Index Per Article: 7.1] [Indexed: 11/14/2022]
Abstract
The identification of protein-protein interaction sites is an essential intermediate step for mutant design and the prediction of protein networks. In recent years a significant number of methods have been developed to predict these interface residues and here we review the current status of the field. Progress in this area requires a clear view of the methodology applied, the data sets used for training and testing the systems, and the evaluation procedures. We have analysed the impact of a representative set of features and algorithms and highlighted the problems inherent in generating reliable protein data sets and in the posterior analysis of the results. Although it is clear that there have been some improvements in methods for predicting interacting sites, several major bottlenecks remain. Proteins in complexes are still under-represented in the structural databases and in particular many proteins involved in transient complexes are still to be crystallized. We provide suggestions for effective feature selection, and make it clear that community standards for testing, training and performance measures are necessary for progress in the field.
Affiliation(s)
- Iakes Ezkurdia
- Centro Nacional de Biotechnolgia, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
166
|
Tuncbag N, Kar G, Keskin O, Gursoy A, Nussinov R. A survey of available tools and web servers for analysis of protein-protein interactions and interfaces. Brief Bioinform 2009; 10:217-32. [PMID: 19240123 DOI: 10.1093/bib/bbp001] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Indexed: 01/10/2023]
Abstract
The unanimous agreement that cellular processes are (largely) governed by interactions between proteins has led to enormous community efforts culminating in overwhelming information relating to these proteins; to the regulation of their interactions, to the way in which they interact and to the function which is determined by these interactions. These data have been organized in databases and servers. However, to make these really useful, it is essential not only to be aware of these, but in particular to have a working knowledge of which tools to use for a given problem; what are the tool advantages and drawbacks; and no less important how to combine these for a particular goal since usually it is not one tool, but some combination of tool-modules that is needed. This is the goal of this review.
Affiliation(s)
- Nurcan Tuncbag
- Computational Sciences and Engineering Program at Koc University, Istanbul, Turkey
|
167
|
Abstract
Machine-learning techniques can classify functionally related proteins where homology-transfer as well as sequence and structure motifs fail. Here, we present a method aimed at complementing homology-transfer in the identification of cell cycle control kinases from sequence alone. First, we identified functionally significant residues in cell cycle proteins through their high sequence conservation and biophysical properties. We then incorporated these residues and their features into support vector machines (SVM) to identify new kinases and, more specifically, to differentiate cell cycle kinases from other kinases and other proteins. As expected, the most informative residues tend to be highly conserved and tend to localize in the ATP binding regions of the kinases. Another observation confirmed that ATP binding regions are typically not found on the surface but in partially buried sites, and that this fact is correctly captured by accessibility predictions. Using these highly conserved, semi-buried residues and their biophysical properties, we could distinguish cell cycle S/T kinases from other kinase families at levels around 70-80% accuracy and 62-81% coverage. An application to the entire human proteome predicted at least 97 human proteins with limited previous annotations to be candidates for cell cycle kinases.
Affiliation(s)
- Kazimierz O. Wrzeszczynski
- Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA
- Columbia University Center for Computational Biology and Bioinformatics (C2B2), 1130 St. Nicholas Ave. Rm. 802, New York, NY 10032, USA
- NorthEast Structural Genomics Consortium (NESG), Columbia University, 1130 St. Nicholas Ave. Rm. 802, New York, NY 10032, USA
- Integrated Program in Cellular, Molecular, Structural and Genetic Studies, Columbia University, 630 West 168th Street, New York, NY 10032, USA
- Burkhard Rost
- Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA
- Columbia University Center for Computational Biology and Bioinformatics (C2B2), 1130 St. Nicholas Ave. Rm. 802, New York, NY 10032, USA
- NorthEast Structural Genomics Consortium (NESG), Columbia University, 1130 St. Nicholas Ave. Rm. 802, New York, NY 10032, USA
|
168
|
Prediction of protein-protein interaction sites in sequences and 3D structures by random forests. PLoS Comput Biol 2009; 5:e1000278. [PMID: 19180183 PMCID: PMC2621338 DOI: 10.1371/journal.pcbi.1000278] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.9] [Received: 07/18/2008] [Accepted: 12/16/2008] [Indexed: 11/19/2022]
Abstract
Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras–Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.

In their active state, proteins, the workhorses of a living cell, need to have a defined 3D structure. The majority of functions in the living cell are performed through protein interactions that occur through specific, often unknown, residues on their surfaces. We can study protein interactions either qualitatively (interaction: yes/no) using large-scale, high-throughput experiments or determine specific interaction sites by using biophysical techniques, such as X-ray crystallography, that are much more laborious and yet unable to provide us with a complete interaction map within the cell.
This paper presents the machine learning classification method termed “Random Forests” in its application to predicting interaction sites. We use interaction data from available experimental evidence to train the classifier and predict the interacting residues on proteins with unknown 3D structures. Using this approach, we are able to predict many more interactions in greater detail (i.e., to accurately predict most of the binding site) and with that to infer knowledge about the functions of unknown proteins.
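The sliding-window encoding mentioned in the abstract can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the function name `sliding_windows`, the padding symbol `-`, the example sequence, and the use of raw residue letters as the window content are all assumptions made for illustration; only the nine-residue window size comes from the abstract.

```python
from typing import List


def sliding_windows(sequence: str, window: int = 9, pad: str = "-") -> List[str]:
    """Return one window per residue, centred on that residue.

    Both termini are padded with `pad` so that every residue,
    including the first and last, gets a full-width window.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + window] for i in range(len(sequence))]


# Arbitrary short example sequence (hypothetical input).
windows = sliding_windows("MTEYKLVVVGAGG")
print(windows[0])  # prints ----MTEYK (centred on the first residue)
print(windows[6])  # prints EYKLVVVGA (centred on the seventh residue)
```

In a full predictor, each window would then be converted into a numeric feature vector and passed to a classifier such as a random forest; that step is omitted here.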
|
169
|
Engelen S, Trojan LA, Sacquin-Mora S, Lavery R, Carbone A. Joint evolutionary trees: a large-scale method to predict protein interfaces based on sequence sampling. PLoS Comput Biol 2009; 5:e1000267. [PMID: 19165315 PMCID: PMC2613531 DOI: 10.1371/journal.pcbi.1000267] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Received: 05/27/2008] [Accepted: 12/04/2008] [Indexed: 11/18/2022]
Abstract
The Joint Evolutionary Trees (JET) method detects protein interfaces, the core
residues involved in the folding process, and residues susceptible to
site-directed mutagenesis and relevant to molecular recognition. The approach,
based on the Evolutionary Trace (ET) method, introduces a novel way to treat
evolutionary information. Families of homologous sequences are analyzed through
a Gibbs-like sampling of distance trees to reduce effects of erroneous multiple
alignment and impacts of weakly homologous sequences on distance tree
construction. The sampling method makes sequence analysis more sensitive to
functional and structural importance of individual residues by avoiding effects
of the overrepresentation of highly homologous sequences and improves
computational efficiency. A carefully designed clustering method is parametrized
on the target structure to detect and extend patches on protein surfaces into
predicted interaction sites. Clustering takes into account residues'
physical-chemical properties as well as conservation. Large-scale application of
JET requires the system to be adjustable for different datasets and to guarantee
predictions even if the signal is low. Flexibility was achieved by a careful
treatment of the number of retrieved sequences, the amino acid distance between
sequences, and the selective thresholds for cluster identification. An iterative
version of JET (iJET) that guarantees finding the most likely interface residues
is proposed as the appropriate tool for large-scale predictions. Tests are
carried out on the Huang database of 62 heterodimer, homodimer, and transient
complexes and on 265 interfaces belonging to signal transduction proteins,
enzymes, inhibitors, antibodies, antigens, and others. A specific set of
proteins chosen for their special functional and structural properties
illustrates JET behavior on a large variety of interactions covering proteins,
ligands, DNA, and RNA. JET is compared at a large scale to ET and to Consurf,
Rate4Site, siteFiNDER|3D, and SCORECONS on specific structures. A significant
improvement in performance and computational efficiency is shown.
Information obtained on the structure of macromolecular complexes is important
for identifying functionally important partners but also for determining how
such interactions will be perturbed by natural or engineered site mutations.
Hence, to fully understand or control biological processes we need to predict in
the most accurate manner protein interfaces for a protein structure, possibly
without knowing its partners. Joint Evolutionary Trees (JET) is a method
designed to detect very different types of interactions of a protein with
another protein, ligands, DNA, and RNA. It uses a carefully designed sampling
method, making sequence analysis more sensitive to the functional and structural
importance of individual residues, and a clustering method parametrized on the
target structure for the detection of patches on protein surfaces and their
extension into predicted interaction sites. JET is a large-scale method, highly
accurate and potentially applicable to search for protein partners.
Affiliation(s)
- Stefan Engelen
- Génomique Analytique, Université Pierre et Marie Curie-Paris 6, UMR S511, Paris, France
- INSERM, U511, Paris, France
- Ladislas A. Trojan
- Génomique Analytique, Université Pierre et Marie Curie-Paris 6, UMR S511, Paris, France
- INSERM, U511, Paris, France
- Richard Lavery
- Institut de Biologie et Chimie des Protéines, CNRS UMR 5086/IFR 128/Université de Lyon, Lyon, France
- Alessandra Carbone
- Génomique Analytique, Université Pierre et Marie Curie-Paris 6, UMR S511, Paris, France
- INSERM, U511, Paris, France
|
170
|
A heterospecific leucine zipper tetramer. Chem Biol 2008; 15:908-19. [PMID: 18804028 PMCID: PMC7111190 DOI: 10.1016/j.chembiol.2008.07.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 02/14/2008] [Revised: 07/07/2008] [Accepted: 07/10/2008] [Indexed: 11/21/2022]
Abstract
Protein-protein interactions dictate the assembly of the macromolecular complexes essential for functional networks and cellular behavior. Elucidating principles of molecular recognition governing important interfaces such as coiled coils is a challenging goal for structural and systems biology. We report here that two valine-containing mutants of the GCN4 leucine zipper that fold individually as four-stranded coiled coils associate preferentially in mixtures to form an antiparallel, heterotetrameric structure. X-ray crystallographic analysis reveals that the coinciding hydrophobic interfaces of the hetero- and homotetramers differ in detail, explaining their partnering and structural specificity. Equilibrium disulfide exchange and thermal denaturation experiments show that the 50-fold preference for heterospecificity results from a combination of preferential packing and hydrophobicity. The extent of preference is sensitive to the side chains comprising the interface. Thus, heterotypic versus homotypic interaction specificity in coiled coils reflects a delicate balance in complementarity of shape and chemistry of the participating side chains.
|
171
|
Abstract
Protein–protein recognition plays an essential role in structure and function. Specific non-covalent interactions stabilize the structure of macromolecular assemblies, exemplified in this review by oligomeric proteins and the capsids of icosahedral viruses. They also allow proteins to form complexes that have a very wide range of stability and lifetimes and are involved in all cellular processes. We present some of the structure-based computational methods that have been developed to characterize the quaternary structure of oligomeric proteins and other molecular assemblies and analyze the properties of the interfaces between the subunits. We compare the size, the chemical and amino acid compositions and the atomic packing of the subunit interfaces of protein–protein complexes, oligomeric proteins, viral capsids and protein–nucleic acid complexes. These biologically significant interfaces are generally close-packed, whereas the non-specific interfaces between molecules in protein crystals are loosely packed, an observation that gives a structural basis to specific recognition. A distinction is made within each interface between a core that contains buried atoms and a solvent-accessible rim. The core and the rim differ in their amino acid composition and their conservation in evolution, and the distinction helps to correlate the structural data with the results of site-directed mutagenesis and in vitro studies of self-assembly.
|
172
|
Bromberg Y, Rost B. Comprehensive in silico mutagenesis highlights functionally important residues in proteins. Bioinformatics 2008; 24:i207-12. [PMID: 18689826 PMCID: PMC2597370 DOI: 10.1093/bioinformatics/btn268] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Mutating residues into alanine (alanine scanning) is one of the fastest experimental means of probing hypotheses about protein function. Alanine scans can reveal functional hot spots, i.e. residues that alter function upon mutation. In vitro mutagenesis is cumbersome and costly: probing all residues in a protein is typically as impossible as substituting by all non-native amino acids. In contrast, such exhaustive mutagenesis is feasible in silico. RESULTS: Previously, we developed SNAP to predict functional changes due to non-synonymous single nucleotide polymorphisms. Here, we applied SNAP to all experimental mutations in the ASEdb database of alanine scans; we identified 70% of the hot spots (≥1 kcal/mol change in binding energy); more severe changes were predicted more accurately. Encouraged, we carried out a complete all-against-all in silico mutagenesis for human glucokinase. Many of the residues predicted as functionally important have indeed been confirmed in the literature, others await experimental verification, and our method is ready to aid in the design of in vitro mutagenesis. AVAILABILITY: ASEdb and glucokinase scores are available at http://www.rostlab.org/services/SNAP. For submissions of large/whole proteins for processing please contact the author.
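The hot-spot criterion quoted in this abstract (a change in binding energy of at least 1 kcal/mol upon alanine substitution) can be illustrated with a minimal Python sketch. The residue labels, energy values, and function name below are hypothetical and chosen only for illustration; they are not ASEdb data and do not reproduce SNAP itself.

```python
# Cutoff quoted in the abstract: >= 1 kcal/mol change in binding energy.
HOT_SPOT_THRESHOLD = 1.0  # kcal/mol


def find_hot_spots(ddg_by_residue, threshold=HOT_SPOT_THRESHOLD):
    """Return the sorted residue labels whose change in binding energy
    upon alanine substitution meets or exceeds the threshold."""
    return sorted(
        residue
        for residue, ddg in ddg_by_residue.items()
        if ddg >= threshold
    )


# Hypothetical alanine-scan results (residue label -> ddG in kcal/mol).
example_scan = {"K15": 2.3, "R17": 0.4, "Y35": 1.0, "G36": 0.1}
hot_spots = find_hot_spots(example_scan)
```

The comparison is inclusive, so a residue measured at exactly 1.0 kcal/mol counts as a hot spot under this reading of the criterion.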
Affiliation(s)
- Yana Bromberg
- Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th St, New York, NY 10032, USA.
|
173
|
Ming D, Cohn JD, Wall ME. Fast dynamics perturbation analysis for prediction of protein functional sites. BMC Struct Biol 2008; 8:5. [PMID: 18234095 PMCID: PMC2276503 DOI: 10.1186/1472-6807-8-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 10/18/2007] [Accepted: 01/30/2008] [Indexed: 11/10/2022]
Abstract
Background: We present a fast version of the dynamics perturbation analysis (DPA) algorithm to predict functional sites in protein structures. The original DPA algorithm finds regions in proteins where interactions cause a large change in the protein conformational distribution, as measured using the relative entropy Dx. Such regions are associated with functional sites. Results: The Fast DPA algorithm, which accelerates DPA calculations, is motivated by an empirical observation that Dx in a normal-modes model is highly correlated with an entropic term that only depends on the eigenvalues of the normal modes. The eigenvalues are accurately estimated using first-order perturbation theory, resulting in an N-fold reduction in the overall computational requirements of the algorithm, where N is the number of residues in the protein. The performance of the original and Fast DPA algorithms was compared using protein structures from a standard small-molecule docking test set. For nominal implementations of each algorithm, top-ranked Fast DPA predictions overlapped the true binding site 94% of the time, compared to 87% of the time for original DPA. In addition, per-protein recall statistics (fraction of binding-site residues that are among predicted residues) were slightly better for Fast DPA. On the other hand, per-protein precision statistics (fraction of predicted residues that are among binding-site residues) were slightly better using original DPA. Overall, the performance of Fast DPA in predicting ligand-binding-site residues was comparable to that of the original DPA algorithm. Conclusion: Compared to the original DPA algorithm, the decreased run time with comparable performance makes Fast DPA well-suited for implementation on a web server and for high-throughput analysis.
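The per-protein recall and precision statistics defined in this abstract can be written out as a short Python sketch. It follows only the two definitions given in the text; the function name and the residue numbers are hypothetical, and this is not the authors' evaluation code.

```python
def precision_recall(predicted_residues, binding_site_residues):
    """Per-protein precision and recall for a binding-site prediction.

    precision: fraction of predicted residues that are binding-site residues
    recall:    fraction of binding-site residues that are among the predictions
    Empty inputs yield 0.0 rather than raising a division error
    (a convention assumed here, not stated in the abstract).
    """
    predicted = set(predicted_residues)
    binding_site = set(binding_site_residues)
    overlap = len(predicted & binding_site)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(binding_site) if binding_site else 0.0
    return precision, recall


# Hypothetical residue numbers for one protein.
predicted = [10, 11, 12, 13]
binding_site = [12, 13, 14, 15, 16, 17]
precision, recall = precision_recall(predicted, binding_site)
```

In this made-up case two of the four predicted residues lie in the six-residue site, so precision is 2/4 and recall is 2/6, which shows how a method can trade one statistic against the other.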
Affiliation(s)
- Dengming Ming
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico, USA.
|
174
|
Abstract
MOTIVATION: Thousands of proteins are known to bind to DNA; for most of them the mechanism of action and the residues that bind to DNA, i.e. the binding sites, are as yet unknown. Experimental identification of binding sites requires expensive and laborious methods such as mutagenesis and binding assays. Hence, such studies are not applicable on a large scale. If the 3D structure of a protein is known, it is often possible to predict DNA-binding sites in silico. However, for most proteins, such knowledge is not available. RESULTS: It has been shown that DNA-binding residues have distinct biophysical characteristics. Here we demonstrate that these characteristics are so distinct that they enable accurate prediction of the residues that bind DNA directly from amino acid sequence, without requiring any additional experimental or structural information. In a cross-validation based on the largest non-redundant dataset of high-resolution protein-DNA complexes available today, we found that 89% of our predictions are confirmed by experimental data. Thus, it is now possible to identify DNA-binding sites on a proteomic scale even in the absence of any experimental data or 3D-structural information. AVAILABILITY: http://cubic.bioc.columbia.edu/services/disis.
Affiliation(s)
- Yanay Ofran
- Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA.
|
175
|
Ofran Y, Rost B. Protein-protein interaction hotspots carved into sequences. PLoS Comput Biol 2007; 3:e119. [PMID: 17630824 PMCID: PMC1914369 DOI: 10.1371/journal.pcbi.0030119] [Citation(s) in RCA: 179] [Impact Index Per Article: 9.9] [Received: 12/15/2006] [Accepted: 05/11/2007] [Indexed: 11/24/2022]
Abstract
Protein-protein interactions, a key to almost any biological process, are mediated by molecular mechanisms that are not entirely clear. The study of these mechanisms often focuses on all residues at protein-protein interfaces. However, only a small subset of all interface residues is actually essential for recognition or binding. Commonly referred to as "hotspots," these essential residues are defined as residues that impede protein-protein interactions if mutated. While no in silico tool identifies hotspots in unbound chains, numerous prediction methods were designed to identify all the residues in a protein that are likely to be a part of protein-protein interfaces. These methods typically identify successfully only a small fraction of all interface residues. Here, we analyzed the hypothesis that the two subsets correspond (i.e., that in silico methods may predict few residues because they preferentially predict hotspots). We demonstrate that this is indeed the case and that we can therefore predict directly from the sequence of a single protein which residues are interaction hotspots (without knowledge of the interaction partner). Our results suggested that most protein complexes are stabilized by similar basic principles. The ability to accurately and efficiently identify hotspots from sequence enables the annotation and analysis of protein-protein interaction hotspots in entire organisms and thus may benefit function prediction and drug development. The server for prediction is available at http://www.rostlab.org/services/isis.
Affiliation(s)
- Yanay Ofran
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
|
176
|
Abstract
Many genetic variations are single nucleotide polymorphisms (SNPs). Non-synonymous SNPs are ‘neutral’ if the resulting point-mutated protein is not functionally discernible from the wild type and ‘non-neutral’ otherwise. The ability to identify non-neutral substitutions could significantly aid targeting disease-causing detrimental mutations, as well as SNPs that increase the fitness of particular phenotypes. Here, we introduced comprehensive data sets to assess the performance of methods that predict SNP effects. Alongside these, we introduced SNAP (screening for non-acceptable polymorphisms), a neural network-based method for the prediction of the functional effects of non-synonymous SNPs. SNAP needs only sequence information as input, but benefits from functional and structural annotations, if available. In a cross-validation test on over 80,000 mutants, SNAP identified 80% of the non-neutral substitutions at 77% accuracy and 76% of the neutral substitutions at 80% accuracy. This constituted an important improvement over other methods; the improvement rose to over ten percentage points for mutants for which existing methods disagreed. Possibly even more importantly, SNAP introduced a well-calibrated measure for the reliability of each prediction. This measure will allow users to focus on the most accurate predictions and/or the most severe effects. Available at http://www.rostlab.org/services/SNAP
Affiliation(s)
- Yana Bromberg
- Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th St., New York, NY 10032, USA.
|