Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Ofran Y, Rost B. ISIS: interaction sites identified from sequence. Bioinformatics 2007;23:e13-6. [PMID: 17237081 DOI: 10.1093/bioinformatics/btl303] [Citation(s) in RCA: 172] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Number

Cited by Other Article(s)

151

Mészáros B, Simon I, Dosztányi Z. The expanding view of protein-protein interactions: complexes involving intrinsically disordered proteins. Phys Biol 2011;8:035003. [PMID: 21572179 DOI: 10.1088/1478-3975/8/3/035003] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]

152

Fernández‐Recio J. Prediction of protein binding sites and hot spots. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2011. [DOI: 10.1002/wcms.45] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

153

Chennamsetty N, Voynov V, Kayser V, Helk B, Trout BL. Prediction of protein binding regions. Proteins 2010;79:888-97. [DOI: 10.1002/prot.22926] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2010] [Revised: 09/23/2010] [Accepted: 10/13/2010] [Indexed: 11/07/2022]

154

Lateral acquisition of genes is affected by the friendliness of their products. Proc Natl Acad Sci U S A 2010;108:343-8. [PMID: 21149709 DOI: 10.1073/pnas.1009775108] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

155

Chen P, Li J. Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information. BMC Bioinformatics 2010;11:402. [PMID: 20667087 PMCID: PMC2921408 DOI: 10.1186/1471-2105-11-402] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2010] [Accepted: 07/28/2010] [Indexed: 01/09/2023] Open

Abstract

BACKGROUND

Protein-protein interactions play essential roles in protein function determination and drug design. Numerous methods have been proposed to recognize their interaction sites, however, only a small proportion of protein complexes have been successfully resolved due to the high cost. Therefore, it is important to improve the performance for predicting protein interaction sites based on primary sequence alone.

RESULTS

We propose a new idea to construct an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. A support vector machine (SVM) ensemble is then developed, where SVMs train on different pairs of positive (interface sites) and negative (non-interface sites) subsets. The subsets having roughly the same sizes are grouped in the order of accessible surface area change before and after complexation. A self-organizing map (SOM) technique is applied to group similar input vectors to make more accurate the identification of interface residues. An ensemble of ten-SVMs achieves an MCC improvement by around 8% and F1 improvement by around 9% over that of three-SVMs. As expected, SVM ensembles constantly perform better than individual SVMs. In addition, the model by the integrative profiles outperforms that based on the sequence profile or the hydropathy scale alone. As our method uses a small number of features to encode the input vectors, our model is simpler, faster and more accurate than the existing methods.

CONCLUSIONS

The integrative profile by combining hydrophobic and evolutionary information contributes most to the protein-protein interaction prediction. Results show that evolutionary context of residue with respect to hydrophobicity makes better the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves the prediction performance.

AVAILABILITY

Datasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm.

Collapse

156

Murakami Y, Mizuguchi K. Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites. ACTA ACUST UNITED AC 2010;26:1841-8. [PMID: 20529890 DOI: 10.1093/bioinformatics/btq302] [Citation(s) in RCA: 161] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Abstract

MOTIVATION

The limited availability of protein structures often restricts the functional annotation of proteins and the identification of their protein-protein interaction sites. Computational methods to identify interaction sites from protein sequences alone are, therefore, required for unraveling the functions of many proteins. This article describes a new method (PSIVER) to predict interaction sites, i.e. residues binding to other proteins, in protein sequences. Only sequence features (position-specific scoring matrix and predicted accessibility) are used for training a Naïve Bayes classifier (NBC), and conditional probabilities of each sequence feature are estimated using a kernel density estimation method (KDE).

RESULTS

The leave-one out cross-validation of PSIVER achieved a Matthews correlation coefficient (MCC) of 0.151, an F-measure of 35.3%, a precision of 30.6% and a recall of 41.6% on a non-redundant set of 186 protein sequences extracted from 105 heterodimers in the Protein Data Bank (consisting of 36 219 residues, of which 15.2% were known interface residues). Even though the dataset used for training was highly imbalanced, a randomization test demonstrated that the proposed method managed to avoid overfitting. PSIVER was also tested on 72 sequences not used in training (consisting of 18 140 residues, of which 10.6% were known interface residues), and achieved an MCC of 0.135, an F-measure of 31.5%, a precision of 25.0% and a recall of 46.5%, outperforming other publicly available servers tested on the same dataset. PSIVER enables experimental biologists to identify potential interface residues in unknown proteins from sequence information alone, and to mutate those residues selectively in order to unravel protein functions.

AVAILABILITY

Freely available on the web at http://tardis.nibio.go.jp/PSIVER/

Collapse

157

Ozbek P, Soner S, Erman B, Haliloglu T. DNABINDPROT: fluctuation-based predictor of DNA-binding residues within a network of interacting residues. Nucleic Acids Res 2010;38:W417-23. [PMID: 20478828 PMCID: PMC2896127 DOI: 10.1093/nar/gkq396] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open

158

Gromiha MM, Yokota K, Fukui K. Energy based approach for understanding the recognition mechanism in protein-protein complexes. MOLECULAR BIOSYSTEMS 2010;5:1779-86. [PMID: 19593470 DOI: 10.1039/b904161n] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]

159

Identification and Analysis of Binding Site Residues in Protein Complexes: Energy Based Approach. ACTA ACUST UNITED AC 2010. [DOI: 10.1007/978-3-642-14922-1_78] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]

160

Gromiha MM, Yokota K, Fukui K. Sequence and structural analysis of binding site residues in protein-protein complexes. Int J Biol Macromol 2009;46:187-92. [PMID: 20026105 DOI: 10.1016/j.ijbiomac.2009.11.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2009] [Revised: 11/23/2009] [Accepted: 11/24/2009] [Indexed: 12/24/2022]

161

Dosztanyi Z, Meszaros B, Simon I. Bioinformatical approaches to characterize intrinsically disordered/unstructured proteins. Brief Bioinform 2009;11:225-43. [DOI: 10.1093/bib/bbp061] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

162

Liu B, Wang X, Lin L, Tang B, Dong Q, Wang X. Prediction of protein binding sites in protein structures using hidden Markov support vector machine. BMC Bioinformatics 2009;10:381. [PMID: 19925685 PMCID: PMC2785799 DOI: 10.1186/1471-2105-10-381] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2009] [Accepted: 11/20/2009] [Indexed: 01/08/2023] Open

Abstract

Background

Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been mainly based on widely known machine learning techniques, such as artificial neural networks, support vector machines, conditional random field, etc. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance.

Results

In this study, we introduce a novel machine learning model hidden Markov support vector machine for protein binding site prediction. The model treats the protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including protein sequence profile and residue accessible surface area, are used to train hidden Markov support vector machine. When tested on six data sets, the method based on hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random field. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods.

Conclusion

The improved prediction performance and computational efficiency of the method based on hidden Markov support vector machine can be attributed to the following three factors. Firstly, the relation between labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous to this field. Thirdly, the complexity of the training step for hidden Markov support vector machine is linear with the number of training samples by using the cutting-plane algorithm.

Collapse

163

Using Support Vector Machine Combined with Post-processing Procedure to Improve Prediction of Interface Residues in Transient Complexes. Protein J 2009;28:369-74. [DOI: 10.1007/s10930-009-9203-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

164

Hu J, Yan C. A tool for calculating binding-site residues on proteins from PDB structures. BMC STRUCTURAL BIOLOGY 2009;9:52. [PMID: 19650927 PMCID: PMC2728722 DOI: 10.1186/1472-6807-9-52] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/01/2009] [Accepted: 08/03/2009] [Indexed: 11/24/2022]

165

Ezkurdia I, Bartoli L, Fariselli P, Casadio R, Valencia A, Tress ML. Progress and challenges in predicting protein-protein interaction sites. Brief Bioinform 2009;10:233-46. [PMID: 19346321 DOI: 10.1093/bib/bbp021] [Citation(s) in RCA: 113] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

166

Tuncbag N, Kar G, Keskin O, Gursoy A, Nussinov R. A survey of available tools and web servers for analysis of protein-protein interactions and interfaces. Brief Bioinform 2009;10:217-32. [PMID: 19240123 DOI: 10.1093/bib/bbp001] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023] Open

167

Wrzeszczynski KO, Rost B. Cell cycle kinases predicted from conserved biophysical properties. Proteins 2009;74:655-68. [PMID: 18704950 PMCID: PMC2629806 DOI: 10.1002/prot.22181] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

168

Prediction of protein-protein interaction sites in sequences and 3D structures by random forests. PLoS Comput Biol 2009;5:e1000278. [PMID: 19180183 PMCID: PMC2621338 DOI: 10.1371/journal.pcbi.1000278] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2008] [Accepted: 12/16/2008] [Indexed: 11/19/2022] Open

Abstract

Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras–Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.

In their active state, proteins—the workhorses of a living cell—need to have a defined 3D structure. The majority of functions in the living cell are performed through protein interactions that occur through specific, often unknown, residues on their surfaces. We can study protein interactions either qualitatively (interaction: yes/no) using large-scale, high-throughput experiments or determine specific interaction sites by using biophysical techniques, such as, for example, X-ray crystallography, that are much more laborious and yet unable to provide us with a complete interaction map within the cell. This paper presents the machine learning classification method termed “Random Forests” in its application to predicting interaction sites. We use interaction data from available experimental evidence to train the classifier and predict the interacting residues on proteins with unknown 3D structures. Using this approach, we are able to predict many more interactions in greater detail (i.e., to accurately predict most of the binding site) and with that to infer knowledge about the functions of unknown proteins.

Collapse

169

Engelen S, Trojan LA, Sacquin-Mora S, Lavery R, Carbone A. Joint evolutionary trees: a large-scale method to predict protein interfaces based on sequence sampling. PLoS Comput Biol 2009;5:e1000267. [PMID: 19165315 PMCID: PMC2613531 DOI: 10.1371/journal.pcbi.1000267] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2008] [Accepted: 12/04/2008] [Indexed: 11/18/2022] Open

Abstract

The Joint Evolutionary Trees (JET) method detects protein interfaces, the core residues involved in the folding process, and residues susceptible to site-directed mutagenesis and relevant to molecular recognition. The approach, based on the Evolutionary Trace (ET) method, introduces a novel way to treat evolutionary information. Families of homologous sequences are analyzed through a Gibbs-like sampling of distance trees to reduce effects of erroneous multiple alignment and impacts of weakly homologous sequences on distance tree construction. The sampling method makes sequence analysis more sensitive to functional and structural importance of individual residues by avoiding effects of the overrepresentation of highly homologous sequences and improves computational efficiency. A carefully designed clustering method is parametrized on the target structure to detect and extend patches on protein surfaces into predicted interaction sites. Clustering takes into account residues' physical-chemical properties as well as conservation. Large-scale application of JET requires the system to be adjustable for different datasets and to guarantee predictions even if the signal is low. Flexibility was achieved by a careful treatment of the number of retrieved sequences, the amino acid distance between sequences, and the selective thresholds for cluster identification. An iterative version of JET (iJET) that guarantees finding the most likely interface residues is proposed as the appropriate tool for large-scale predictions. Tests are carried out on the Huang database of 62 heterodimer, homodimer, and transient complexes and on 265 interfaces belonging to signal transduction proteins, enzymes, inhibitors, antibodies, antigens, and others. A specific set of proteins chosen for their special functional and structural properties illustrate JET behavior on a large variety of interactions covering proteins, ligands, DNA, and RNA. JET is compared at a large scale to ET and to Consurf, Rate4Site, siteFiNDER|3D, and SCORECONS on specific structures. A significant improvement in performance and computational efficiency is shown.

Information obtained on the structure of macromolecular complexes is important for identifying functionally important partners but also for determining how such interactions will be perturbed by natural or engineered site mutations. Hence, to fully understand or control biological processes we need to predict in the most accurate manner protein interfaces for a protein structure, possibly without knowing its partners. Joint Evolutionary Trees (JET) is a method designed to detect very different types of interactions of a protein with another protein, ligands, DNA, and RNA. It uses a carefully designed sampling method, making sequence analysis more sensitive to the functional and structural importance of individual residues, and a clustering method parametrized on the target structure for the detection of patches on protein surfaces and their extension into predicted interaction sites. JET is a large-scale method, highly accurate and potentially applicable to search for protein partners.

Collapse

170

A heterospecific leucine zipper tetramer. ACTA ACUST UNITED AC 2008;15:908-19. [PMID: 18804028 PMCID: PMC7111190 DOI: 10.1016/j.chembiol.2008.07.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2008] [Revised: 07/07/2008] [Accepted: 07/10/2008] [Indexed: 11/21/2022]

171

Protein–protein interaction and quaternary structure. Q Rev Biophys 2008;41:133-80. [PMID: 18812015 DOI: 10.1017/s0033583508004708] [Citation(s) in RCA: 301] [Impact Index Per Article: 17.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

172

Bromberg Y, Rost B. Comprehensive in silico mutagenesis highlights functionally important residues in proteins. Bioinformatics 2008;24:i207-12. [PMID: 18689826 PMCID: PMC2597370 DOI: 10.1093/bioinformatics/btn268] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

173

Ming D, Cohn JD, Wall ME. Fast dynamics perturbation analysis for prediction of protein functional sites. BMC STRUCTURAL BIOLOGY 2008;8:5. [PMID: 18234095 PMCID: PMC2276503 DOI: 10.1186/1472-6807-8-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/18/2007] [Accepted: 01/30/2008] [Indexed: 11/10/2022]

174

Ofran Y, Mysore V, Rost B. Prediction of DNA-binding residues from sequence. ACTA ACUST UNITED AC 2007;23:i347-53. [PMID: 17646316 DOI: 10.1093/bioinformatics/btm174] [Citation(s) in RCA: 118] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

175

Ofran Y, Rost B. Protein-protein interaction hotspots carved into sequences. PLoS Comput Biol 2007;3:e119. [PMID: 17630824 PMCID: PMC1914369 DOI: 10.1371/journal.pcbi.0030119] [Citation(s) in RCA: 179] [Impact Index Per Article: 9.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2006] [Accepted: 05/11/2007] [Indexed: 11/24/2022] Open

176

Bromberg Y, Rost B. SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res 2007;35:3823-35. [PMID: 17526529 PMCID: PMC1920242 DOI: 10.1093/nar/gkm238] [Citation(s) in RCA: 667] [Impact Index Per Article: 37.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open