101
Munteanu CR, Pimenta AC, Fernandez-Lozano C, Melo A, Cordeiro MNDS, Moreira IS. Solvent accessible surface area-based hot-spot detection methods for protein-protein and protein-nucleic acid interfaces. J Chem Inf Model 2015; 55:1077-86. [PMID: 25845030 DOI: 10.1021/ci500760m] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 01/07/2023]
Abstract
Given the importance of hot-spot (HS) detection and the efficiency of computational methodologies, several HS detection approaches have been developed. The current paper presents new models to predict HS for protein-protein and protein-nucleic acid interactions with better statistics than those currently reported in the literature. These models are based on solvent accessible surface area (SASA) and genetic conservation features subjected to simple Bayes networks (protein-protein systems) and to more complex multi-objective genetic algorithm-support vector machine algorithms (protein-nucleic acid systems). The best models for these interactions have been implemented in two free Web tools.
Affiliation(s)
- Cristian R Munteanu
  - †Information and Communication Technologies Department, Computer Science Faculty, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain
- António C Pimenta
  - ‡REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências da Universidade do Porto, Rua do Campo Alegre s/n, 4169-007 Porto, Portugal
- Carlos Fernandez-Lozano
  - †Information and Communication Technologies Department, Computer Science Faculty, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain
- André Melo
  - ‡REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências da Universidade do Porto, Rua do Campo Alegre s/n, 4169-007 Porto, Portugal
- Maria N D S Cordeiro
  - ‡REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências da Universidade do Porto, Rua do Campo Alegre s/n, 4169-007 Porto, Portugal
- Irina S Moreira
  - ‡REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências da Universidade do Porto, Rua do Campo Alegre s/n, 4169-007 Porto, Portugal
  - §CNC-Center for Neuroscience and Cell Biology, Universidade de Coimbra, Rua Larga, FMUC, Polo I, 1º andar, 3004-517 Coimbra, Portugal
102
Maheshwari S, Brylinski M. Predicting protein interface residues using easily accessible on-line resources. Brief Bioinform 2015; 16:1025-34. [PMID: 25797794 DOI: 10.1093/bib/bbv009] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 12/01/2014] [Indexed: 01/20/2023] Open
Abstract
It has been more than a decade since the completion of the Human Genome Project that provided us with a complete list of human proteins. The next obvious task is to figure out how various parts interact with each other. On that account, we review 10 methods for protein interface prediction, which are freely available as web servers. In addition, we comparatively evaluate their performance on a common data set comprising different quality target structures. We find that using experimental structures and high-quality homology models, structure-based methods outperform those using only protein sequences, with global template-based approaches providing the best performance. For moderate-quality models, sequence-based methods often perform better than those structure-based techniques that rely on fine atomic details. We note that post-processing protocols implemented in several methods quantitatively improve the results only for experimental structures, suggesting that these procedures should be tuned up for computer-generated models. Finally, we anticipate that advanced meta-prediction protocols are likely to enhance interface residue prediction. Notwithstanding further improvements, easily accessible web servers already provide the scientific community with convenient resources for the identification of protein-protein interaction sites.
103
Wierschin T, Wang K, Welter M, Waack S, Stanke M. Combining features in a graphical model to predict protein binding sites. Proteins 2015; 83:844-52. [PMID: 25663045 DOI: 10.1002/prot.24775] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/15/2014] [Revised: 01/16/2015] [Accepted: 01/26/2015] [Indexed: 11/08/2022]
Abstract
Large efforts have been made in classifying residues as binding sites in proteins using machine learning methods. The prediction task can be translated into the computational challenge of assigning each residue the label binding site or non-binding site. Observational data comes from various possibly highly correlated sources. It includes the structure of the protein but not the structure of the complex. The model class of conditional random fields (CRFs) has previously successfully been used for protein binding site prediction. Here, a new CRF-approach is presented that models the dependencies of residues using a general graphical structure defined as a neighborhood graph and thus our model makes fewer independence assumptions on the labels than sequential labeling approaches. A novel node feature "change in free energy" is introduced into the model, which is then denoted by ΔF-CRF. Parameters are trained with an online large-margin algorithm. Using the standard feature class relative accessible surface area alone, the general graph-structure CRF already achieves higher prediction accuracy than the linear chain CRF of Li et al. ΔF-CRF performs significantly better on a large range of false positive rates than the support-vector-machine-based program PresCont of Zellner et al. on a homodimer set containing 128 chains. ΔF-CRF has a broader scope than PresCont since it is not constrained to protein subgroups and requires no multiple sequence alignment. The improvement is attributed to the advantageous combination of the novel node feature with the standard feature and to the adopted parameter training method.
Affiliation(s)
- Torsten Wierschin
  - Institute of Mathematics and Computer Science, University of Greifswald, 17487, Greifswald, Germany
104
Wiech EM, Cheng HP, Singh SM. Molecular modeling and computational analyses suggests that the Sinorhizobium meliloti periplasmic regulator protein ExoR adopts a superhelical fold and is controlled by a unique mechanism of proteolysis. Protein Sci 2015; 24:319-27. [PMID: 25492513 PMCID: PMC4353358 DOI: 10.1002/pro.2616] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 07/08/2014] [Revised: 11/26/2014] [Accepted: 12/01/2014] [Indexed: 12/12/2022]
Abstract
The Sinorhizobium meliloti periplasmic ExoR protein and the ExoS/ChvI two-component system form a regulatory mechanism that directly controls the transformation of free-living to host-invading cells. In the absence of crystal structures, understanding the molecular mechanism of interaction between ExoR and the ExoS sensor, which is believed to drive the key regulatory step in the invasion process, remains a major challenge. In this study, we present a theoretical structural model of the active form of ExoR protein, ExoRm, generated using computational methods. Our model suggests that ExoR possesses a super-helical fold comprising 12 α-helices forming six Sel1-like repeats, including two that were unidentified in previous studies. This fold is highly conducive to mediating protein-protein interactions and this is corroborated by the identification of putative protein binding sites on the surface of the ExoRm protein. Our studies reveal two novel insights: (a) an extended conformation of the third Sel1-like repeat that might be important for ExoR regulatory function and (b) a buried proteolytic site that implies a unique proteolytic mechanism. This study provides new and interesting insights into the structure of S. meliloti ExoR, and lays the groundwork for elaborating the molecular mechanism of ExoRm cleavage, ExoRm-ExoS interactions, and studies of ExoR homologs in other bacterial host interactions.
Affiliation(s)
- Eliza M Wiech
  - Department of Biology, The Graduate Center of the City University of New York, New York, New York, 10016
  - Department of Biology, Brooklyn College, The City University of New York, Brooklyn, New York, 11210
- Hai-Ping Cheng
  - Department of Biology, The Graduate Center of the City University of New York, New York, New York, 10016
  - Biological Sciences Department, Lehman College, The City University of New York, Bronx, New York, 10468
- Shaneen M Singh
  - Department of Biology, The Graduate Center of the City University of New York, New York, New York, 10016
  - Department of Biology, Brooklyn College, The City University of New York, Brooklyn, New York, 11210
105
Aumentado-Armstrong TT, Istrate B, Murgita RA. Algorithmic approaches to protein-protein interaction site prediction. Algorithms Mol Biol 2015; 10:7. [PMID: 25713596 PMCID: PMC4338852 DOI: 10.1186/s13015-015-0033-9] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 08/23/2014] [Accepted: 01/07/2015] [Indexed: 12/19/2022] Open
Abstract
Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most current methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented.
106
Ntostis P, Agiannitopoulos K, Tsaousis G, Pantos K, Lamnissou K. Evidence for association of the rs605059 polymorphism of HSD17B1 gene with recurrent spontaneous abortions. J Matern Fetal Neonatal Med 2014; 28:2250-3. [PMID: 25394609 DOI: 10.3109/14767058.2014.984289] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
OBJECTIVE To investigate whether the missense rs605059 polymorphism of HSD17B1 gene, which is expressed mainly in the placenta, is associated with recurrent spontaneous abortions (RSA). METHODS This study group consisted of 138 women with three or more unexplained spontaneous abortions, before the 20th week of gestation, with the same partner, while 140 healthy women served as controls. To genotype the individuals, we used the polymerase chain reaction-restriction fragment length polymorphism method. RESULTS The genotyping of the rs605059 polymorphism revealed the frequencies 0.22, 0.45 and 0.33, for AA, GA and GG genotypes, respectively, for the patient group and 0.37, 0.41 and 0.22, respectively, for the control group. The A allele frequencies were 0.44 and 0.57 for the patient and control group, respectively, and the G allele frequencies were 0.56 and 0.43 for the patient and control group, respectively. Statistical analysis of the results indicated the existence of significant differences in genotype and allele frequencies between the two groups. CONCLUSION The rs605059 polymorphism of the HSD17B1 gene is associated with increased risk of RSA in our Caucasian Greek population. Thus it could be used as a prognostic genetic marker for RSA.
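The allele frequencies quoted in this abstract follow from the genotype frequencies by simple allele counting: each homozygote contributes its full frequency and each heterozygote contributes half. The sketch below is only an illustrative check, not the authors' analysis; the function name `allele_frequencies` is an assumption introduced here. Because the published genotype frequencies are rounded to two decimals, the recomputed values agree with the reported ones only to within about 0.01.

```python
def allele_frequencies(f_aa, f_ga, f_gg):
    """Return (freq_A, freq_G) from the genotype frequencies of AA, GA and GG."""
    total = f_aa + f_ga + f_gg
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype frequencies must sum to 1")
    # Homozygotes carry two copies of one allele; heterozygotes carry one of each.
    freq_a = f_aa + f_ga / 2
    freq_g = f_gg + f_ga / 2
    return freq_a, freq_g


# Genotype frequencies (AA, GA, GG) as stated in the abstract.
patients = allele_frequencies(0.22, 0.45, 0.33)
controls = allele_frequencies(0.37, 0.41, 0.22)

print("patients A/G:", patients)
print("controls A/G:", controls)
```

The abstract does not name the statistical test used for the group comparison, so none is reproduced here.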
Affiliation(s)
- Panagiotis Ntostis
  - a Department of Genetics and Biotechnology, Faculty of Biology, University of Athens, Athens, Greece
- Georgios Tsaousis
  - b Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens, Greece
- Klea Lamnissou
  - a Department of Genetics and Biotechnology, Faculty of Biology, University of Athens, Athens, Greece
107
Yugandhar K, Gromiha MM. Protein–protein binding affinity prediction from amino acid sequence. Bioinformatics 2014; 30:3583-9. [DOI: 10.1093/bioinformatics/btu580] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Indexed: 01/19/2023] Open
108
Dong Z, Wang K, Dang TKL, Gültas M, Welter M, Wierschin T, Stanke M, Waack S. CRF-based models of protein surfaces improve protein-protein interaction site predictions. BMC Bioinformatics 2014; 15:277. [PMID: 25124108 PMCID: PMC4150965 DOI: 10.1186/1471-2105-15-277] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 10/11/2013] [Accepted: 08/01/2014] [Indexed: 11/13/2022] Open
Abstract
Background The identification of protein-protein interaction sites is a computationally challenging task and important for understanding the biology of protein complexes. There is a rich literature in this field. A broad class of approaches assign to each candidate residue a real-valued score that measures how likely it is that the residue belongs to the interface. The prediction is obtained by thresholding this score. Some probabilistic models classify the residues on the basis of the posterior probabilities. In this paper, we introduce pairwise conditional random fields (pCRFs) in which edges are not restricted to the backbone as in the case of linear-chain CRFs utilized by Li et al. (2007). In fact, any 3D-neighborhood relation can be modeled. On grounds of a generalized Viterbi inference algorithm and a piecewise training process for pCRFs, we demonstrate how to utilize pCRFs to enhance a given residue-wise score-based protein-protein interface predictor on the surface of the protein under study. The features of the pCRF are solely based on the interface predictions scores of the predictor the performance of which shall be improved. Results We performed three sets of experiments with synthetic scores assigned to the surface residues of proteins taken from the data set PlaneDimers compiled by Zellner et al. (2011), from the list published by Keskin et al. (2004) and from the very recent data set due to Cukuroglu et al. (2014). That way we demonstrated that our pCRF-based enhancer is effective given the interface residue score distribution and the non-interface residue score are unimodal. Moreover, the pCRF-based enhancer is also successfully applicable, if the distributions are only unimodal over a certain sub-domain. The improvement is then restricted to that domain. Thus we were able to improve the prediction of the PresCont server devised by Zellner et al. (2011) on PlaneDimers. 
Conclusions Our results strongly suggest that pCRFs form a methodological framework to improve residue-wise score-based protein-protein interface predictors given the scores are appropriately distributed. A prototypical implementation of our method is accessible at http://ppicrf.informatik.uni-goettingen.de/index.html.
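The baseline that this abstract sets out to improve, a residue-wise score turned into a prediction by thresholding, can be sketched in a few lines. This is only a minimal illustration of the thresholding step, not the pCRF enhancer itself; the residue identifiers, the scores, the 0.5 cut-off and the function name `predict_interface` are all hypothetical values chosen for the example.

```python
def predict_interface(scores, threshold=0.5):
    """Label each residue as interface (True) when its score reaches the threshold."""
    return {residue: score >= threshold for residue, score in scores.items()}


# Hypothetical per-residue interface scores for four surface residues.
scores = {"A10": 0.91, "A11": 0.48, "A12": 0.73, "A13": 0.12}

labels = predict_interface(scores, threshold=0.5)
interface = sorted(residue for residue, is_interface in labels.items() if is_interface)
print(interface)
```

A thresholding predictor of this kind treats every residue independently, which is the limitation the abstract addresses by modelling 3D-neighborhood relations between residues.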
Affiliation(s)
- Stephan Waack
  - Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, 37077 Göttingen, Germany
109
Andreani J, Guerois R. Evolution of protein interactions: From interactomes to interfaces. Arch Biochem Biophys 2014; 554:65-75. [DOI: 10.1016/j.abb.2014.05.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 02/10/2014] [Revised: 04/28/2014] [Accepted: 05/12/2014] [Indexed: 12/16/2022]
110
Esmaielbeiki R, Nebel JC. Scoring docking conformations using predicted protein interfaces. BMC Bioinformatics 2014; 15:171. [PMID: 24906633 PMCID: PMC4057934 DOI: 10.1186/1471-2105-15-171] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/11/2012] [Accepted: 05/29/2014] [Indexed: 12/22/2022] Open
Abstract
Background Since proteins function by interacting with other molecules, analysis of protein-protein interactions is essential for comprehending biological processes. Whereas understanding of atomic interactions within a complex is especially useful for drug design, limitations of experimental techniques have restricted their practical use. Despite progress in docking predictions, there is still room for improvement. In this study, we contribute to this topic by proposing T-PioDock, a framework for detection of a native-like docked complex 3D structure. T-PioDock supports the identification of near-native conformations from 3D models that docking software produced by scoring those models using binding interfaces predicted by the interface predictor, Template based Protein Interface Prediction (T-PIP). Results First, exhaustive evaluation of interface predictors demonstrates that T-PIP, whose predictions are customised to target complexity, is a state-of-the-art method. Second, comparative study between T-PioDock and other state-of-the-art scoring methods establishes T-PioDock as the best performing approach. Moreover, there is good correlation between T-PioDock performance and quality of docking models, which suggests that progress in docking will lead to even better results at recognising near-native conformations. Conclusion Accurate identification of near-native conformations remains a challenging task. Although availability of 3D complexes will benefit from template-based methods such as T-PioDock, we have identified specific limitations which need to be addressed. First, docking software are still not able to produce native like models for every target. Second, current interface predictors do not explicitly consider pairwise residue interactions between proteins and their interacting partners which leaves ambiguity when assessing quality of complex conformations.
Affiliation(s)
- Reyhaneh Esmaielbeiki
  - Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
111
Villoutreix BO, Kuenemann MA, Poyet JL, Bruzzoni-Giovanelli H, Labbé C, Lagorce D, Sperandio O, Miteva MA. Drug-Like Protein-Protein Interaction Modulators: Challenges and Opportunities for Drug Discovery and Chemical Biology. Mol Inform 2014; 33:414-437. [PMID: 25254076 PMCID: PMC4160817 DOI: 10.1002/minf.201400040] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.6] [Received: 03/24/2014] [Accepted: 04/21/2014] [Indexed: 12/13/2022]
Abstract
Fundamental processes in living cells are largely controlled by macromolecular interactions. Among them, protein-protein interactions (PPIs) have a critical role, while their dysregulation can contribute to the pathogenesis of numerous diseases. Although PPIs were already considered attractive pharmaceutical targets some years ago, they have thus far been largely unexploited for therapeutic interventions with low molecular weight compounds. Several limiting factors, from technological hurdles to conceptual barriers, are known, which, taken together, explain why research in this area has been relatively slow. However, over the last decade, the scientific community has challenged the dogma and become more enthusiastic about the modulation of PPIs with small drug-like molecules. In fact, several success stories were reported, both at the preclinical and clinical stages. In this review article, written for the 2014 International Summer School in Chemoinformatics (Strasbourg, France), we discuss in silico tools (essentially post 2012) and databases that can assist the design of low molecular weight PPI modulators (these tools can be found at www.vls3d.com). We first introduce the field of protein-protein interaction research, discuss key challenges and comment on recently reported in silico packages, protocols and databases dedicated to PPIs. Then, we illustrate how in silico methods can be used and combined with experimental work to identify PPI modulators.
Affiliation(s)
- Bruno O Villoutreix
  - Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
  - Inserm, U973, Paris 75013, France
  - CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France
- Melaine A Kuenemann
  - Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
  - Inserm, U973, Paris 75013, France
- Jean-Luc Poyet
  - Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
  - Inserm, U973, Paris 75013, France
  - IUH, Hôpital Saint-Louis, Paris, France
  - CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France
- Heriberto Bruzzoni-Giovanelli
  - Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
  - Inserm, U973, Paris 75013, France
  - CIC, Clinical investigation center, Hôpital Saint-Louis, Paris, France
- Céline Labbé
  - Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
  - Inserm, U973, Paris 75013, France
- David Lagorce
  - Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
  - Inserm, U973, Paris 75013, France
- Olivier Sperandio
  - Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
  - Inserm, U973, Paris 75013, France
  - CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France
- Maria A Miteva
  - Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
  - Inserm, U973, Paris 75013, France
112
Lua RC, Marciano DC, Katsonis P, Adikesavan AK, Wilkins AD, Lichtarge O. Prediction and redesign of protein-protein interactions. Prog Biophys Mol Biol 2014; 116:194-202. [PMID: 24878423 DOI: 10.1016/j.pbiomolbio.2014.05.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 02/25/2014] [Revised: 05/02/2014] [Accepted: 05/17/2014] [Indexed: 12/14/2022]
Abstract
Understanding the molecular basis of protein function remains a central goal of biology, with the hope to elucidate the role of human genes in health and in disease, and to rationally design therapies through targeted molecular perturbations. We review here some of the computational techniques and resources available for characterizing a critical aspect of protein function - those mediated by protein-protein interactions (PPI). We describe several applications and recent successes of the Evolutionary Trace (ET) in identifying molecular events and shapes that underlie protein function and specificity in both eukaryotes and prokaryotes. ET is a part of analytical approaches based on the successes and failures of evolution that enable the rational control of PPI.
Affiliation(s)
- Rhonald C Lua
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- David C Marciano
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Anbu K Adikesavan
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Angela D Wilkins
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA
113
Yachdav G, Kloppmann E, Kajan L, Hecht M, Goldberg T, Hamp T, Hönigschmid P, Schafferhans A, Roos M, Bernhofer M, Richter L, Ashkenazy H, Punta M, Schlessinger A, Bromberg Y, Schneider R, Vriend G, Sander C, Ben-Tal N, Rost B. PredictProtein--an open resource for online prediction of protein structural and functional features. Nucleic Acids Res 2014; 42:W337-43. [PMID: 24799431 PMCID: PMC4086098 DOI: 10.1093/nar/gku366] [Citation(s) in RCA: 443] [Impact Index Per Article: 40.3] [Indexed: 11/14/2022] Open
Abstract
PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org.
Affiliation(s)
- Guy Yachdav
  - Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
  - Biosof LLC, New York, NY 10001, USA
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM (Technische Universität München), Garching/Munich 85748, Germany
- Edda Kloppmann
  - Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
  - New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, New York, NY 10032, USA
- Laszlo Kajan
  - Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
- Maximilian Hecht
  - Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM (Technische Universität München), Garching/Munich 85748, Germany
- Tatyana Goldberg
  - Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM (Technische Universität München), Garching/Munich 85748, Germany
- Tobias Hamp
  - Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
- Peter Hönigschmid
  - Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising 85354, Germany
- Andrea Schafferhans
  - Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
- Manfred Roos
  - Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
- Michael Bernhofer
  - Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
- Lothar Richter
  - Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
- Haim Ashkenazy
  - The Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
- Marco Punta
  - Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
  - Institute for Food and Plant Sciences WZW-Weihenstephan, Alte Akademie 8, Freising 85350, Germany
- Avner Schlessinger
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK
- Yana Bromberg
  - Biosof LLC, New York, NY 10001, USA
  - Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Reinhard Schneider
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08901, USA
- Gerrit Vriend
  - Luxembourg University & Luxembourg Centre for Systems Biomedicine, 4362 Belval, Luxembourg
- Chris Sander
  - CMBI, NCMLS, Radboudumc Nijmegen Medical Centre, 6525 GA Nijmegen, The Netherlands
- Nir Ben-Tal
  - Computational Biology Program, Memorial Sloan Kettering Cancer Center, New York, 10065 NY, USA
- Burkhard Rost
  - Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
  - Biosof LLC, New York, NY 10001, USA
  - New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, New York, NY 10032, USA
  - The Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
  - Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, New York, NY 10032, USA
  - Institute for Advanced Study (TUM-IAS), Garching/Munich 85748, Germany
114
Dhole K, Singh G, Pai PP, Mondal S. Sequence-based prediction of protein–protein interaction sites with L1-logreg classifier. J Theor Biol 2014; 348:47-54. [DOI: 10.1016/j.jtbi.2014.01.028] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 07/26/2013] [Revised: 01/10/2014] [Accepted: 01/22/2014] [Indexed: 11/30/2022]
|
115
|
Wang B, Huang DS, Jiang C. A new strategy for protein interface identification using manifold learning method. IEEE Trans Nanobioscience 2014; 13:118-23. [PMID: 24771594 DOI: 10.1109/tnb.2014.2316997] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 11/05/2022]
Abstract
Protein interactions play vital roles in biological processes. Studying protein interfaces helps to elucidate the mechanisms of protein interaction. However, a large portion of protein interface data is incorrectly collected in current studies. In this paper, a novel strategy of dataset reconstruction using a manifold learning method is proposed for dealing with noise in interaction interface data whose definition is based on the residue distances among the different chains within protein complexes. Three support vector machine-based predictors are constructed using different protein features to identify the functional sites involved in the formation of protein interfaces. The experimental results demonstrate that the strategy can remove noise and thereby improve the identification of protein interfaces, reaching 77.8% accuracy.
|
116
|
Yugandhar K, Gromiha MM. Feature selection and classification of protein-protein complexes based on their binding affinities using machine learning approaches. Proteins 2014; 82:2088-96. [PMID: 24648146 DOI: 10.1002/prot.24564] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 01/13/2014] [Accepted: 03/14/2014] [Indexed: 12/16/2022]
Abstract
Protein-protein interactions are intrinsic to virtually every cellular process. Predicting the binding affinity of protein-protein complexes is one of the challenging problems in computational and molecular biology. In this work, we related sequence features of protein-protein complexes to their binding affinities using machine learning approaches. We set up a database of 185 protein-protein complexes for which the interacting pairs are heterodimers and experimental binding affinities are available. We also developed a set of 610 features from the sequences of the protein complexes and used the Ranker search method, which combines an attribute evaluator with the Ranker method, to select specific features. We analyzed several machine learning algorithms to discriminate protein-protein complexes into high- and low-affinity groups based on their Kd values. Our results showed a 10-fold cross-validation accuracy of 76.1% with a combination of nine features using support vector machines. Further, we observed an accuracy of 83.3% on an independent test set of 30 complexes. We suggest that our method would serve as an effective tool for identifying the interacting partners in protein-protein interaction networks and human-pathogen interactions based on the strength of interactions.
Affiliation(s)
- K Yugandhar
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, 600036, Tamil Nadu, India
|
117
|
Falero A, Marrero K, Trigueros S, Fando R. Characterization of the RstB2 protein, the DNA-binding protein of CTXϕ phage from Vibrio cholerae. Virus Genes 2014; 48:518-27. [PMID: 24643345 DOI: 10.1007/s11262-014-1053-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/04/2013] [Accepted: 02/28/2014] [Indexed: 11/25/2022]
Abstract
The low-abundance protein RstB2, encoded in the RS2 region of CTXϕ, is essential for prophage formation. However, the only biochemical activity described so far is the single/double-stranded DNA-binding capacity of that protein. In this paper, a recombinant RstB2 (rRstB2) protein was overexpressed in E. coli with a yield of 58.4 mg l⁻¹ in shaken cultures in LB broth. The protein, purified to homogeneity, showed an identity with rRstB2 by peptide mass fingerprinting. The apparent molecular weight of the native RstB2 protein suggests that it occurs mostly as a monomer in solution. The monomers were able to react immediately upon exposure to DNA molecules. After a year of storage at -20 °C, the protein remained biologically active. Bioinformatics analysis of the amino acid sequence of RstB2 predicts the C-terminal end of this protein to be disordered and highly flexible, as in many other single-stranded DNA-binding proteins. When compared with the gVp of M13, conserved amino acids are found at structurally or functionally important relative positions. These results pave the way for additional studies of the structure and molecular function of RstB2 in the biology of CTXϕ.
Affiliation(s)
- Alina Falero
- National Center for Scientific Research, Ave 25 and 158, Cubanacán, Playa, PO Box 6214, Havana, Cuba
|
118
|
Feiglin A, Ashkenazi S, Schlessinger A, Rost B, Ofran Y. Co-expression and co-localization of hub proteins and their partners are encoded in protein sequence. Mol Biosyst 2014; 10:787-94. [PMID: 24457447 DOI: 10.1039/c3mb70411d] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/23/2023]
Abstract
Spatiotemporal coordination is a critical factor in biological processes. Some hubs in protein-protein interaction networks tend to be co-expressed and co-localized with their partners more strongly than others, a difference which is arguably related to functional differences between the hubs. Based on numerous analyses of yeast hubs, it has been suggested that differences in co-expression and co-localization are reflected in the structural and molecular characteristics of the hubs. We hypothesized that if indeed differences in co-expression and co-localization are encoded in the molecular characteristics of the protein, it may be possible to predict the tendency for co-expression and co-localization of human hubs based on features learned from systematically characterized yeast hubs. Thus, we trained a prediction algorithm on hubs from yeast that were classified as either strongly or weakly co-expressed and co-localized with their partners, and applied the trained model to 800 human hub proteins. We found that the algorithm significantly distinguishes between human hubs that are co-expressed and co-localized with their partners and hubs that are not. The prediction is based on sequence derived features such as "stickiness", i.e. the existence of multiple putative binding sites that enable multiple simultaneous interactions, "plasticity", i.e. the existence of predicted structural disorder which conjecturally allows for multiple consecutive interactions with the same binding site and predicted subcellular localization. These results suggest that spatiotemporal dynamics is encoded, at least in part, in the amino acid sequence of the protein and that this encoding is similar in yeast and in human.
Affiliation(s)
- Ariel Feiglin
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan 52900, Israel
|
119
|
Shen HB, Yi DL, Yao LX, Yang J, Chou KC. Knowledge-based computational intelligence development for predicting protein secondary structures from sequences. Expert Rev Proteomics 2014; 5:653-62. [DOI: 10.1586/14789450.5.5.653] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/25/2022]
|
120
|
CompASM: an Amber-VMD alanine scanning mutagenesis plug-in. Marco Antonio Chaer Nascimento 2014. [DOI: 10.1007/978-3-642-41163-2_8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/12/2023]
|
121
|
Bhaskara RM, Padhi A, Srinivasan N. Accurate prediction of interfacial residues in two-domain proteins using evolutionary information: implications for three-dimensional modeling. Proteins 2013; 82:1219-34. [PMID: 24375512 DOI: 10.1002/prot.24486] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/19/2013] [Revised: 11/04/2013] [Accepted: 11/19/2013] [Indexed: 01/08/2023]
Abstract
With the preponderance of multidomain proteins in eukaryotic genomes, it is essential to recognize the constituent domains and their functions. Function often involves communication across domain interfaces, and knowledge of the interacting sites is essential to our understanding of the structure-function relationship. Using evolutionary information extracted from homologous domains in at least two diverse domain architectures (single and multidomain), we predict the interface residues corresponding to domains from two-domain proteins. We also use information from the three-dimensional structures of individual domains of two-domain proteins to train a naïve Bayes classifier to predict the interfacial residues. Our predictions are highly accurate (∼85%) and specific (∼95%) to the domain-domain interfaces. This method is specific to multidomain proteins that contain domains occurring in more than one protein architectural context. Using predicted residues to constrain domain-domain interaction, rigid-body docking was able to provide accurate full-length protein structures with correct orientation of domains. We believe that these results can be of considerable interest for rational protein and interaction design, apart from providing valuable information on the nature of interactions.
|
122
|
Agrawal NJ, Helk B, Trout BL. A computational tool to predict the evolutionarily conserved protein-protein interaction hot-spot residues from the structure of the unbound protein. FEBS Lett 2013; 588:326-33. [PMID: 24239538 DOI: 10.1016/j.febslet.2013.11.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 09/23/2013] [Revised: 11/01/2013] [Accepted: 11/04/2013] [Indexed: 11/29/2022]
Abstract
Identifying hot-spot residues - residues that are critical to protein-protein binding - can help to elucidate a protein's function and assist in designing therapeutic molecules to target those residues. We present a novel computational tool, termed spatial-interaction-map (SIM), to predict the hot-spot residues of an evolutionarily conserved protein-protein interaction from the structure of an unbound protein alone. SIM can predict the protein hot-spot residues with an accuracy of 36-57%. Thus, the SIM tool can be used to predict the as yet unknown hot-spot residues for many proteins for which structures of the protein-protein complexes are not available, thereby providing a clue to their functions and an opportunity to design therapeutic molecules to target these proteins.
Affiliation(s)
- Neeraj J Agrawal
- Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, E19-502b, Cambridge, MA 02139, USA
| | | | - Bernhardt L Trout
- Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, E19-502b, Cambridge, MA 02139, USA.
|
123
|
An accurate method for prediction of protein-ligand binding site on protein surface using SVM and statistical depth function. Biomed Res Int 2013; 2013:409658. [PMID: 24195070 PMCID: PMC3806129 DOI: 10.1155/2013/409658] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/17/2013] [Revised: 08/15/2013] [Accepted: 08/29/2013] [Indexed: 11/17/2022]
Abstract
Since proteins carry out their functions through interactions with other molecules, accurately identifying the protein-ligand binding site plays an important role in protein functional annotation and rational drug discovery. In the past two decades, many algorithms have been presented to predict protein-ligand binding sites. In this paper, we introduce a statistical depth function to define negative samples and propose an SVM-based method that integrates sequence and structural information to predict binding sites. The results show that the present method performs better than existing ones. The accuracy, sensitivity, and specificity on the training set are 77.55%, 56.15%, and 87.96%, respectively; on the independent test set, they are 80.36%, 53.53%, and 92.38%, respectively.
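For readers comparing the figures quoted in this abstract, the three measures can be derived from a binary confusion matrix. The following is only a minimal sketch: the function name binary_metrics and the counts used below are hypothetical illustrations, not data or code from the cited paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    def ratio(num, den):
        # Guard against empty classes; NaN signals an undefined ratio.
        return num / den if den else float("nan")

    total = tp + fp + tn + fn
    return {
        "accuracy": ratio(tp + tn, total),   # all correct calls / all residues
        "sensitivity": ratio(tp, tp + fn),   # binding residues recovered
        "specificity": ratio(tn, tn + fp),   # non-binding residues rejected
    }


# Hypothetical counts chosen for illustration only (not from the paper).
metrics = binary_metrics(tp=40, fp=10, tn=90, fn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With imbalanced classes, as is typical for binding-site residues, accuracy can stay high while sensitivity remains modest, which is the pattern the abstract reports.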
|
124
|
Ozbek P, Soner S, Haliloglu T. Hot spots in a network of functional sites. PLoS One 2013; 8:e74320. [PMID: 24023934 PMCID: PMC3759471 DOI: 10.1371/journal.pone.0074320] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 12/11/2012] [Accepted: 08/02/2013] [Indexed: 12/05/2022] [Open Access]
Abstract
It is of significant interest to understand how proteins interact, a key phenomenon underlying biological function. Using dynamic fluctuations in high-frequency modes, we show that the Gaussian Network Model (GNM) predicts hot spot residues with success rates of S 8–58%, C 84–95%, P 5–19% and A 81–92% on unbound structures and S 8–51%, C 97–99%, P 14–50% and A 94–97% on complex structures, for sensitivity (S), specificity (C), precision (P) and accuracy (A), respectively. High specificity and accuracy rates with a single property on unbound protein structures suggest that hot spots are predefined in the dynamics of unbound structures and form the binding core of interfaces, whereas the prediction of other functional residues with similar dynamic behavior explains the lower precision values. The latter is demonstrated with three case studies: ubiquitin, hen egg-white lysozyme and the M2 proton channel. The dynamic fluctuations suggest a pseudo network of residues with high-frequency fluctuations, which could be plausible for the mechanism of biological interactions and allosteric regulation.
Affiliation(s)
- Pemra Ozbek
- Department of Bioengineering, Marmara University, Goztepe, Istanbul, Turkey
| | - Seren Soner
- Department of Chemical Engineering and Polymer Research Center, Bogazici University, Bebek, Turkey
| | - Turkan Haliloglu
- Department of Chemical Engineering and Polymer Research Center, Bogazici University, Bebek, Turkey
|
125
|
Cloud prediction of protein structure and function with PredictProtein for Debian. Biomed Res Int 2013; 2013:398968. [PMID: 23971032 PMCID: PMC3732596 DOI: 10.1155/2013/398968] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 01/04/2013] [Accepted: 07/05/2013] [Indexed: 11/18/2022]
Abstract
We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.
|
126
|
Wang L, Hou Y, Quan H, Xu W, Bao Y, Li Y, Fu Y, Zou S. A compound-based computational approach for the accurate determination of hot spots. Protein Sci 2013; 22:1060-70. [PMID: 23776011 DOI: 10.1002/pro.2296] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/06/2013] [Revised: 05/27/2013] [Accepted: 06/01/2013] [Indexed: 12/21/2022]
Abstract
A plethora of both experimental and computational methods have been proposed in the past 20 years for the identification of hot spots at a protein-protein interface. The experimental determination of a protein-protein complex followed by alanine scanning mutagenesis, though able to determine hot spots with much precision, is expensive and has no guarantee of success while the accuracy of the current computational methods for hot-spot identification remains low. Here, we present a novel structure-based computational approach that accurately determines hot spots through docking into a set of proteins homologous to only one of the two interacting partners of a compound capable of disrupting the protein-protein interaction (PPI). This approach has been applied to identify the hot spots of human activin receptor type II (ActRII) critical for its binding toward Cripto-I. The subsequent experimental confirmation of the computationally identified hot spots portends a potentially accurate method for hot-spot determination in silico given a compound capable of disrupting the PPI in question. The hot spots of human ActRII first reported here may well become the focal points for the design of small molecule drugs that target the PPI. The determination of their interface may have significant biological implications in that it suggests that Cripto-I plays an important role in both activin and nodal signal pathways.
Affiliation(s)
- Lincong Wang
- The College of Computer Science and Technology, Jilin University, Changchun, Jilin, China.
|
127
|
Andreani J, Faure G, Guerois R. InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution. Bioinformatics 2013; 29:1742-9. [PMID: 23652426 DOI: 10.1093/bioinformatics/btt260] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Indexed: 12/22/2022]
Abstract
MOTIVATION: Structural prediction of protein interactions currently remains a challenging but fundamental goal. In particular, progress in scoring functions is critical for the efficient discrimination of near-native interfaces among large sets of decoys. Many functions have been developed using knowledge-based potentials, but few make use of multi-body interactions or evolutionary information, although multi-residue interactions are crucial for protein-protein binding and protein interfaces undergo significant selection pressure to maintain their interactions. RESULTS: This article presents InterEvScore, a novel scoring function using a coarse-grained statistical potential including two- and three-body interactions, which provides each residue with the opportunity to contribute in its most favorable local structural environment. Combination of this potential with evolutionary information considerably improves scoring results on the 54 test cases from the widely used protein docking benchmark for which evolutionary information can be collected. We analyze how our way to include evolutionary information gradually increases the discriminative power of InterEvScore. Comparison with several previously published scoring functions (ZDOCK, ZRANK and SPIDER) shows the significant progress brought by InterEvScore. AVAILABILITY: http://biodev.cea.fr/interevol/interevscore CONTACT: guerois@cea.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jessica Andreani
- CEA, iBiTecS, Service de Bioenergetique Biologie Structurale et Mecanismes SB2SM, Laboratoire de Biologie Structurale et Radiobiologie LBSR, F-91191 Gif sur Yvette, France
|
128
|
Andorf CM, Honavar V, Sen TZ. Predicting the binding patterns of hub proteins: a study using yeast protein interaction networks. PLoS One 2013; 8:e56833. [PMID: 23431393 PMCID: PMC3576370 DOI: 10.1371/journal.pone.0056833] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 07/17/2012] [Accepted: 01/16/2013] [Indexed: 02/01/2023] [Open Access]
Abstract
Background: Protein-protein interactions are critical to elucidating the role played by individual proteins in important biological pathways. Of particular interest are hub proteins that can interact with large numbers of partners and often play essential roles in cellular control. Depending on the number of binding sites, protein hubs can be classified at a structural level as singlish-interface hubs (SIH) with one or two binding sites, or multiple-interface hubs (MIH) with three or more binding sites. In terms of kinetics, hub proteins can be classified as date hubs (i.e., interact with different partners at different times or locations) or party hubs (i.e., simultaneously interact with multiple partners). Methodology: Our approach works in 3 phases: Phase I classifies whether a protein is likely to bind with another protein. Phase II determines whether a protein-binding (PB) protein is a hub. Phase III classifies PB proteins as singlish-interface versus multiple-interface hubs and date versus party hubs. At each stage, we use sequence-based predictors trained using several standard machine learning techniques. Conclusions: Our method is able to predict whether a protein is a protein-binding protein with an accuracy of 94% and a correlation coefficient of 0.87; identify hubs from non-hubs with 100% accuracy for 30% of the data; distinguish date hubs/party hubs with 69% accuracy and area under ROC curve of 0.68; and SIH/MIH with 89% accuracy and area under ROC curve of 0.84. Because our method is based on sequence information alone, it can be used even in settings where reliable protein-protein interaction data or structures of protein-protein complexes are unavailable to obtain useful insights into the functional and evolutionary characteristics of proteins and their interactions. Availability: We provide a web server for our three-phase approach: http://hybsvm.gdcb.iastate.edu.
Affiliation(s)
- Carson M. Andorf
- Department of Computer Science, Iowa State University, Ames, Iowa, United States of America
| | - Vasant Honavar
- Department of Computer Science, Iowa State University, Ames, Iowa, United States of America
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, United States of America
| | - Taner Z. Sen
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, United States of America
- United States Department of Agriculture-Agriculture Research Service Corn Insects and Crop Genetics Research Unit, Ames, Iowa, United States of America
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, United States of America
|
129
|
Ribeiro JV, Cerqueira NMFSA, Moreira IS, Fernandes PA, Ramos MJ. CompASM: an Amber-VMD alanine scanning mutagenesis plug-in. Theor Chem Acc 2012. [DOI: 10.1007/s00214-012-1271-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
|
130
|
Li BQ, Feng KY, Chen L, Huang T, Cai YD. Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS. PLoS One 2012; 7:e43927. [PMID: 22937126 PMCID: PMC3429425 DOI: 10.1371/journal.pone.0043927] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Received: 05/30/2012] [Accepted: 07/26/2012] [Indexed: 11/19/2022] [Open Access]
Abstract
Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.
Affiliation(s)
- Bi-Qing Li
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
- Shanghai Center for Bioinformation Technology, Shanghai, People’s Republic of China
| | - Kai-Yan Feng
- Beijing Genomics Institute, Shenzhen, People’s Republic of China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, People’s Republic of China
| | - Tao Huang
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
- Shanghai Center for Bioinformation Technology, Shanghai, People’s Republic of China
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York City, New York, United States of America
| | - Yu-Dong Cai
- Institute of Systems Biology, Shanghai University, Shanghai, People’s Republic of China
|
131
|
Hamp T, Rost B. Alternative protein-protein interfaces are frequent exceptions. PLoS Comput Biol 2012; 8:e1002623. [PMID: 22876170 PMCID: PMC3410849 DOI: 10.1371/journal.pcbi.1002623] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 10/28/2011] [Accepted: 06/11/2012] [Indexed: 11/18/2022] [Open Access]
Abstract
The intricate molecular details of protein-protein interactions (PPIs) are crucial for function. Therefore, measuring the same interacting protein pair again, we expect the same result. This work measured the similarity in the molecular details of interaction for the same and for homologous protein pairs between different experiments. All scores analyzed suggested that different experiments often find exceptions in the interfaces of similar PPIs: up to 22% of all comparisons revealed some differences even for sequence-identical pairs of proteins. The corresponding number for pairs of close homologs reached 68%. Conversely, the interfaces differed entirely for 12-29% of all comparisons. All these estimates were calculated after redundancy reduction. The magnitude of interface differences ranged from subtle to the extreme, as illustrated by a few examples. An extreme case was a change of the interacting domains between two observations of the same biological interaction. One reason for different interfaces was the number of copies of an interaction in the same complex: the probability of observing alternative binding modes increases with the number of copies. Even after removing the special cases with alternative hetero-interfaces to the same homomer, a substantial variability remained. Our results strongly support the surprising notion that there are many alternative solutions to make the intricate molecular details of PPIs crucial for function.
Affiliation(s)
- Tobias Hamp
- TUM, Bioinformatik - I12, Informatik, Garching, Germany
| | - Burkhard Rost
- TUM, Bioinformatik - I12, Informatik, Garching, Germany
- Institute of Advanced Study (IAS), TUM, Garching, Germany
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
|
132
|
Chen P, Wong L, Li J. Detection of outlier residues for improving interface prediction in protein heterocomplexes. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1155-1165. [PMID: 22529331 DOI: 10.1109/tcbb.2012.58] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 05/31/2023]
Abstract
Sequence-based understanding and identification of protein binding interfaces is a challenging research topic due to the complexity in protein systems and the imbalanced distribution between interface and noninterface residues. This paper presents an outlier detection idea to address the redundancy problem in protein interaction data. The cleaned training data are then used for improving the prediction performance. We use three novel measures to describe the extent a residue is considered as an outlier in comparison to the other residues: the distance of a residue instance from the center instance of all residue instances of the same class label (Dist), the probability of the class label of the residue instance (PCL), and the importance of within-class and between-class (IWB) residue instances. Outlier scores are computed by integrating the three factors; instances with a sufficiently large score are treated as outliers and removed. The data sets without outliers are taken as input for a support vector machine (SVM) ensemble. The proposed SVM ensemble trained on input data without outliers performs better than that with outliers. Our method is also more accurate than many literature methods on benchmark data sets. From our empirical studies, we found that some outlier interface residues are truly near to noninterface regions, and some outlier noninterface residues are close to interface regions.
Affiliation(s)
- Peng Chen
- Institute of Intelligent Machines, Chinese Academy of Sciences, PO Box 1130, Hefei 230031, China.
|
133
|
Schaefer C, Bromberg Y, Achten D, Rost B. Disease-related mutations predicted to impact protein function. BMC Genomics 2012; 13 Suppl 4:S11. [PMID: 22759649 PMCID: PMC3394413 DOI: 10.1186/1471-2164-13-s4-s11] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/17/2022] [Open Access]
Abstract
Background Non-synonymous single nucleotide polymorphisms (nsSNPs) alter the protein sequence and can cause disease. The impact has been described by reliable experiments for relatively few mutations. Here, we study predictions for functional impact of disease-annotated mutations from OMIM, PMD and Swiss-Prot and of variants not linked to disease. Results Most disease-causing mutations were predicted to impact protein function. More surprisingly, the raw predictions scores for disease-causing mutations were higher than the scores for the function-altering data set originally used for developing the prediction method (here SNAP). We might expect that diseases are caused by change-of-function mutations. However, it is surprising how well prediction methods developed for different purposes identify this link. Conversely, our predictions suggest that the set of nsSNPs not currently linked to diseases contains very few strong disease associations to be discovered. Conclusions Firstly, annotations of disease-causing nsSNPs are on average so reliable that they can be used as proxies for functional impact. Secondly, disease-causing nsSNPs can be identified very well by methods that predict the impact of mutations on protein function. This implies that the existing prediction methods provide a very good means of choosing a set of suspect SNPs relevant for disease.
Affiliation(s)
- Christian Schaefer
- Bioinformatics-i12, Informatics, Technical University Munich, Boltzmannstrasse 3, Garching/Munich, Germany.
134
Abstract
Background Amino acid point mutations (nsSNPs) may change protein structure and function. However, no method directly predicts the impact of mutations on structure. Here, we compare pairs of pentamers (five consecutive residues) that locally change protein three-dimensional structure (3D, RMSD>0.4Å) to those that do not alter structure (RMSD<0.2Å). Mutations that alter structure locally can be distinguished from those that do not through a machine-learning (logistic regression) method. Results The method achieved a rather high overall performance (AUC>0.79, two-state accuracy >72%). This discriminative power was particularly unexpected given the enormous structural variability of pentamers. Mutants for which our method predicted a change of structure were also enriched in terms of disrupting stability and function. Although distinguishing change and no change in structure, the new method overall failed to distinguish between mutants with and without effect on stability or function. Conclusions Local structural change can be predicted. Future work will have to establish how useful this new perspective on predicting the effect of nsSNPs will be in combination with other methods.
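The two RMSD cut-offs quoted in this abstract can be made concrete with a short Python sketch. It assumes one coordinate triple per residue (for example C-alpha atoms) and that the two pentamers are already superimposed; the abstract specifies neither, and the function names and the "unassigned" label for the gap between the cut-offs are illustrative choices.

```python
import math

CHANGE_CUTOFF = 0.4     # Å; pairs above this count as a local structural change
NO_CHANGE_CUTOFF = 0.2  # Å; pairs below this count as structurally unchanged


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(sq / len(coords_a))


def label_pentamer_pair(coords_a, coords_b):
    """Assign a pair of pentamers to one of the two classes, or to neither."""
    d = rmsd(coords_a, coords_b)
    if d > CHANGE_CUTOFF:
        return "change"
    if d < NO_CHANGE_CUTOFF:
        return "no change"
    return "unassigned"  # between the cut-offs; belongs to neither class
```

Leaving a gap between the two cut-offs keeps borderline pairs out of both classes, which gives the classifier cleaner positive and negative examples.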
Affiliation(s)
- Christian Schaefer
- TUM, Bioinformatics-I12, Informatik, Boltzmannstrasse 3, Garching, Germany.
135
Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS. Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces. PLoS One 2012; 7:e37706. [PMID: 22701576 PMCID: PMC3368894 DOI: 10.1371/journal.pone.0037706] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 12/09/2011] [Accepted: 04/23/2012] [Indexed: 11/18/2022] Open
Abstract
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, specificity were 0.753, 0.519, 0.677, and 0.779 respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. 
The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
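The residue-based measures reported in this abstract (Matthews correlation coefficient, accuracy, precision, sensitivity, specificity) all follow from the four confusion-matrix counts. The Python sketch below uses the standard definitions; the function name and the zero-denominator convention are assumptions, and no counts from the paper are reproduced.

```python
import math


def _ratio(num, den):
    """Safe division; returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),   # also called recall
        "specificity": _ratio(tn, tn + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }
```

Because interface residues are usually the minority class, accuracy alone can look favourable for a weak predictor; MCC uses all four counts and is less sensitive to that imbalance.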
Affiliation(s)
- Ching-Tai Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Bioinformatics and Systems Biology, National Chiao-Tung University, Hsinchu, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jhih-Wei Jian
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jeng-Yih Chang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Ei-Wen Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jun-Bo Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan
- Shinn-Ying Ho
- Institute of Bioinformatics and Systems Biology, National Chiao-Tung University, Hsinchu, Taiwan
- Wen-Lian Hsu
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- * E-mail: (AY); (WH)
- An-Suei Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- * E-mail: (AY); (WH)
136
Feng ZP, Chandrashekaran IR, Low A, Speed TP, Nicholson SE, Norton RS. The N-terminal domains of SOCS proteins: a conserved region in the disordered N-termini of SOCS4 and 5. Proteins 2012; 80:946-57. [PMID: 22423360 DOI: 10.1002/prot.23252] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 12/30/2022]
Abstract
Suppressors of cytokine signaling (SOCS) proteins function as negative regulators of cytokine signaling and are involved in fine tuning the immune response. The structure and role of the SH2 domains and C-terminal SOCS box motifs of the SOCS proteins are well characterized, but the long N-terminal domains of SOCS4-7 remain poorly understood. Here, we present bioinformatic analyses of the N-terminal domains of the mammalian SOCS proteins, which indicate that these domains of SOCS4, 5, 6, and 7 are largely disordered. We have also identified a conserved region of about 70 residues in the N-terminal domains of SOCS4 and 5 that is predicted to be more ordered than the surrounding sequence. The conservation of this region can be traced as far back as lower vertebrates. As conserved regions with increased structural propensity that are located within long disordered regions often contain molecular recognition motifs, we expressed the N-terminal conserved region of mouse SOCS4 for further analysis. This region, mSOCS4₈₆₋₁₅₅, has been characterized by circular dichroism and nuclear magnetic resonance spectroscopy, both of which indicate that it is predominantly unstructured in aqueous solution, although it becomes helical in the presence of trifluoroethanol. The high degree of sequence conservation of this region across different species and between SOCS4 and SOCS5 nonetheless implies that it has an important functional role, and presumably this region adopts a more ordered conformation in complex with its partners. The recombinant protein will be a valuable tool in identifying these partners and defining the structures of these complexes.
Affiliation(s)
- Zhi-Ping Feng
- The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
137
Garcia-Garcia J, Bonet J, Guney E, Fornes O, Planas J, Oliva B. Networks of Protein-Protein Interactions: From Uncertainty to Molecular Details. Mol Inform 2012; 31:342-62. [PMID: 27477264 DOI: 10.1002/minf.201200005] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 01/16/2012] [Accepted: 03/09/2012] [Indexed: 11/08/2022]
Abstract
Proteins are the bricks and mortar of cells. The work of proteins is structural and functional, as they are the principal element of the organization of the cell architecture, but they also play a relevant role in its metabolism and regulation. To perform all these functions, proteins need to interact with each other and with other bio-molecules, either to form complexes or to recognize precise targets of their action. For instance, a particular transcription factor may activate one gene or another depending on its interactions with other proteins and not only with DNA. Hence, the ability of a protein to interact with other bio-molecules, and the partners they have at each particular time and location can be crucial to characterize the role of a protein. Proteins rarely act alone; they rather constitute a mingled network of physical interactions or other types of relationships (such as metabolic and regulatory) or signaling cascades. In this context, understanding the function of a protein implies to recognize the members of its neighborhood and to grasp how they associate, both at the systemic and atomic level. The network of physical interactions between the proteins of a system, cell or organism, is defined as the interactome. The purpose of this review is to deepen the description of interactomes at different levels of detail: from the molecular structure of complexes to the global topology of the network of interactions. The approaches and techniques applied experimentally and computationally to attain each level are depicted. The limits of each technique and its integration into a model network, the challenges and actual problems of completeness of an interactome, and the reliability of the interactions are reviewed and summarized. Finally, the application of the current knowledge of protein-protein interactions on modern network medicine and protein function annotation is also explored.
Affiliation(s)
- Javier Garcia-Garcia
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Jaume Bonet
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Emre Guney
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Oriol Fornes
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Joan Planas
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Baldo Oliva
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
138
Dakshinamoorthy G, Samykutty AK, Munirathinam G, Shinde GB, Nutman T, Reddy MV, Kalyanasundaram R. Biochemical characterization and evaluation of a Brugia malayi small heat shock protein as a vaccine against lymphatic filariasis. PLoS One 2012; 7:e34077. [PMID: 22496777 PMCID: PMC3320633 DOI: 10.1371/journal.pone.0034077] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 12/16/2011] [Accepted: 02/21/2012] [Indexed: 12/15/2022] Open
Abstract
Filarial nematodes enjoy one of the longest life spans of any human pathogen due to effective immune evasion strategies developed by the parasite. Among the various immune evasion strategies exhibited by the parasite, Interleukin 10 (IL-10) production and IL-10 mediated immune suppression have a significant negative impact on the host immune system. Recently, we identified a small heat shock protein expressed by Brugia malayi (BmHsp12.6) that can bind to soluble human IL-10 receptor alpha (IL-10R) and activate IL-10 mediated effects in cell lines. In this study we show that the IL-10R binding region of BmHsp12.6 is localized to its N-terminal region. This region has significant sequence similarity to the receptor binding region of human IL-10. In vitro studies confirm that the N-terminal region of BmHsp12.6 (N-BmHsp12.6) has IL-10 like activity and that the region containing the alpha crystalline domain and C-terminus of BmHsp12.6 (BmHsp12.6αc) has no IL-10 like activity. However, BmHsp12.6αc contains B cell, T cell and CTL epitopes. Members of the sHSP families are excellent vaccine candidates. Evaluation of serum samples from putatively immune endemic normal (EN) subjects showed IgG1 and IgG3 antibodies against BmHsp12.6αc, and these antibodies were involved in ADCC mediated protection. Subsequent vaccination trials with BmHsp12.6αc in a mouse model using a heterologous prime boost approach showed that 83% protection can be achieved against B. malayi L3 challenge. Results presented in this study thus show that the N-BmHsp12.6 subunit of BmHsp12.6 has immunoregulatory function, whereas the BmHsp12.6αc subunit of BmHsp12.6 has significant vaccine potential.
MESH Headings
- Amino Acid Sequence
- Animals
- Antibodies, Helminth/blood
- Antibodies, Helminth/immunology
- Antibody-Dependent Cell Cytotoxicity
- Antigens, Helminth/immunology
- Brugia malayi/immunology
- Cell Proliferation
- Cytokines/metabolism
- Elephantiasis, Filarial/immunology
- Elephantiasis, Filarial/prevention & control
- Heat-Shock Proteins, Small/genetics
- Heat-Shock Proteins, Small/immunology
- Heat-Shock Proteins, Small/metabolism
- Humans
- Immunoglobulin G/immunology
- Interleukin-10/immunology
- Interleukin-10/metabolism
- Male
- Mast Cells/cytology
- Mast Cells/metabolism
- Mice
- Mice, Inbred BALB C
- Molecular Sequence Data
- Peptide Fragments/immunology
- Receptors, Interleukin-10/immunology
- Receptors, Interleukin-10/metabolism
- Recombinant Proteins/genetics
- Recombinant Proteins/immunology
- Spleen/cytology
- Spleen/immunology
- Spleen/metabolism
- Vaccination
- Vaccines, DNA/therapeutic use
Affiliation(s)
- Gajalakshmi Dakshinamoorthy
- Department of Biomedical Sciences, University of Illinois College of Medicine at Rockford, Rockford, Illinois, United States of America
- Abhilash Kumble Samykutty
- Department of Biomedical Sciences, University of Illinois College of Medicine at Rockford, Rockford, Illinois, United States of America
- Department of Biochemistry, Mahatma Gandhi Institute of Medical Sciences, Sevagram, Maharashtra, India
- Gnanasekar Munirathinam
- Department of Biomedical Sciences, University of Illinois College of Medicine at Rockford, Rockford, Illinois, United States of America
- Gangadhar Bhaurao Shinde
- Department of Biochemistry, Rashtrasant Tukadoji Maharaj Nagpur University, Nagpur, Maharashtra, India
- Thomas Nutman
- Helminth Immunology Section, National Institutes of Health, Bethesda, Maryland, United States of America
- Maryada Venkatarami Reddy
- Department of Biochemistry, Mahatma Gandhi Institute of Medical Sciences, Sevagram, Maharashtra, India
- Ramaswamy Kalyanasundaram
- Department of Biomedical Sciences, University of Illinois College of Medicine at Rockford, Rockford, Illinois, United States of America
139
Qi Y, Oja M, Weston J, Noble WS. A unified multitask architecture for predicting local protein properties. PLoS One 2012; 7:e32235. [PMID: 22461885 PMCID: PMC3312883 DOI: 10.1371/journal.pone.0032235] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 04/27/2011] [Accepted: 01/25/2012] [Indexed: 01/27/2023] Open
Abstract
A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single task neural network approach, and that the resulting model achieves state-of-the-art performance.
Affiliation(s)
- Yanjun Qi
- Machine Learning Department, NEC Labs America, Princeton, New Jersey, United States of America
- Merja Oja
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Jason Weston
- Google, New York, New York, United States of America
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- * E-mail:
140
Jordan RA, EL-Manzalawy Y, Dobbs D, Honavar V. Predicting protein-protein interface residues using local surface structural similarity. BMC Bioinformatics 2012; 13:41. [PMID: 22424103 PMCID: PMC3386866 DOI: 10.1186/1471-2105-13-41] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Received: 09/12/2011] [Accepted: 03/18/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Identification of the residues in protein-protein interaction sites has a significant impact in problems such as drug discovery. Motivated by the observation that the set of interface residues of a protein tends to be conserved even among remote structural homologs, we introduce PrISE, a family of local structural similarity-based computational methods for predicting protein-protein interface residues. RESULTS We present a novel representation of the surface residues of a protein in the form of structural elements. Each structural element consists of a central residue and its surface neighbors. The PrISE family of interface prediction methods uses a representation of structural elements that captures the atomic composition and accessible surface area of the residues that make up each structural element. Each of the PrISE methods identifies, for each structural element in the query protein, a collection of similar structural elements in its repository of structural elements and weights them according to their similarity with the structural element of the query protein. PrISEL relies on the similarity between structural elements (i.e. local structural similarity). PrISEG relies on the similarity between protein surfaces (i.e. general structural similarity). PrISEC combines local structural similarity and general structural similarity to predict interface residues. These predictors label the central residue of a structural element in a query protein as an interface residue if a weighted majority of the structural elements that are similar to it are interface residues, and as a non-interface residue otherwise. The results of our experiments using three representative benchmark datasets show that PrISEC outperforms PrISEL and PrISEG, and that PrISEC is highly competitive with state-of-the-art structure-based methods for predicting protein-protein interface residues.
Our comparison of PrISEC with PredUs, a recently developed method for predicting interface residues of a query protein based on the known interface residues of its (global) structural homologs, shows that performance superior or comparable to that of PredUs can be obtained using only local surface structural similarity. PrISEC is available as a Web server at http://prise.cs.iastate.edu/. CONCLUSIONS Local surface structural similarity based methods offer a simple, efficient, and effective approach to predict protein-protein interface residues.
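The weighted-majority labelling rule described in this abstract can be sketched in a few lines of Python. This illustrates the voting step only and is not the PrISE code: how similar structural elements are retrieved and how the similarity weights are computed are not given in the abstract, so the `(weight, is_interface)` input format and the function name are assumptions.

```python
def weighted_majority_label(similar_elements):
    """Label a central residue from its similar structural elements.

    similar_elements: iterable of (similarity_weight, is_interface) pairs,
    one per retrieved structural element (hypothetical input format).
    Returns True (interface) if interface elements hold more than half of
    the total weight, and False (non-interface) otherwise.
    """
    pairs = list(similar_elements)  # allow generators; we iterate twice
    total = sum(w for w, _ in pairs)
    interface = sum(w for w, is_interface in pairs if is_interface)
    return total > 0 and interface > total / 2
```

For example, a single highly similar interface element with weight 0.9 outvotes two weakly similar non-interface elements with weights 0.2 and 0.3, because 0.9 exceeds half of the total weight 1.4.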
Affiliation(s)
- Rafael A Jordan
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Department of Systems and Computer Engineering, Pontificia Universidad Javeriana, Cali, Colombia
- Yasser EL-Manzalawy
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
- Drena Dobbs
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Vasant Honavar
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
141
Abstract
Recent success stories concerning the targeting of protein-protein interactions (PPIs) have led to an increased focus on this challenging target class for drug discovery. This article explores various avenues to assess the druggability of PPIs and describes a druggability decision flow chart, which can be applied to any PPI target. This flow chart not only covers small molecules but also peptidomimetics, peptides and conformationally restricted peptides as potential modalities for targeting PPIs. Additionally, a retrospective analysis of PPI druggability using various computational tools is summarized. The application of a systematic approach as presented in this paper will increase confidence that modulators (e.g., small organic molecules or peptides) can ultimately be identified for a particular target before a decision is made to commit significant discovery resources.
142
Partner-aware prediction of interacting residues in protein-protein complexes from sequence data. PLoS One 2011; 6:e29104. [PMID: 22194998 PMCID: PMC3237601 DOI: 10.1371/journal.pone.0029104] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Received: 10/12/2011] [Accepted: 11/21/2011] [Indexed: 12/22/2022] Open
Abstract
Computational prediction of residues that participate in protein-protein interactions is a difficult task, and state of the art methods have shown only limited success in this arena. One possible problem with these methods is that they try to predict interacting residues without incorporating information about the partner protein, although it is unclear how much partner information could enhance prediction performance. To address this issue, the two following comparisons are of crucial significance: (a) comparison between the predictability of inter-protein residue pairs, i.e., predicting exactly which residue pairs interact with each other given two protein sequences; this can be achieved by either combining conventional single-protein predictions or making predictions using a new model trained directly on the residue pairs, and the performance of these two approaches may be compared: (b) comparison between the predictability of the interacting residues in a single protein (irrespective of the partner residue or protein) from conventional methods and predictions converted from the pair-wise trained model. Using these two streams of training and validation procedures and employing similar two-stage neural networks, we showed that the models trained on pair-wise contacts outperformed the partner-unaware models in predicting both interacting pairs and interacting single-protein residues. Prediction performance decreased with the size of the conformational change upon complex formation; this trend is similar to docking, even though no structural information was used in our prediction. An example application that predicts two partner-specific interfaces of a protein was shown to be effective, highlighting the potential of the proposed approach. Finally, a preliminary attempt was made to score docking decoy poses using prediction of interacting residue pairs; this analysis produced an encouraging result.
143
Zellner H, Staudigel M, Trenner T, Bittkowski M, Wolowski V, Icking C, Merkl R. Prescont: Predicting protein-protein interfaces utilizing four residue properties. Proteins 2011; 80:154-68. [DOI: 10.1002/prot.23172] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 07/11/2011] [Revised: 08/18/2011] [Accepted: 08/29/2011] [Indexed: 12/26/2022]
144
Qiu Z, Wang X. Prediction of protein-protein interaction sites using patch-based residue characterization. J Theor Biol 2011; 293:143-50. [PMID: 22037062 DOI: 10.1016/j.jtbi.2011.10.021] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 06/25/2011] [Revised: 09/13/2011] [Accepted: 10/15/2011] [Indexed: 10/15/2022]
Abstract
Identifying protein-protein interaction sites provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Using a patch-based model for residue characterization, we trained random forest classifiers for residue-based interface prediction, which was followed by a clustering procedure to produce patches for patch-based interface prediction. For residue-based interface prediction, our method achieves a specificity rate of 0.7 and a sensitivity rate of 0.78. For patch-based interface prediction, a success rate of 0.80 is achieved. Based on same datasets, we also compare it with several published methods. The results show that our method is a successful predictor for residue-based and patch-based interface prediction.
Affiliation(s)
- Zhijun Qiu
- The State Key Laboratory of Structural Analysis of Industrial Equipment, Dalian University of Technology, 2 Ling-Gong Road, Dalian 116024, China
145
Gromiha MM, Saranya N, Selvaraj S, Jayaram B, Fukui K. Sequence and structural features of binding site residues in protein-protein complexes: comparison with protein-nucleic acid complexes. Proteome Sci 2011; 9 Suppl 1:S13. [PMID: 22166143 PMCID: PMC3289074 DOI: 10.1186/1477-5956-9-s1-s13] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022] Open
Abstract
Background Protein-protein interactions are important for several cellular processes. Understanding the mechanism of protein-protein recognition and predicting the binding sites in protein-protein complexes are long standing goals in molecular and computational biology. Methods We have developed an energy based approach for identifying the binding site residues in protein–protein complexes. The binding site residues have been analyzed with sequence and structure based parameters such as binding propensity, neighboring residues in the vicinity of binding sites, conservation score and conformational switching. Results We observed that the binding propensities of amino acid residues are specific for protein-protein complexes. Further, typical dipeptides and tripeptides showed high preference for binding, which is unique to protein-protein complexes. Most of the binding site residues are highly conserved among homologous sequences. Our analysis showed that 7% of residues changed their conformations upon protein-protein complex formation and it is 9.2% and 6.6% in the binding and non-binding sites, respectively. Specifically, the residues Glu, Lys, Leu and Ser changed their conformation from coil to helix/strand and from helix to coil/strand. Leu, Ser, Thr and Val prefer to change their conformation from strand to coil/helix. Conclusions The results obtained in this study will be helpful for understanding and predicting the binding sites in protein-protein complexes.
Affiliation(s)
- M Michael Gromiha
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India.
146
Sahu SS, Panda G. Efficient localization of hot spots in proteins using a novel S-transform based filtering approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:1235-1246. [PMID: 21778522 DOI: 10.1109/tcbb.2010.109] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
Protein-protein interactions govern almost all biological processes and the underlying functions of proteins. The interaction sites of protein depend on the 3D structure which in turn depends on the amino acid sequence. Hence, prediction of protein function from its primary sequence is an important and challenging task in bioinformatics. Identification of the amino acids (hot spots) that leads to the characteristic frequency signifying a particular biological function is really a tedious job in proteomic signal processing. In this paper, we have proposed a new promising technique for identification of hot spots in proteins using an efficient time-frequency filtering approach known as the S-transform filtering. The S-transform is a powerful linear time-frequency representation and is especially useful for the filtering in the time-frequency domain. The potential of the new technique is analyzed in identifying hot spots in proteins and the result obtained is compared with the existing methods. The results demonstrate that the proposed method is superior to its counterparts and is consistent with results based on biological methods for identification of the hot spots. The proposed method also reveals some new hot spots which need further investigation and validation by the biological community.
Affiliation(s)
- Sitanshu Sekhar Sahu
- Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela 769008, Orissa, India.
147
Chen R, Chen W, Yang S, Wu D, Wang Y, Tian Y, Shi Y. Rigorous assessment and integration of the sequence and structure based features to predict hot spots. BMC Bioinformatics 2011; 12:311. [PMID: 21798070 PMCID: PMC3176265 DOI: 10.1186/1471-2105-12-311] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/03/2010] [Accepted: 07/29/2011] [Indexed: 12/02/2022] Open
Abstract
Background Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction has become increasingly important for understanding the essence of protein interactions and for narrowing down the search space for drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed.
Results In this study, we first comprehensively collect the features used to discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relASA and a larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), whereas in the Ab- dataset (antigen-antibody complexes excluded) these two features differ significantly between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predicting the hot spots of two protein complexes.
Conclusion Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots.
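As a quick consistency check on the figures quoted in this abstract, the F1 score can be recomputed from the reported precision and recall. This is a minimal sketch using the standard harmonic-mean definition of F1; the function name `f1_score` is chosen here for illustration and is not taken from the paper, and the abstract does not state how the authors rounded their values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract for the independent test set.
reported_precision = 0.69
reported_recall = 0.68

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")
```

The recomputed value is about 0.685, which agrees with the reported F1 of 0.68 to within rounding.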
Affiliation(s)
- Ruoying Chen
- College of Life Sciences, Graduate University of Chinese Academy of Sciences, Beijing 100049, China
|
148
|
Sequence- and interactome-based prediction of viral protein hotspots targeting host proteins: a case study for HIV Nef. PLoS One 2011; 6:e20735. [PMID: 21738584 PMCID: PMC3125164 DOI: 10.1371/journal.pone.0020735] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 04/19/2011] [Accepted: 05/08/2011] [Indexed: 01/03/2023] Open
Abstract
Virus proteins alter protein pathways of the host toward the synthesis of viral particles by breaking and making edges via binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins based on sequences of viral and host proteins and literature-curated virus-host protein interactome data. We use a motif discovery algorithm repeatedly on collections of sequences of viral proteins and immediate binding partners of their host targets, and choose only those motifs that are conserved on viral sequences and highly statistically enriched among binding partners of host proteins targeted by virus proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance, but the approach is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 cell epitopes of HIV Nef, highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear to crowd the T cell receptor, natural killer cell mediated cytotoxicity, and neurotrophin signaling pathways. Scanning of HIV Nef motifs on multiple alignments of hepatitis C protein NS5A produces results consistent with the literature, indicating the potential value of hotspot discovery in advancing our understanding of virus-host crosstalk.
|
149
|
Xue LC, Dobbs D, Honavar V. HomPPI: a class of sequence homology based protein-protein interface prediction methods. BMC Bioinformatics 2011; 12:244. [PMID: 21682895 PMCID: PMC3213298 DOI: 10.1186/1471-2105-12-244] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 05/25/2010] [Accepted: 06/17/2011] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. RESULTS We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins.
The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. CONCLUSIONS Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners.
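The sensitivity, specificity, and correlation coefficient quoted in this abstract are all derived from per-residue confusion-matrix counts. The sketch below shows how such values are commonly computed. It assumes the textbook definitions and treats CC as the Matthews correlation coefficient; the abstract itself does not define these terms, and some interface-prediction papers use "specificity" for TP/(TP+FP), which is elsewhere called precision, so both quantities are returned. The function name `interface_metrics` and the example counts are hypothetical and are not data from the paper.

```python
import math


def interface_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common per-residue metrics from confusion-matrix counts.

    tp/fp/tn/fn are counts of true/false positive/negative residue labels.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on interface residues
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # textbook definition
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # called "specificity" in some papers
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; set to 0.0 when it is undefined.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


# Hypothetical counts for one protein, for illustration only.
metrics = interface_metrics(tp=40, fp=10, tn=130, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision alongside the textbook specificity makes it easier to compare against papers that follow either naming convention.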
Affiliation(s)
- Li C Xue
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA.
|
150
|
Falero A, Caballero A, Trigueros S, Pérez C, Campos J, Marrero K, Fando R. Characterization of the single-stranded DNA binding protein pV(VGJΦ) of VGJΦ phage from Vibrio cholerae. Biochimica et Biophysica Acta 2011; 1814:1107-12. [PMID: 21586349 DOI: 10.1016/j.bbapap.2011.04.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/29/2010] [Revised: 03/22/2011] [Accepted: 04/21/2011] [Indexed: 01/01/2023]
Abstract
pV(VGJΦ), a single-stranded DNA binding protein of the vibriophage VGJΦ, was subjected to biochemical analysis. Here, we show that this protein has a general affinity for single-stranded DNA (ssDNA), as documented by Electrophoretic Mobility Shift Assay (EMSA). The apparent molecular weight of the monomer is about 12.7 kDa as measured by HPLC-SEC. Moreover, isoelectrofocusing showed an isoelectric point for pV(VGJΦ) of 6.82 pH units. Size exclusion chromatography in 150 mM NaCl, 50 mM sodium phosphate buffer, pH 7.0, revealed a major protein species of 27.0 kDa, suggesting a homodimeric protein architecture. Furthermore, pV(VGJΦ) binds ssDNA at extreme temperatures, and the complex was stable after extended incubation times. Upon frozen storage at -20 °C for a year, the protein retained its integrity, biological activity and oligomericity. On the other hand, bioinformatics analysis predicted that the pV(VGJΦ) protein has a disordered C-terminus, which might be involved in its functional activity. All the aforementioned features make pV(VGJΦ) interesting for biotechnological applications.
Affiliation(s)
- Alina Falero
- Department of Molecular Biology, National Center for Scientific Research, Havana, Cuba.
|