1. Durairaj J, Melillo E, Bouwmeester HJ, Beekwilder J, de Ridder D, van Dijk ADJ. Integrating structure-based machine learning and co-evolution to investigate specificity in plant sesquiterpene synthases. PLoS Comput Biol 2021; 17:e1008197. [PMID: 33750949 PMCID: PMC8016262 DOI: 10.1371/journal.pcbi.1008197] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/21/2020] [Revised: 04/01/2021] [Accepted: 02/15/2021] [Indexed: 12/19/2022] Open
Abstract
Sesquiterpene synthases (STSs) catalyze the formation of a large class of plant volatiles called sesquiterpenes. While thousands of putative STS sequences from diverse plant species are available, only a small number of them have been functionally characterized. Sequence identity-based screening for desired enzymes, often used in biotechnological applications, is difficult to apply here as STS sequence similarity is strongly affected by species. This calls for more sophisticated computational methods for functionality prediction. We investigate the specificity of precursor cation formation in these elusive enzymes. By inspecting multi-product STSs, we demonstrate that STSs have a strong selectivity towards one precursor cation. We use a machine learning approach combining sequence and structure information to accurately predict precursor cation specificity for STSs across all plant species. We combine this with a co-evolutionary analysis on the wealth of uncharacterized putative STS sequences, to pinpoint residues and distant functional contacts influencing cation formation and reaction pathway selection. These structural factors can be used to predict and engineer enzymes with specific functions, as we demonstrate by predicting and characterizing two novel STSs from Citrus bergamia.
Affiliation(s)
- Janani Durairaj
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
- Harro J. Bouwmeester
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Jules Beekwilder
  - Bioscience, Wageningen Plant Research, Wageningen University and Research, Wageningen, The Netherlands
  - Laboratory of Plant Physiology, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
- Dick de Ridder
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
- Aalt D. J. van Dijk
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
  - Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands

2. Akdel M, Durairaj J, de Ridder D, van Dijk ADJ. Caretta - A multiple protein structure alignment and feature extraction suite. Comput Struct Biotechnol J 2020; 18:981-992. [PMID: 32368333 PMCID: PMC7186369 DOI: 10.1016/j.csbj.2020.03.011] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/14/2019] [Revised: 02/01/2020] [Accepted: 03/13/2020] [Indexed: 02/06/2023] Open
Abstract
The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input for machine learning, but also structure features. However, in order to do so, robust multiple structure alignments are imperative. Here we present Caretta, a multiple structure alignment suite meant for homologous but sequentially divergent protein families which consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning. We show Caretta’s performance on two benchmark datasets, and present an example application of Caretta in predicting the conformational state of cyclin-dependent kinases.
Affiliation(s)
- Mehmet Akdel
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands
- Janani Durairaj
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands
- Dick de Ridder
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands
- Aalt D J van Dijk
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands
  - Mathematical and Statistical Methods - Biometris, Department of Plant Sciences, Wageningen University and Research, The Netherlands

3. Rosenfeld L, Heyne M, Shifman JM, Papo N. Protein Engineering by Combined Computational and In Vitro Evolution Approaches. Trends Biochem Sci 2016; 41:421-433. [PMID: 27061494 DOI: 10.1016/j.tibs.2016.03.002] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 01/12/2016] [Revised: 02/29/2016] [Accepted: 03/09/2016] [Indexed: 12/30/2022]
Abstract
Two alternative strategies are commonly used to study protein-protein interactions (PPIs) and to engineer protein-based inhibitors. In one approach, binders are selected experimentally from combinatorial libraries of protein mutants that are displayed on a cell surface. In the other approach, computational modeling is used to explore an astronomically large number of protein sequences to select a small number of sequences for experimental testing. While both approaches have some limitations, their combination produces superior results in various protein engineering applications. Such applications include the design of novel binders and inhibitors, the enhancement of affinity and specificity, and the mapping of binding epitopes. The combination of these approaches also aids in the understanding of the specificity profiles of various PPIs.
Affiliation(s)
- Lior Rosenfeld
  - Department of Biotechnology Engineering and the National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Michael Heyne
  - Department of Biotechnology Engineering and the National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
  - Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- Julia M Shifman
  - Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- Niv Papo
  - Department of Biotechnology Engineering and the National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel

4. Kundu K, Costa F, Backofen R. A graph kernel approach for alignment-free domain-peptide interaction prediction with an application to human SH3 domains. Bioinformatics 2013; 29:i335-43. [PMID: 23813002 PMCID: PMC3694653 DOI: 10.1093/bioinformatics/btt220] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 01/14/2023] Open
Abstract
MOTIVATION: State-of-the-art experimental data for determining binding specificities of peptide recognition modules (PRMs) is obtained by high-throughput approaches like peptide arrays. Most prediction tools applicable to this kind of data are based on an initial multiple alignment of the peptide ligands. Building an initial alignment can be error-prone, especially in the case of the proline-rich peptides bound by the SH3 domains. RESULTS: Here, we present a machine-learning approach based on an efficient graph-kernel technique to predict the specificity of a large set of 70 human SH3 domains, which are an important class of PRMs. The graph-kernel strategy allows us to (i) integrate several types of physico-chemical information for each amino acid, (ii) consider high-order correlations between these features and (iii) eliminate the need for an initial peptide alignment. We build specialized models for each human SH3 domain and achieve competitive predictive performance of 0.73 area under precision-recall curve, compared with 0.27 area under precision-recall curve for state-of-the-art methods based on position weight matrices. We show that better models can be obtained when we use information on the noninteracting peptides (negative examples), which is currently not used by the state-of-the-art approaches based on position weight matrices. To this end, we analyze two strategies to identify subsets of high confidence negative data. The techniques introduced here are more general and hence can also be used for any other protein domains that interact with short peptides (i.e. other PRMs). AVAILABILITY: The program with the predictive models can be found at http://www.bioinf.uni-freiburg.de/Software/SH3PepInt/SH3PepInt.tar.gz. We also provide a genome-wide prediction for all 70 human SH3 domains, which can be found under http://www.bioinf.uni-freiburg.de/Software/SH3PepInt/Genome-Wide-Predictions.tar.gz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
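The abstract above reports performance as area under the precision-recall curve. As a rough illustration of that metric only (not the authors' evaluation code), the sketch below estimates it as average precision over a ranked list of predictions. The function name `average_precision` and the toy scores are hypothetical, and tied scores are not treated specially.

```python
def average_precision(scores, labels):
    """Estimate area under the precision-recall curve as average precision.

    scores: predicted binding scores (higher means more likely to bind).
    labels: 1 for a true binder, 0 for a non-binder.
    Assumption: ties in scores are broken by sort order, not averaged.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of this positive
    return total / n_pos


# Toy example with made-up scores: positives are ranked 1st and 3rd.
toy_ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

A perfect ranking gives 1.0, while a ranking no better than chance tends toward the fraction of positives in the data, which is why this metric is informative when binders are rare.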
Affiliation(s)
- Kousik Kundu
  - Bioinformatics Group, Department of Computer Science, Georges-Köhler-Allee 106, 79110 Freiburg, Germany

5. Li N, Stein RSL, He W, Komives E, Wang W. Identification of methyllysine peptides binding to chromobox protein homolog 6 chromodomain in the human proteome. Mol Cell Proteomics 2013; 12:2750-60. [PMID: 23842000 DOI: 10.1074/mcp.o112.025015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022] Open
Abstract
Methylation is one of the important post-translational modifications that play critical roles in regulating protein functions. Proteomic identification of this post-translational modification and understanding how it affects protein activity remain great challenges. We tackled this problem from the aspect of methylation mediating protein-protein interaction. Using the chromodomain of human chromobox protein homolog 6 as a model system, we developed a systematic approach that integrates structure modeling, bioinformatics analysis, and peptide microarray experiments to identify lysine residues that are methylated and recognized by the chromodomain in the human proteome. Given the important role of chromobox protein homolog 6 as a reader of histone modifications, it was interesting to find that the majority of its interacting partners identified via this approach function in chromatin remodeling and transcriptional regulation. Our study not only illustrates a novel angle for identifying methyllysines on a proteome-wide scale and elucidating their potential roles in regulating protein function, but also suggests possible strategies for engineering the chromodomain-peptide interface to enhance the recognition of and manipulate the signal transduction mediated by such interactions.
Affiliation(s)
- Nan Li
  - Department of Chemistry and Biochemistry, 9500 Gilman Drive, University of California, San Diego, La Jolla, California 92093-0359

6. González AJ, Liao L, Wu CH. Prediction of contact matrix for protein-protein interaction. Bioinformatics 2013; 29:1018-25. [PMID: 23418186 DOI: 10.1093/bioinformatics/btt076] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 01/08/2023]
Abstract
MOTIVATION: Prediction of protein-protein interaction has become an important part of systems biology in reverse engineering the biological networks for better understanding the molecular biology of the cell. Although significant progress has been made in terms of prediction accuracy, most computational methods only predict whether two proteins interact but not their interacting residues, information that can be very valuable for understanding the interaction mechanisms and designing modulation of the interaction. In this work, we developed a computational method to predict the interacting residue pairs, that is, the contact matrix for interacting protein domains, whose rows and columns correspond to the residues in the two interacting domains respectively and whose values (1 or 0) indicate whether the corresponding residues (do or do not) interact. RESULTS: Our method is based on supervised learning using support vector machines. For each domain involved in a given domain-domain interaction (DDI), an interaction profile hidden Markov model (ipHMM) is first built for the domain family, and then each residue position for a member domain sequence is represented as a 20-dimension vector of Fisher scores, characterizing how similar it is as compared with the family profile at that position. Each element of the contact matrix for a sequence pair is now represented by a feature vector from concatenating the vectors of the two corresponding residues, and the task is to predict the element value (1 or 0) from the feature vector. A support vector machine is trained for a given DDI, using either a consensus contact matrix or contact matrices for individual sequence pairs, and is tested by leave-one-out cross validation. The performance averaged over a set of 115 DDIs collected from the 3DID database shows significant improvement (sensitivity up to 85%, and specificity up to 85%), as compared with a multiple sequence alignment-based method (sensitivity 57%, and specificity 78%) previously reported in the literature. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
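The feature construction and the evaluation described in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: the per-residue vectors are placeholder numbers rather than real Fisher scores from an ipHMM, the SVM training step is omitted, and the function names `pair_features` and `sensitivity_specificity` are hypothetical.

```python
def pair_features(domain_a, domain_b):
    """Build one feature vector per residue pair (i, j) by concatenation.

    domain_a, domain_b: lists of per-residue vectors (20 numbers each in the
    abstract; placeholder values here, not real Fisher scores).
    Returns a dict mapping (i, j) to the concatenated vector.
    """
    return {
        (i, j): list(vec_a) + list(vec_b)
        for i, vec_a in enumerate(domain_a)
        for j, vec_b in enumerate(domain_b)
    }


def sensitivity_specificity(true_matrix, pred_matrix):
    """Compare a predicted 0/1 contact matrix with the true one."""
    tp = fn = tn = fp = 0
    for true_row, pred_row in zip(true_matrix, pred_matrix):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 1 and p == 0:
                fn += 1
            elif t == 0 and p == 0:
                tn += 1
            else:
                fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one contact and one non-contact")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: two residues in domain A, one in domain B, 20 values per residue.
features = pair_features([[0.1] * 20, [0.2] * 20], [[0.3] * 20])
sens, spec = sensitivity_specificity([[1, 0], [0, 1]], [[1, 0], [1, 1]])
```

Each pair vector has 40 entries (20 + 20), and a classifier trained on such vectors would predict one matrix element at a time.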
Affiliation(s)
- Alvaro J González
  - Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA

7. Hawkins JC, Zhu H, Teyra J, Pisabarro MT. Reduced false positives in PDZ binding prediction using sequence and structural descriptors. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1492-1503. [PMID: 22508908 DOI: 10.1109/tcbb.2012.54] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
Identifying the binding partners of proteins is a problem of fundamental importance in computational biology. The PDZ is one of the most common and well-studied protein binding domains, hence it is a perfect model system for designing protein binding predictors. The standard approach to identifying the binding partners of PDZ domains uses multiple sequence alignments to infer the set of contact residues that are used in a predictive model. We expand on the sequence alignment approach by incorporating structural information to generate descriptors of the binding site geometry. Furthermore, we generate a real-value score for binary predictions by applying a filter based on models that predict the probability distributions of contact residues at each of the canonical PDZ ligand binding positions. Under training cross validation, our model produced an order of magnitude more predictions at a false positive proportion (FPP) of 10 percent than our benchmark model chosen from the literature. Evaluated using an independent cross validation, with computationally predicted structures, our model was able to make five times as many predictions as the benchmark model, with a Matthews' correlation coefficient (MCC) of 0.33. In addition, our model achieved a false positive proportion of 0.14, while the benchmark model had a 0.25 false positive proportion.
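For readers unfamiliar with the Matthews correlation coefficient quoted above, a minimal sketch of the standard formula follows. It works from confusion-matrix counts; the function name `matthews_cc` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns a value between -1 (total disagreement) and +1 (perfect
    prediction). By common convention, 0.0 is returned when any marginal
    total is zero and the coefficient is otherwise undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts for illustration only.
perfect = matthews_cc(tp=5, fp=0, tn=5, fn=0)
chance = matthews_cc(tp=5, fp=5, tn=5, fn=5)
```

Unlike plain accuracy, the coefficient stays near zero for uninformative predictors even when binders and non-binders are heavily imbalanced.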
Affiliation(s)
- John C Hawkins
  - Structural Bioinformatics, BIOTEC TU Dresden, Dresden, Germany

8. Hou T, Li N, Li Y, Wang W. Characterization of domain-peptide interaction interface: prediction of SH3 domain-mediated protein-protein interaction network in yeast by generic structure-based models. J Proteome Res 2012; 11:2982-95. [PMID: 22468754 PMCID: PMC3345086 DOI: 10.1021/pr3000688] [Citation(s) in RCA: 184] [Impact Index Per Article: 14.2] [Indexed: 12/18/2022]
Abstract
Determination of the binding specificity of SH3 domains, a class of peptide recognition modules (PRMs), is important for understanding their biological functions and reconstructing the SH3-mediated protein-protein interaction network. In the present study, the SH3-peptide interactions for both class I and II SH3 domains were characterized by the intermolecular residue-residue interaction network. We developed generic MIEC-SVM models to infer SH3 domain-peptide recognition specificity that achieved satisfactory prediction accuracy. By investigating the domain-peptide recognition mechanisms at the residue level, we found that the class-I and class-II binding peptides have different binding modes even though they occupy the same binding site of SH3. Furthermore, we predicted the potential binding partners of SH3 domains in the yeast proteome and constructed the SH3-mediated protein-protein interaction network. Comparison with the experimentally determined interactions confirmed the effectiveness of our approach. This study showed that our sophisticated computational approach not only provides a powerful platform to decipher protein recognition code at the molecular level but also allows identification of peptide-mediated protein interactions at a proteomic scale. We believe that such an approach is general enough to be applicable to other domain-peptide interactions.
Affiliation(s)
- Tingjun Hou
  - Institute of Functional Nano & Soft Materials (FUNSOM) and Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu 215123, China
  - College of Pharmaceutical Science, Soochow University, Suzhou, Jiangsu 215123, China
- Nan Li
  - Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, California 92093, United States
- Youyong Li
  - Institute of Functional Nano & Soft Materials (FUNSOM) and Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu 215123, China
- Wei Wang
  - Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, California 92093, United States

9. Gfeller D. Uncovering new aspects of protein interactions through analysis of specificity landscapes in peptide recognition domains. FEBS Lett 2012; 586:2764-72. [PMID: 22710167 DOI: 10.1016/j.febslet.2012.03.054] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 02/29/2012] [Revised: 03/27/2012] [Accepted: 03/27/2012] [Indexed: 12/20/2022]
Abstract
Protein interactions underlie all biological processes. An important class of protein interactions, often observed in signaling pathways, consists of peptide recognition domains binding short protein segments on the surface of their target proteins. Recent developments in experimental techniques have uncovered many such interactions and shed new light on their specificity. To analyze these data, novel computational methods have been introduced that can accurately describe the specificity landscape of peptide recognition domains and predict new interactions. Combining large-scale analysis of binding specificity data with structure-based modeling can further reveal new biological insights into the molecular recognition events underlying signaling pathways.
Affiliation(s)
- David Gfeller
  - Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Génopode, CH-1015 Lausanne, Switzerland

10. Hou T, Li Y, Wang W. Prediction of peptides binding to the PKA RIIalpha subunit using a hierarchical strategy. Bioinformatics 2011; 27:1814-21. [PMID: 21586518 DOI: 10.1093/bioinformatics/btr294] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Favorable interaction between the regulatory subunit of the cAMP-dependent protein kinase (PKA) and a peptide in A-kinase anchoring proteins (AKAPs) is critical for translocating PKA to the subcellular sites where the enzyme phosphorylates its substrates. It is very hard to identify AKAP peptides binding to PKA due to the high sequence diversity of AKAPs. RESULTS: We propose a hierarchical and efficient approach, which combines molecular dynamics (MD) simulations, free energy calculations, virtual mutagenesis (VM) and bioinformatics analyses, to predict peptides binding to the PKA RIIα regulatory subunit in the human proteome systematically. Our approach successfully retrieved 15 out of 18 documented RIIα-binding peptides. Literature curation supported that many newly predicted peptides might be true AKAPs. Here, we present the first systematic search for AKAP peptides in the human proteome, which is useful for further experimental identification of AKAPs and functional analysis of their biological roles.
Affiliation(s)
- Tingjun Hou
  - Institute of Functional Nano & Soft Materials (FUNSOM) and Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu 215123, PR China

11. Shao X, Tan CSH, Voss C, Li SSC, Deng N, Bader GD. A regression framework incorporating quantitative and negative interaction data improves quantitative prediction of PDZ domain-peptide interaction from primary sequence. Bioinformatics 2010; 27:383-90. [PMID: 21127034 PMCID: PMC3031032 DOI: 10.1093/bioinformatics/btq657] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022]
Abstract
Motivation: Predicting protein interactions involving peptide recognition domains is essential for understanding the many important biological processes they mediate. It is important to consider the binding strength of these interactions to help us construct more biologically relevant protein interaction networks that consider cellular context and competition between potential binders. Results: We developed a novel regression framework that considers both positive (quantitative) and negative (qualitative) interaction data available for mouse PDZ domains to quantitatively predict interactions between PDZ domains, a large peptide recognition domain family, and their peptide ligands using primary sequence information. First, we show that it is possible to learn from existing quantitative and negative interaction data to infer the relative binding strength of interactions involving previously unseen PDZ domains and/or peptides given their primary sequence. Performance was measured using cross-validated hold out testing and testing with previously unseen PDZ domain–peptide interactions. Second, we find that incorporating negative data improves quantitative interaction prediction. Third, we show that sequence similarity is an important prediction performance determinant, which suggests that experimentally collecting additional quantitative interaction data for underrepresented PDZ domain subfamilies will improve prediction. Availability and Implementation: The Matlab code for our SemiSVR predictor and all data used here are available at http://baderlab.org/Data/PDZAffinity. Contact: gary.bader@utoronto.ca; dengnaiyang@cau.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaojian Shao
  - Department of Applied Mathematics, College of Science, China Agricultural University, Beijing, 100083, China

12. van Dijk ADJ, Morabito G, Fiers M, van Ham RCHJ, Angenent GC, Immink RGH. Sequence motifs in MADS transcription factors responsible for specificity and diversification of protein-protein interaction. PLoS Comput Biol 2010; 6:e1001017. [PMID: 21124869 PMCID: PMC2991254 DOI: 10.1371/journal.pcbi.1001017] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Received: 07/10/2010] [Accepted: 10/27/2010] [Indexed: 11/18/2022] Open
Abstract
Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various performed bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and network evolution.
Affiliation(s)
- Martijn Fiers
  - Plant Research International, Bioscience, Wageningen, The Netherlands
- Gerco C. Angenent
  - Plant Research International, Bioscience, Wageningen, The Netherlands
  - Centre for BioSystems Genomics (CBSG), Wageningen, The Netherlands
- Richard G. H. Immink
  - Plant Research International, Bioscience, Wageningen, The Netherlands
  - Centre for BioSystems Genomics (CBSG), Wageningen, The Netherlands

13. Collaborative actions in anti-trypanosomatid chemotherapy with partners from disease endemic areas. Trends Parasitol 2010; 26:395-403. [DOI: 10.1016/j.pt.2010.04.012] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 11/06/2009] [Revised: 04/28/2010] [Accepted: 04/29/2010] [Indexed: 11/22/2022]

14. Eo HS, Kim S, Koo H, Kim W. A machine learning based method for the prediction of G protein-coupled receptor-binding PDZ domain proteins. Mol Cells 2009; 27:629-34. [PMID: 19533032 DOI: 10.1007/s10059-009-0091-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 11/13/2008] [Revised: 05/01/2009] [Accepted: 05/12/2009] [Indexed: 10/20/2022] Open
Abstract
G protein-coupled receptors (GPCRs) are part of multi-protein networks called 'receptosomes'. These GPCR interacting proteins (GIPs) in the receptosomes control the targeting, trafficking and signaling of GPCRs. PDZ domain proteins constitute the largest protein family among the GIPs, and the predominant function of the PDZ domain proteins is to assemble signaling pathway components into close proximity by recognition of the last four C-terminal amino acids of GPCRs. We present here a machine learning based approach for the identification of GPCR-binding PDZ domain proteins. In order to characterize the network of interactions between amino acid residues that contribute to the stability of the PDZ domain-ligand complex and to encode the complex into a feature vector, amino acid contact matrices and a physicochemical distance matrix were constructed and adopted. This novel machine learning based method displayed high performance for the identification of PDZ domain-ligand interactions and allowed the identification of novel GPCR-PDZ domain protein interactions.
Affiliation(s)
- Hae-Seok Eo
  - School of Biological Sciences, Seoul National University, Seoul 151-742, Korea

15. Wunderlich Z, Mirny LA. Using genome-wide measurements for computational prediction of SH2-peptide interactions. Nucleic Acids Res 2009; 37:4629-41. [PMID: 19502496 PMCID: PMC2724268 DOI: 10.1093/nar/gkp394] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 12/25/2022] Open
Abstract
Peptide-recognition modules (PRMs) are used throughout biology to mediate protein–protein interactions, and many PRMs are members of large protein domain families. Recent genome-wide measurements describe networks of peptide–PRM interactions. In these networks, very similar PRMs recognize distinct sets of peptides, raising the question of how peptide-recognition specificity is achieved using similar protein domains. The analysis of individual protein complex structures often gives answers that are not easily applicable to other members of the same PRM family. Bioinformatics-based approaches, on the other hand, may be difficult to interpret physically. Here we integrate structural information with a large, quantitative data set of SH2 domain–peptide interactions to study the physical origin of domain–peptide specificity. We develop an energy model, inspired by protein folding, based on interactions between the amino-acid positions in the domain and peptide. We use this model to successfully predict which SH2 domains and peptides interact and uncover the positions in each that are important for specificity. The energy model is general enough that it can be applied to other members of the SH2 family or to new peptides, and the cross-validation results suggest that these energy calculations will be useful for predicting binding interactions. It can also be adapted to study other PRM families, predict optimal peptides for a given SH2 domain, or study other biological interactions, e.g. protein–DNA interactions.
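One simple way to picture an energy model built on interactions between domain and peptide positions is a sum of pairwise terms over contacting position pairs. The sketch below assumes that additive form, which the abstract does not spell out. The function `binding_energy`, the contact list, and the energy table are all hypothetical illustrations, not the authors' model or fitted parameters.

```python
def binding_energy(domain_seq, peptide_seq, contacts, pair_energy):
    """Sum pairwise energies over contacting (domain, peptide) positions.

    contacts: list of (i, j) index pairs, i into domain_seq, j into peptide_seq.
    pair_energy: dict keyed by (i, j, domain_residue, peptide_residue).
    Assumption: residue pairs missing from the table contribute 0.0.
    """
    total = 0.0
    for i, j in contacts:
        key = (i, j, domain_seq[i], peptide_seq[j])
        total += pair_energy.get(key, 0.0)
    return total


# Made-up sequences, contacts and energies for illustration only.
toy_table = {(0, 0, "A", "Y"): -1.5, (2, 1, "R", "E"): 0.5}
toy_energy = binding_energy("AKR", "YE", [(0, 0), (2, 1)], toy_table)
```

Under this form, a lower total would indicate a more favorable predicted interaction, and the position pairs with the largest terms point to where specificity may arise.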
Affiliation(s)
- Zeba Wunderlich
- Biophysics Program, Harvard University, Cambridge, MA 02138, USA
16
Schillinger C, Boisguerin P, Krause G. Domain Interaction Footprint: a multi-classification approach to predict domain-peptide interactions. Bioinformatics 2009; 25:1632-9. [PMID: 19376827 DOI: 10.1093/bioinformatics/btp264] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The flow of information within cellular pathways largely relies on specific protein-protein interactions. Discovering such interactions, which are mostly mediated by peptide recognition modules (PRMs), is therefore a fundamental step towards unravelling the complexity of varying pathways. Since peptides can be recognized by more than one PRM and high-throughput experiments are both time consuming and expensive, it would be preferable to narrow down all potential peptide ligands for one specific PRM by a computational method. First, we present Domain Interaction Footprint (DIF), a new approach to predict binding peptides to PRMs merely based on the sequence of the peptides. Second, we show that our method is able to create a multi-classification model that assesses the binding specificity of a given peptide to all examined PRMs at once. RESULTS: We first applied our approach to a previously investigated dataset of different SH3 domains and predicted their appropriate peptide ligands with an exceptionally high accuracy. This result outperforms all recent methods trained on the same dataset. Furthermore, we used our technique to build two multi-classification models (SH3 and PDZ domains) to predict the interaction preference between a peptide and every single domain in the corresponding domain family at once. Predicting the domain specificity most reliably, our proposed approach can be seen as a first step towards a complete multi-domain classification model comprised of all domains of one family. Such a comprehensive domain specificity model would benefit the quest for highly specific peptide ligands interacting solely with the domain of choice. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christian Schillinger
- Leibniz Institute for Molecular Pharmacology, Robert-Roessle-Strasse 10, Berlin, FU-Berlin, Germany
17
Hou T, Xu Z, Zhang W, McLaughlin WA, Case DA, Xu Y, Wang W. Characterization of domain-peptide interaction interface: a generic structure-based model to decipher the binding specificity of SH3 domains. Mol Cell Proteomics 2008; 8:639-49. [PMID: 19023120 DOI: 10.1074/mcp.m800450-mcp200] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.5] [Indexed: 11/06/2022] Open
Abstract
Extensive efforts have been devoted to determining the binding specificity of Src homology 3 (SH3) domains usually in a case-by-case manner. A generic structure-based model is necessary to decipher the protein recognition code of the entire domain family. In this study, we have developed a general framework that combines molecular modeling and a machine learning algorithm to capture the energetic characteristics of the domain-peptide interactions and predict the binding specificity of the SH3 domain family. Our model is not trained for individual SH3 domains; rather it is a generic model for the entire domain family. Our model not only achieved satisfactory prediction accuracy but also provided structural insights into which residues are important for the binding specificity. The success of our framework on SH3 domains suggests that it is possible to establish a theoretical model to decipher the protein recognition code of any modular domain.
Affiliation(s)
- Tingjun Hou
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, California 92093, USA
18
Kiel C, Beltrao P, Serrano L. Analyzing Protein Interaction Networks Using Structural Information. Annu Rev Biochem 2008; 77:415-41. [DOI: 10.1146/annurev.biochem.77.062706.133317] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
Affiliation(s)
- Christina Kiel
- EMBL-CRG Systems Biology Unit, Center de Regulacio Genomica, Barcelona 08003, Spain
- Pedro Beltrao
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Luis Serrano
- EMBL-CRG Systems Biology Unit, Center de Regulacio Genomica, Barcelona 08003, Spain
19
Hou T, Zhang W, Case DA, Wang W. Characterization of Domain–Peptide Interaction Interface: A Case Study on the Amphiphysin-1 SH3 Domain. J Mol Biol 2008; 376:1201-14. [DOI: 10.1016/j.jmb.2007.12.054] [Citation(s) in RCA: 176] [Impact Index Per Article: 10.4] [Received: 10/17/2007] [Revised: 12/14/2007] [Accepted: 12/20/2007] [Indexed: 11/25/2022]
20
Ferraro E, Peluso D, Via A, Ausiello G, Helmer-Citterich M. SH3-Hunter: discovery of SH3 domain interaction sites in proteins. Nucleic Acids Res 2007; 35:W451-4. [PMID: 17485474 PMCID: PMC1933191 DOI: 10.1093/nar/gkm296] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022] Open
Abstract
SH3-Hunter (http://cbm.bio.uniroma2.it/SH3-Hunter/) is a web server for the recognition of putative SH3 domain interaction sites on protein sequences. Given an input query consisting of one or more protein sequences, the server identifies peptides containing poly-proline binding motifs and associates them with a list of SH3 domains, in order to compose peptide-domain pairs. The server can accept a list of peptides and allows users to upload an input file in a proper format. An accurate selection of SH3 domains is available and users can also submit their own SH3 domain sequence. SH3-Hunter evaluates which peptide-domain pair represents a possible interaction pair and produces as output a list of significant interaction sites for each query protein. Each proposed interaction site is associated with a propensity score and with sensitivity and precision levels for the prediction. The server prediction capability is based on a neural network model integrating high-throughput pep-spot data with structural information extracted from known SH3-peptide complexes.
Affiliation(s)
- Enrico Ferraro
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy.