1
Pang E, Lin K. Yeast protein-protein interaction binding sites: prediction from the motif-motif, motif-domain and domain-domain levels. Molecular BioSystems 2010; 6:2164-73. [PMID: 20714642 DOI: 10.1039/c0mb00038h] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
Interacting proteins can contact each other at three different levels: by a domain binding to another domain, by a domain binding to a short protein motif, or by a motif binding to another motif. In our previous work, we proposed an approach to predict motif-motif binding sites for the yeast interactome by contrasting high-quality positive interactions and high-quality non-interactions using a simple statistical analysis. Here, we extend this idea to infer binding sites more comprehensively, including domain-domain, domain-motif, and motif-motif interactions. In this study, we integrated 2854 yeast proteins that undergo 13 531 high-quality interactions and 3491 yeast proteins undergoing 578 459 high-quality non-interactions. Overall, we found 6315 significant binding site pairs involving 2371 domains and 637 motifs. Benchmarking against iPfam, DIP CORE, and MIPS indicates that our inferred results are reliable. Interestingly, some of our predicted binding site pairs may, at least in the yeast genome, guide researchers to assay novel protein-protein interactions by mutagenesis or other experiments. Our work demonstrates that inferring significant protein-protein binding sites at an aggregate level, combining the domain-domain, domain-motif and motif-motif levels and using high-quality positive and negative datasets, may be capable of identifying the binding site pairs that mediate protein-protein interactions.
Affiliation(s)
- Erli Pang
- State Key Laboratory of Earth Surface Processes and Resource Ecology and College of Life Sciences, Beijing Normal University, Beijing 100875, China
2
Guo J, Wu X, Zhang DY, Lin K. Genome-wide inference of protein interaction sites: lessons from the yeast high-quality negative protein-protein interaction dataset. Nucleic Acids Res 2008; 36:2002-11. [PMID: 18281313 PMCID: PMC2346601 DOI: 10.1093/nar/gkn016] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
High-throughput studies of protein interactions may have produced, experimentally and computationally, the most comprehensive protein–protein interaction datasets in the completely sequenced genomes. These datasets provide an opportunity to discover the underlying protein interaction patterns on a proteome scale. Here, we propose an approach to discovering motif pairs at interaction sites (often 3–8 residues) that are essential for understanding protein functions and helpful for the rational design of protein engineering and folding experiments. A gold standard positive (interacting) dataset and a gold standard negative (non-interacting) dataset were mined to infer the interacting motif pairs that are significantly overrepresented in the positive dataset compared to the negative dataset. Four negative datasets assembled by different strategies were evaluated, and the one with the best performance was used as the gold standard negatives for further analysis. To assess the efficiency of our method in detecting potential interacting motif pairs, we compared it with previously developed approaches and found that our method achieved the highest prediction accuracy. In addition, many uncharacterized motif pairs of interest were found to be functional, with experimental evidence in other species. This investigation demonstrates the important effects of a high-quality negative dataset on the performance of such statistical inference.
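The overrepresentation idea in this abstract can be illustrated with a small sketch. The abstract does not name the statistic used, so the one-sided hypergeometric (Fisher-style) tail below is an assumption chosen for illustration, as are the function names, the plain substring matching of motifs, and the toy sequences.

```python
from math import comb


def count_pairs_with_motifs(pairs, sequences, motif_a, motif_b):
    """Count protein pairs in which one partner contains motif_a and the other motif_b.

    Motifs are matched as plain substrings here (an illustrative simplification).
    """
    def matches(p, q):
        return motif_a in sequences[p] and motif_b in sequences[q]

    return sum(1 for p, q in pairs if matches(p, q) or matches(q, p))


def enrichment_p_value(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided hypergeometric tail: probability of seeing at least pos_hits
    motif-pair occurrences among the positives if occurrences were spread at
    random over positives and negatives. This is an assumed test, not
    necessarily the statistic used in the paper.
    """
    population = pos_total + neg_total
    hits = pos_hits + neg_hits
    upper = min(hits, pos_total)
    tail = sum(
        comb(hits, k) * comb(population - hits, pos_total - k)
        for k in range(pos_hits, upper + 1)
    )
    return tail / comb(population, pos_total)


# Hypothetical toy data: four short sequences and two candidate motifs.
sequences = {"P1": "MKPLLA", "P2": "GGRWDE", "P3": "AAAAAA", "P4": "PLLRWD"}
positives = [("P1", "P2"), ("P4", "P3")]
negatives = [("P1", "P3"), ("P2", "P3")]

pos_hits = count_pairs_with_motifs(positives, sequences, "PLL", "RWD")
neg_hits = count_pairs_with_motifs(negatives, sequences, "PLL", "RWD")
p_value = enrichment_p_value(pos_hits, len(positives), neg_hits, len(negatives))
```

A small p-value would flag the motif pair as a candidate interaction site; with real data, the many motif pairs tested would also call for a multiple-testing correction.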
Affiliation(s)
- Jie Guo
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China
3
van Dijk ADJ, ter Braak CJF, Immink RG, Angenent GC, van Ham RCHJ. Predicting and understanding transcription factor interactions based on sequence level determinants of combinatorial control. Bioinformatics 2007; 24:26-33. [PMID: 18024974 DOI: 10.1093/bioinformatics/btm539] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION Transcription factor interactions are the cornerstone of combinatorial control, which is a crucial aspect of the gene regulatory system. Understanding and predicting transcription factor interactions based on their sequence alone is difficult since they are often part of families of factors sharing high sequence identity. Given the scarcity of experimental data on interactions compared to available sequence data, however, it would be most useful to have accurate methods for the prediction of such interactions. RESULTS We present a method consisting of a Random Forest-based feature-selection procedure that selects relevant motifs out of a set found using a correlated motif search algorithm. Prediction accuracy for several transcription factor families (bZIP, MADS, homeobox and forkhead) reaches 60-90%. In addition, we identified those parts of the sequence that are important for the interaction specificity, and show that these are in agreement with available data. We also used the predictors to perform genome-wide scans for interaction partners and recovered both known and putative new interaction partners.
Affiliation(s)
- A D J van Dijk
- Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, Wageningen, The Netherlands
4
Aragues R, Sali A, Bonet J, Marti-Renom MA, Oliva B. Characterization of protein hubs by inferring interacting motifs from protein interactions. PLoS Comput Biol 2007; 3:1761-71. [PMID: 17941705 PMCID: PMC1976338 DOI: 10.1371/journal.pcbi.0030178] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 04/19/2007] [Accepted: 07/27/2007] [Indexed: 12/19/2022]
Abstract
The characterization of protein interactions is essential for understanding biological systems. While genome-scale methods are available for identifying interacting proteins, they do not pinpoint the interacting motifs (e.g., a domain, sequence segments, a binding site, or a set of residues). Here, we develop and apply a method for delineating the interacting motifs of hub proteins (i.e., highly connected proteins). The method relies on the observation that proteins with common interaction partners tend to interact with these partners through a common interacting motif. The sole input for the method is a set of binary protein interactions; neither sequence nor structure information is needed. The approach is evaluated by comparing the inferred interacting motifs with domain families defined for 368 proteins in the Structural Classification of Proteins (SCOP). The positive predictive value of the method for detecting proteins with common SCOP families is 75% at a sensitivity of 10%. Most of the inferred interacting motifs were significantly associated with sequence patterns, which could be responsible for the common interactions. We find that yeast hubs with multiple interacting motifs are more likely to be essential than hubs with one or two interacting motifs, thus rationalizing the previously observed correlation between essentiality and the number of interacting partners of a protein. We also find that yeast hubs with multiple interacting motifs evolve slower than the average protein, contrary to the hubs with one or two interacting motifs. The proposed method will help us discover unknown interacting motifs and provide biological insights about protein hubs and their roles in interaction networks.
Recent advances in experimental methods have produced a deluge of protein–protein interaction data. However, these methods do not supply information on which specific protein regions are physically in contact during the interactions. Identifying these regions (interfaces) is fundamental for scientific disciplines that require detailed characterizations of protein interactions. In this work, we present a computational method that identifies groups of proteins with similar interfaces. This is achieved by relying on the observation that proteins with common interaction partners tend to interact through similar interfaces. The proposed method retrieves protein interactions from public data repositories and groups proteins that share a sensible number of interacting partners. Proteins within the same group are then labeled with the same “interacting motif” identifier (iMotif). The evaluation performed using known protein domains and structural binding sites suggests that the method is better suited for proteins with multiple interacting partners (hubs). Using yeast data, we show that the cellular essentiality of a gene better correlates with the number of interacting motifs than with the absolute number of interactions.
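The grouping step described above, in which proteins that share enough interaction partners receive the same label, can be sketched as follows. This is a minimal illustration and not the published iMotif procedure: the `min_shared` threshold, the function name, and the use of single-linkage grouping via union-find are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def group_by_shared_partners(interactions, min_shared=2):
    """Group proteins that share at least min_shared interaction partners.

    interactions: iterable of (protein_a, protein_b) binary interactions.
    Returns a list of sets; each set would receive one shared label.
    Grouping is transitive (single linkage), which is an assumed choice.
    """
    partners = defaultdict(set)
    for a, b in interactions:
        partners[a].add(b)
        partners[b].add(a)

    parent = {p: p for p in partners}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(partners), 2):
        if len(partners[a] & partners[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in partners:
        groups[find(p)].add(p)
    # Singletons carry no shared-interface information, so drop them.
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical toy network: A and B share partners X and Y.
toy = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "Z")]
toy_groups = group_by_shared_partners(toy, min_shared=2)
```

In this toy network A and B form one group, and X and Y form another because they share A and B as partners. The pairwise comparison is quadratic in the number of proteins, which is acceptable for a sketch but would need indexing on a full interactome.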
Affiliation(s)
- Ramon Aragues
- Structural Bioinformatics Lab (GRIB), Universitat Pompeu Fabra-IMIM, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, Spain
- Andrej Sali
- Department of Biopharmaceutical Sciences, University of California San Francisco, San Francisco, California, United States of America
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, United States of America
- California Institute for Quantitative Biomedical Research, University of California San Francisco, San Francisco, California, United States of America
- Jaume Bonet
- Structural Bioinformatics Lab (GRIB), Universitat Pompeu Fabra-IMIM, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, Spain
- Marc A Marti-Renom
- Structural Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe, Valencia, Spain
- * To whom correspondence should be addressed. E-mail: (MAMR); (BO)
- Baldo Oliva
- Structural Bioinformatics Lab (GRIB), Universitat Pompeu Fabra-IMIM, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, Spain
- * To whom correspondence should be addressed. E-mail: (MAMR); (BO)
5
Henschel A, Winter C, Kim WK, Schroeder M. Using structural motif descriptors for sequence-based binding site prediction. BMC Bioinformatics 2007; 8 Suppl 4:S5. [PMID: 17570148 PMCID: PMC1892084 DOI: 10.1186/1471-2105-8-s4-s5] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
BACKGROUND Many protein sequences are still poorly annotated. Functional characterization of a protein is often improved by the identification of its interaction partners. Here, we aim to predict protein-protein interactions (PPI) and protein-ligand interactions (PLI) at the sequence level using 3D information. To this end, we use machine learning to compile sequential segments that constitute structural features of an interaction site into one profile Hidden Markov Model descriptor. The resulting collection of descriptors can be used to screen sequence databases in order to predict functional sites. RESULTS We generate descriptors for 740 classified types of protein-protein binding sites and for more than 3,000 protein-ligand binding sites. Cross validation reveals that two thirds of the PPI descriptors are sufficiently conserved and significant enough to be used for binding site recognition. We further validate 230 PPIs that were extracted from the literature, where we additionally identify the interface residues. Finally, we test ligand-binding descriptors for the case of ATP. From sequences with Swiss-Prot annotation "ATP-binding", we achieve a recall of 25% with a precision of 89%, whereas Prosite's P-loop motif recognizes an equal number of hits at the expense of a much higher number of false positives (precision: 57%). Our method yields 771 hits with a precision of 96% that were not previously picked up by any Prosite pattern. CONCLUSION The automatically generated descriptors are a useful complement to known Prosite/InterPro motifs. They serve to predict protein-protein as well as protein-ligand interactions, along with their binding site residues, for proteins where only sequence information is available.
Affiliation(s)
- Andreas Henschel
- Biotechnological Center, TU Dresden, Tatzberg 47-51, 01307 Dresden, Germany
- Christof Winter
- Biotechnological Center, TU Dresden, Tatzberg 47-51, 01307 Dresden, Germany
- Wan Kyu Kim
- Biotechnological Center, TU Dresden, Tatzberg 47-51, 01307 Dresden, Germany
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USA
- Michael Schroeder
- Biotechnological Center, TU Dresden, Tatzberg 47-51, 01307 Dresden, Germany
6
Liu H, Liu J. Prediction of domain interactive motif pairs. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2007; 2005:7750-3. [PMID: 17282078 DOI: 10.1109/iembs.2005.1616309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
Protein domain-domain interaction pairs supply functional information about the interacting proteins, and finding interaction motif pairs in a protein-protein interaction database can reveal the underlying basis of the protein interaction. To date, there has been little research on predicting interaction motif pairs within domain-domain interaction pairs. In this paper, we propose a new method to predict domain interaction motif pairs. We start by collecting contact segment pairs from PDB protein complexes, then use the contact segment pairs as seeds to iteratively cluster the protein-protein interaction database with the help of functional domains, and finally generalize the similar segment pair clusters to produce motif pairs. Using our method, we find 528 motif pairs.
Affiliation(s)
- Hongbiao Liu
- School of Computer, Wuhan University, Wuhan 430079, China
7
Li H, Li J, Wong L. Discovering motif pairs at interaction sites from protein sequences on a proteome-wide scale. Bioinformatics 2006; 22:989-96. [PMID: 16446278 DOI: 10.1093/bioinformatics/btl020] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022]
Abstract
MOTIVATION Protein-protein interaction, mediated by protein interaction sites, is intrinsic to many functional processes in the cell. In this paper, we propose a novel method to discover patterns in protein interaction sites. We observed that protein interaction networks contain a kind of significant substructure called interacting protein group pairs, which exhibit an all-versus-all interaction between the two protein sets in such a pair. The full interaction between the pair indicates a common interaction mechanism shared by the proteins in the pair, which can be referred to as an interaction type. Motif pairs at the interaction sites of the protein group pairs can be used to represent such an interaction type, with each motif derived from the sequences of a protein group by standard motif discovery algorithms. The systematic discovery of all pairs of interacting protein groups from large protein interaction networks is a computationally challenging problem. Through a careful problem transformation, the problem is solved using efficient algorithms for mining frequent patterns, a problem extensively studied in data mining. RESULTS We found 5349 pairs of interacting protein groups from a yeast interaction dataset. The expected value of sequence identity within the groups is only 7.48%, indicating non-homology within these protein groups. We derived 5343 motif pairs from these group pairs, represented in the form of blocks. Comparing our motifs with domains in the BLOCKS and PRINTS databases, we found that our blocks could be mapped to an average of 3.08 correlated blocks in these two databases. The mapped blocks occur in 4221 of the 6794 domains (protein groups) in these two databases. Comparing our motif pairs with iPfam, which consists of 3045 interacting domain pairs derived from PDB, we found 47 matches occurring in 105 distinct PDB complexes. Comparing with another putative domain interaction database, InterDom, we found 203 matches.
AVAILABILITY http://research.i2r.a-star.edu.sg/BindingMotifPairs/resources. SUPPLEMENTARY INFORMATION http://research.i2r.a-star.edu.sg/BindingMotifPairs and Bioinformatics online.
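The "interacting protein group pairs" described in this abstract are pairs of protein sets with all-versus-all interactions between them. The sketch below finds such pairs on a toy network by brute force: it seeds with small protein sets, takes their common neighbours, and closes the pair. It is an illustrative stand-in for, not a reproduction of, the frequent-pattern mining algorithms the abstract mentions; the function name, the `min_size` parameter, and the assumption of no self-interactions are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def interacting_group_pairs(interactions, min_size=2):
    """Find all-versus-all interacting group pairs by brute-force enumeration.

    interactions: iterable of (protein_a, protein_b) pairs, assumed to contain
    no self-interactions. Each result is a frozenset holding the two groups,
    where every protein in one group interacts with every protein in the other.
    """
    neighbours = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)

    found = set()
    for seed in combinations(sorted(neighbours), min_size):
        # Proteins that interact with every member of the seed.
        right = set.intersection(*(neighbours[p] for p in seed))
        if len(right) < min_size:
            continue
        # Close the left side: all proteins that interact with every member of right.
        left = set.intersection(*(neighbours[q] for q in right))
        found.add(frozenset((frozenset(left), frozenset(right))))
    return found


# Hypothetical toy network: {A, B} interacts all-versus-all with {X, Y}.
toy_network = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "X")]
toy_pairs = interacting_group_pairs(toy_network, min_size=2)
```

Enumerating seeds this way grows combinatorially with network size, which is presumably why the paper turns to frequent-pattern mining; the sketch is only meant to make the all-versus-all definition concrete.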
Affiliation(s)
- Haiquan Li
- Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore
8
Fang J, Haasl RJ, Dong Y, Lushington GH. Discover protein sequence signatures from protein-protein interaction data. BMC Bioinformatics 2005; 6:277. [PMID: 16305745 PMCID: PMC1310605 DOI: 10.1186/1471-2105-6-277] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 07/18/2005] [Accepted: 11/23/2005] [Indexed: 12/13/2022]
Abstract
BACKGROUND The development of high-throughput technologies such as yeast two-hybrid systems and mass spectrometry has made it possible to generate large protein-protein interaction (PPI) datasets. Mining these datasets for underlying biological knowledge has, however, remained a challenge. RESULTS A total of 3108 sequence signatures were found, each of which was shared by a set of guest proteins interacting with one of 944 host proteins in the Saccharomyces cerevisiae genome. Approximately 94% of these sequence signatures matched entries in InterPro member databases. We identified 84 distinct sequence signatures from the remaining 172 unknown signatures. The signature sharing information was then applied in predicting sub-cellular localization of yeast proteins, and the novel signatures were used in identifying possible interacting sites. CONCLUSION We reported a method of PPI data mining that facilitated the discovery of novel sequence signatures using a large PPI dataset from the S. cerevisiae genome as input. The fact that 94% of the discovered signatures were already known validated the ability of the approach to identify large numbers of signatures from PPI data. The significance of these discovered signatures was demonstrated by their application in predicting sub-cellular localizations and identifying potential interaction binding sites of yeast proteins.
Affiliation(s)
- Jianwen Fang
- Bioinformatics Core Facility, University of Kansas, Lawrence, KS 66045, USA
- Information and Telecommunication Technology Center, University of Kansas, Lawrence, KS 66045, USA
- Ryan J Haasl
- Bioinformatics Core Facility, University of Kansas, Lawrence, KS 66045, USA
- Yinghua Dong
- Bioinformatics Core Facility, University of Kansas, Lawrence, KS 66045, USA
- Gerald H Lushington
- Bioinformatics Core Facility, University of Kansas, Lawrence, KS 66045, USA
- Molecular Graphics and Modeling Laboratory, University of Kansas, Lawrence, KS 66045, USA
9
Mintz S, Shulman-Peleg A, Wolfson HJ, Nussinov R. Generation and analysis of a protein-protein interface data set with similar chemical and spatial patterns of interactions. Proteins 2005; 61:6-20. [PMID: 16184518 DOI: 10.1002/prot.20580] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Indexed: 11/05/2022]
Abstract
Protein-protein interfaces are regions between 2 polypeptide chains that are not covalently connected. Here, we have created a nonredundant interface data set generated from all 2-chain interfaces in the Protein Data Bank. This data set is unique, since it contains clusters of interfaces with similar shapes and spatial organization of chemical functional groups. The data set allows statistical investigation of similar interfaces, as well as the identification and analysis of the chemical forces that account for the protein-protein associations. Toward this goal, we have developed I2I-SiteEngine (Interface-to-Interface SiteEngine) [Data set available at http://bioinfo3d.cs.tau.ac.il/Interfaces; Web server: http://bioinfo3d.cs.tau.ac.il/I2I-SiteEngine]. The algorithm recognizes similarities between protein-protein binding surfaces. I2I-SiteEngine is independent of the sequence or the fold of the proteins that comprise the interfaces. In addition to geometry, the method takes into account both the backbone and the side-chain physicochemical properties of the interacting atom groups. Its high efficiency makes it suitable for large-scale database searches and classifications. Below, we briefly describe the I2I-SiteEngine method. We focus on the classification process and the obtained nonredundant protein-protein interface data set. In particular, we analyze the biological significance of the clusters and present examples which illustrate that given constellations of chemical groups in protein-protein binding sites may be preferred, and are observed in proteins with different structures and different functions. We expect that these would yield further information regarding the forces stabilizing protein-protein interactions.
Affiliation(s)
- Shira Mintz
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel