1
Topcu E, Ridgeway NH, Biggar KK. PeSA 2.0: A software tool for peptide specificity analysis implementing positive and negative motifs and motif-based peptide scoring. Comput Biol Chem 2022; 101:107753. [PMID: 35998543 DOI: 10.1016/j.compbiolchem.2022.107753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 08/05/2022] [Accepted: 08/09/2022] [Indexed: 11/26/2022]
Abstract
A vast number of molecular interactions occur at the cellular level. Among these, interactions between multiple proteins are widely studied because of their importance in cellular function and their potential in drug development. PeSA is a desktop application developed to facilitate the analysis of in vitro peptide studies used to predict protein-protein interactions. PeSA can generate visual outputs such as motifs, bar charts, and visual matrices. PeSA version 2.0 includes additional tools, including the ability to further score peptide lists for consensus amongst interactions. The software can also design de novo peptides based on sequence motifs (sequence generator), which can help design additional experiments for motif validation. Further, the efficacy of the sequence generator was validated using the lysine methyltransferase SETD8 to identify new substrates of methylation based on motif-based predictions developed using PeSA 2.0.
Affiliation(s)
- Emine Topcu
- Institute of Biochemistry and Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario K1N 5B6, Canada
- Nashira H Ridgeway
- Institute of Biochemistry and Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario K1N 5B6, Canada
- Kyle K Biggar
- Institute of Biochemistry and Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario K1N 5B6, Canada.
2
Casadio R, Martelli PL, Savojardo C. Machine learning solutions for predicting protein–protein interactions. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Rita Casadio
- Biocomputing Group University of Bologna Bologna Italy
3
Wang P, Zhang G, Yu ZG, Huang G. A Deep Learning and XGBoost-Based Method for Predicting Protein-Protein Interaction Sites. Front Genet 2021; 12:752732. [PMID: 34764983 PMCID: PMC8576272 DOI: 10.3389/fgene.2021.752732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 09/20/2021] [Indexed: 11/29/2022]
Abstract
Knowledge about protein-protein interactions is beneficial in understanding cellular mechanisms. Protein-protein interactions are usually determined according to their protein-protein interaction sites. Due to the limitations of current techniques, it is still a challenging task to detect protein-protein interaction sites. In this article, we presented a method based on deep learning and XGBoost (called DeepPPISP-XGB) for predicting protein-protein interaction sites. The deep learning model served as a feature extractor to remove redundant information from protein sequences. The Extreme Gradient Boosting algorithm was used to construct a classifier for predicting protein-protein interaction sites. The DeepPPISP-XGB achieved the following results: area under the receiver operating characteristic curve of 0.681, a recall of 0.624, and area under the precision-recall curve of 0.339, being competitive with the state-of-the-art methods. We also validated the positive role of global features in predicting protein-protein interaction sites.
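The evaluation measures quoted in this abstract (area under the ROC curve and recall) can be illustrated with a minimal Python sketch. This is not the DeepPPISP-XGB code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
def recall(y_true, y_pred):
    """Fraction of true interface residues (label 1) that were predicted as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auroc(y_true, scores):
    """Area under the ROC curve via its rank interpretation: the probability
    that a random positive scores higher than a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = interface residue, 0 = non-interface residue.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
```

The pairwise loop is quadratic in the number of residues, which is fine for a small illustration; a rank-based implementation would be preferable on full data sets.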
Affiliation(s)
- Pan Wang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Guiyang Zhang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- Guohua Huang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
4
Slater O, Miller B, Kontoyianni M. Decoding Protein-protein Interactions: An Overview. Curr Top Med Chem 2021; 20:855-882. [PMID: 32101126 DOI: 10.2174/1568026620666200226105312] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/13/2019] [Revised: 11/27/2019] [Accepted: 11/27/2019] [Indexed: 12/24/2022]
Abstract
Drug discovery has focused on the paradigm "one drug, one target" for a long time. However, small molecules can act at multiple macromolecular targets, which serves as the basis for drug repurposing. In an effort to expand the target space, and given advances in X-ray crystallography, protein-protein interactions have become an emerging focus area of drug discovery enterprises. Proteins interact with other biomolecules and it is this intricate network of interactions that determines the behavior of the system and its biological processes. In this review, we briefly discuss networks in disease, followed by computational methods for protein-protein complex prediction. Computational methodologies and techniques employed towards objectives such as protein-protein docking, protein-protein interactions, and interface predictions are described extensively. Docking aims at producing a complex between proteins, while interface predictions identify a subset of residues on one protein that could interact with a partner, and protein-protein interaction sites address whether two proteins interact. In addition, approaches to predict hot spots and binding sites are presented along with a representative example of our internal project on the chemokine CXC receptor 3 B-isoform and predictive modeling with IP10 and PF4.
Affiliation(s)
- Olivia Slater
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
- Bethany Miller
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
- Maria Kontoyianni
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
5
Savojardo C, Martelli PL, Casadio R. Protein–Protein Interaction Methods and Protein Phase Separation. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-011720-104428] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/09/2022]
Abstract
In the last decade, newly developed experimental methods have made it possible to highlight that macromolecules in the cell milieu physically interact to support physiology. This has shifted the problem of protein–protein interaction from a microscopic, electron-density scale to a mesoscopic one. Further, nowadays there is increasing evidence that proteins in the nucleus and in the cytoplasm can aggregate in membraneless organelles for different physiological reasons. In this scenario, it is urgent to face the problem of biomolecule functional annotation with efficient computational methods, suited to extract knowledge from reliable data and transfer information across different domains of investigation. Here, we revise the present state of the art of our knowledge of protein–protein interaction and the computational methods that differently implement it. Furthermore, we explore experimental and computational features of a set of proteins involved in phase separation.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology and Interdepartmental Center “Luigi Galvani” for Integrated Studies of Bioinformatics, Biophysics, and Biocomplexity, University of Bologna, 40126 Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology and Interdepartmental Center “Luigi Galvani” for Integrated Studies of Bioinformatics, Biophysics, and Biocomplexity, University of Bologna, 40126 Bologna, Italy
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology and Interdepartmental Center “Luigi Galvani” for Integrated Studies of Bioinformatics, Biophysics, and Biocomplexity, University of Bologna, 40126 Bologna, Italy
- Institute of Biomembranes, Bioenergetics, and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), 70126 Bari, Italy
6
Xie Z, Deng X, Shu K. Prediction of Protein-Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets. Int J Mol Sci 2020; 21:E467. [PMID: 31940793 PMCID: PMC7013409 DOI: 10.3390/ijms21020467] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 11/24/2019] [Revised: 12/23/2019] [Accepted: 01/08/2020] [Indexed: 12/20/2022]
Abstract
Protein-protein interaction (PPI) sites play a key role in the formation of protein complexes, which is the basis of a variety of biological processes. Experimental methods to solve PPI sites are expensive and time-consuming, which has led to the development of different kinds of prediction algorithms. We propose a convolutional neural network for PPI site prediction and use residue binding propensity to improve the positive samples. Our method obtains a remarkable result of the area under the curve (AUC) = 0.912 on the improved data set. In addition, it yields much better results on samples with high binding propensity than on randomly selected samples. This suggests that there are considerable false-positive PPI sites in the positive samples defined by the distance between residue atoms.
Affiliation(s)
- Zengyan Xie
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Kunxian Shu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
7
Barreto CAV, Baptista SJ, Preto AJ, Matos-Filipe P, Mourão J, Melo R, Moreira I. Prediction and targeting of GPCR oligomer interfaces. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2020; 169:105-149. [PMID: 31952684 DOI: 10.1016/bs.pmbts.2019.11.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/18/2022]
Abstract
GPCR oligomerization has emerged as a hot topic in the GPCR field in recent years. Receptors that are part of these oligomers can influence each other's function, although it is not yet entirely understood how these interactions work. The existence of such a highly complex network of interactions between GPCRs generates the possibility of alternative targets for new therapeutic approaches. However, challenges still exist in the characterization of these complexes, especially at the interface level. Different experimental approaches, such as FRET or BRET, are usually combined to study GPCR oligomer interactions. Computational methods have been applied as a useful tool for retrieving information from GPCR sequences and the few X-ray-resolved oligomeric structures that are accessible, as well as for predicting new and trustworthy GPCR oligomeric interfaces. Machine-learning (ML) approaches have recently helped with some hindrances of other methods. By joining and evaluating multiple structure-, sequence- and co-evolution-based features in the same algorithm, it is possible to dilute the issues of particular structures and residues that arise from the experimental methodology into all-encompassing algorithms capable of accurately predicting GPCR-GPCR interfaces. All these methods, used as a single or a combined approach, provide useful information about GPCR oligomerization and its role in GPCR function and dynamics. Altogether, we present experimental, computational and machine-learning methods used to study oligomer interfaces, as well as strategies that have been used to target these dynamic complexes.
Affiliation(s)
- Carlos A V Barreto
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Salete J Baptista
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, LRS, Portugal
- António José Preto
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Pedro Matos-Filipe
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Joana Mourão
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Institute for Interdisciplinary Research, University of Coimbra, Coimbra, Portugal
- Rita Melo
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, LRS, Portugal
- Irina Moreira
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Science and Technology Faculty, University of Coimbra, Coimbra, Portugal.
8
Gil N, Fajardo EJ, Fiser A. Discovery of receptor-ligand interfaces in the immunoglobulin superfamily. Proteins 2019; 88:135-142. [PMID: 31298437 DOI: 10.1002/prot.25778] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/09/2019] [Revised: 06/21/2019] [Accepted: 07/06/2019] [Indexed: 12/13/2022]
Abstract
Cell-surface-anchored immunoglobulin superfamily (IgSF) proteins are widespread throughout the human proteome, forming crucial components of diverse biological processes including immunity, cell-cell adhesion, and carcinogenesis. IgSF proteins generally function through protein-protein interactions carried out between extracellular, membrane-bound proteins on adjacent cells, known as trans-binding interfaces. These protein-protein interactions constitute a class of pharmaceutical targets important in the treatment of autoimmune diseases, chronic infections, and cancer. A molecular-level understanding of IgSF protein-protein interactions would greatly benefit further drug development. A critical step toward this goal is the reliable identification of IgSF trans-binding interfaces. We propose a novel combination of structure and sequence information to identify trans-binding interfaces in IgSF proteins. We developed a structure-based binding interface prediction approach that can identify broad regions of the protein surface that encompass the binding interfaces and suggests that IgSF proteins possess binding supersites. These interfaces could theoretically be pinpointed using sequence-based conservation analysis, with performance approaching the theoretical upper limit of binding interface prediction accuracy, but achieving this in practice is limited by the current ability to identify an appropriate multiple sequence alignment for conservation analysis. However, an important contribution of combining the two orthogonal methods is that agreement between these approaches can estimate the reliability of the predictions. This approach was benchmarked on the set of 22 IgSF proteins with experimentally solved structures in complex with their ligands. Additionally, we provide structure-based predictions and reliability scores for the 62 IgSF proteins with known structure but yet uncharacterized binding interfaces.
Affiliation(s)
- Nelson Gil
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York; Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
- Eduardo J Fajardo
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York; Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
- Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York; Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
9
Wong ETC, Gsponer J. Predicting Protein-Protein Interfaces that Bind Intrinsically Disordered Protein Regions. J Mol Biol 2019; 431:3157-3178. [PMID: 31207240 DOI: 10.1016/j.jmb.2019.06.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/08/2019] [Revised: 06/01/2019] [Accepted: 06/04/2019] [Indexed: 12/18/2022]
Abstract
A long-standing goal in biology is the complete annotation of function and structure on all protein-protein interactions, a large fraction of which is mediated by intrinsically disordered protein regions (IDRs). However, knowledge derived from experimental structures of such protein complexes is disproportionately small due, in part, to challenges in studying interactions of IDRs. Here, we introduce IDRBind, a computational method that by combining gradient boosted trees and conditional random field models predicts binding sites of IDRs with performance approaching state-of-the-art globular interface predictions, making it suitable for proteome-wide applications. Although designed and trained with a focus on molecular recognition features, which are long interaction-mediating-elements in IDRs, IDRBind also predicts the binding sites of short peptides more accurately than existing specialized predictors. Consistent with IDRBind's specificity, a comparison of protein interface categories uncovered uniform trends in multiple physicochemical properties, positioning molecular recognition feature interfaces between peptide and globular interfaces.
Affiliation(s)
- Eric T C Wong
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada; Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada; Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada.
10
Identification of the retinoschisin-binding site on the retinal Na/K-ATPase. PLoS One 2019; 14:e0216320. [PMID: 31048931 PMCID: PMC6497308 DOI: 10.1371/journal.pone.0216320] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/11/2018] [Accepted: 04/19/2019] [Indexed: 01/11/2023]
Abstract
X-linked juvenile retinoschisis (XLRS) is a hereditary retinal dystrophy, caused by mutations in the RS1 gene which encodes the secreted protein retinoschisin. In recent years, several molecules have been proposed to interact with retinoschisin, including the retinal Na/K-ATPase, L-voltage gated Ca2+ channels, and specific sugars. We recently showed that the retinal Na/K-ATPase consisting of subunits ATP1A3 and ATP1B2 is essential for anchoring retinoschisin to plasma membranes and identified the glycosylated ATP1B2 subunit as the direct interaction partner for retinoschisin. We now aimed to precisely map the retinoschisin binding domain(s) in ATP1B2. In general, retinoschisin binding was not affected after selective elimination of individual glycosylation sites via site-directed mutagenesis as well as after full enzymatic deglycosylation of ATP1B2. Applying the interface prediction tool PresCont, two putative protein-protein interaction patches (“patch I” and “patch II”) consisting each of four hydrophobic amino acid stretches on the ATP1B2 surface were identified. These were consecutively altered by site-directed mutagenesis. Functional assays with the ATP1B2 patch mutants identified patch II and, specifically, the associated amino acid at position 240 (harboring a threonine in ATP1B2) as crucial for retinoschisin binding to ATP1B2. These and previous results led us to suggest an induced-fit binding mechanism for the interaction between retinoschisin and the Na/K-ATPase, which is dependent on threonine 240 in ATP1B2 allowing the accommodation of hyperflexible retinoschisin spikes by the associated protein-protein interaction patch on ATP1B2.
11
Zeng B, Hönigschmid P, Frishman D. Residue co-evolution helps predict interaction sites in α-helical membrane proteins. J Struct Biol 2019; 206:156-169. [DOI: 10.1016/j.jsb.2019.02.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 11/06/2018] [Revised: 01/30/2019] [Accepted: 02/13/2019] [Indexed: 11/29/2022]
12
Straub K, Merkl R. Ancestral Sequence Reconstruction as a Tool for the Elucidation of a Stepwise Evolutionary Adaptation. Methods Mol Biol 2019; 1851:171-182. [PMID: 30298397 DOI: 10.1007/978-1-4939-8736-8_9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/25/2023]
Abstract
Ancestral sequence reconstruction (ASR) is a powerful tool to infer primordial sequences from contemporary, i.e., extant ones. An essential element of ASR is the computation of a phylogenetic tree whose leaves are the chosen extant sequences. Most often, the reconstructed sequence related to the root of this tree is of greatest interest: It represents the common ancestor (CA) of the sequences under study. If this sequence encodes a protein, one can "resurrect" the CA by means of gene synthesis technology and study biochemical properties of this extinct predecessor with the help of wet-lab experiments. However, ASR also deduces sequences for all internal nodes of the tree, and the well-considered analysis of these "intermediates" can help to elucidate evolutionary processes. Moreover, one can identify key mutations that alter proteins or protein complexes and are responsible for the differing properties of extant proteins. As an illustrative example, we describe the protocol for the rapid identification of hotspots determining the binding of the two subunits within the heteromeric complex imidazole glycerol phosphate synthase.
Affiliation(s)
- Kristina Straub
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany.
13
Macalino SJY, Basith S, Clavio NAB, Chang H, Kang S, Choi S. Evolution of In Silico Strategies for Protein-Protein Interaction Drug Discovery. Molecules 2018; 23:E1963. [PMID: 30082644 PMCID: PMC6222862 DOI: 10.3390/molecules23081963] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 07/17/2018] [Revised: 08/03/2018] [Accepted: 08/04/2018] [Indexed: 12/14/2022]
Abstract
The advent of advanced molecular modeling software, big data analytics, and high-speed processing units has led to the exponential evolution of modern drug discovery and better insights into complex biological processes and disease networks. This has progressively steered current research interests to understanding protein-protein interaction (PPI) systems that are related to a number of relevant diseases, such as cancer, neurological illnesses, and metabolic disorders. However, targeting PPIs is challenging due to their "undruggable" binding interfaces. In this review, we focus on the current obstacles that impede PPI drug discovery, and how recent discoveries and advances in in silico approaches can alleviate these barriers to expedite the search for potential leads, as shown in several exemplary studies. We will also discuss currently available information on PPI compounds and systems, along with their usefulness in molecular modeling. Finally, we conclude by presenting the limits of in silico application in drug discovery and offer a perspective on the field of computer-aided PPI drug discovery.
Affiliation(s)
- Stephani Joy Y Macalino
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Korea.
- Shaherin Basith
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Korea.
- Nina Abigail B Clavio
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Korea.
- Hyerim Chang
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Korea.
- Soosung Kang
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Korea.
- Sun Choi
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Korea.
14
Jelínek J, Škoda P, Hoksza D. Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites. BMC Bioinformatics 2017; 18:492. [PMID: 29244012 PMCID: PMC5731498 DOI: 10.1186/s12859-017-1921-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Protein-protein interactions (PPI) play a key role in an investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect.
RESULTS: We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient.
CONCLUSION: In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.
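The knowledge-base lookup described in this abstract can be sketched roughly as follows. This is a toy illustration, not the INSPiRE implementation: the 5-bit-per-residue encoding, the function names, the majority threshold of 0.5, and the example data are all assumptions, and the abstract does not specify how the real bit strings are built.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def encode_neighborhood(center, neighbors):
    """Toy encoding (assumption): 5 bits per residue, central residue first,
    then its neighbors in sorted order so the key ignores neighbor order."""
    residues = [center] + sorted(neighbors)
    return "".join(format(AMINO_ACIDS.index(r), "05b") for r in residues)


def build_knowledge_base(examples):
    """examples: iterable of (center, neighbors, is_interface) tuples taken
    from proteins with known interfaces. Returns key -> [n_interface, n_non]."""
    kb = defaultdict(lambda: [0, 0])
    for center, neighbors, is_interface in examples:
        kb[encode_neighborhood(center, neighbors)][0 if is_interface else 1] += 1
    return kb


def predict(kb, center, neighbors, threshold=0.5):
    """Label a residue by how often its neighborhood was seen as interface.
    Returns None when the neighborhood never occurs in the knowledge base."""
    key = encode_neighborhood(center, neighbors)
    if key not in kb:
        return None
    n_interface, n_non = kb[key]
    return n_interface / (n_interface + n_non) >= threshold


# Hypothetical training neighborhoods.
training = [
    ("K", ["A", "L"], True),
    ("K", ["L", "A"], True),
    ("K", ["A", "L"], False),
    ("G", ["G"], False),
]
kb = build_knowledge_base(training)
```

Returning `None` for unseen neighborhoods keeps "no evidence" distinct from "predicted non-interface", which is one possible design choice among several.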
Affiliation(s)
- Jan Jelínek
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu 3, Prague 2, Czech Republic
- Petr Škoda
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu 3, Prague 2, Czech Republic
- David Hoksza
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu 3, Prague 2, Czech Republic
15
Jiao X, Ranganathan S. Prediction of interface residue based on the features of residue interaction network. J Theor Biol 2017; 432:49-54. [PMID: 28818468 DOI: 10.1016/j.jtbi.2017.08.014] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/18/2017] [Revised: 07/31/2017] [Accepted: 08/13/2017] [Indexed: 10/19/2022]
Abstract
Protein-protein interaction plays a crucial role in cellular biological processes. Interface prediction can improve our understanding of the molecular mechanisms of the related processes and functions. In this work, we propose a classification method to recognize interface residues based on the features of a weighted residue interaction network. The random forest algorithm is used for the prediction, and 16 network parameters and the B-factor serve as the elements of the input feature vector. Compared with other similar work, the method is feasible and effective. The relative importance of these features is also analyzed to identify the key features for the prediction. Some biological meaning of the important features is explained. The results of this work can be used for related work on structure-function relationship analysis via a residue interaction network model.
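A per-residue feature vector of the kind this abstract describes (network parameters plus the B-factor) might be assembled as in the sketch below. The abstract does not list the 16 network parameters, so the three used here (degree, weighted strength, local clustering coefficient) are assumptions, as are the function name and the toy network; the random forest classifier itself is not shown.

```python
from collections import defaultdict


def network_features(edges, b_factors):
    """Build [degree, strength, clustering, B-factor] for each residue.

    edges: iterable of (residue_i, residue_j, weight) for an undirected,
           weighted residue interaction network.
    b_factors: dict mapping residue id -> B-factor.
    """
    adjacency = defaultdict(dict)
    for i, j, weight in edges:
        adjacency[i][j] = weight
        adjacency[j][i] = weight

    features = {}
    for residue, b_factor in b_factors.items():
        neighbors = list(adjacency.get(residue, {}))
        degree = len(neighbors)
        strength = sum(adjacency.get(residue, {}).values())
        # Count edges that connect pairs of this residue's neighbors.
        links = sum(
            1
            for a in range(degree)
            for b in range(a + 1, degree)
            if neighbors[b] in adjacency[neighbors[a]]
        )
        possible = degree * (degree - 1) / 2
        clustering = links / possible if possible else 0.0
        features[residue] = [degree, strength, clustering, b_factor]
    return features


# Hypothetical four-residue network and B-factors.
toy_edges = [(1, 2, 1.0), (2, 3, 2.0), (1, 3, 0.5), (3, 4, 1.5)]
toy_b = {1: 10.0, 2: 12.0, 3: 20.0, 4: 30.0}
toy_features = network_features(toy_edges, toy_b)
```

Each resulting vector could then be passed to any standard classifier; the choice of classifier and its settings is outside this sketch.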
Affiliation(s)
- Xiong Jiao
- Institute of Applied Mechanics and Biomedical Engineering, College of Mechanics, Taiyuan University of Technology, Taiyuan 030024, China; Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales 2109, Australia.
- Shoba Ranganathan
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales 2109, Australia
16
Murakami Y, Tripathi LP, Prathipati P, Mizuguchi K. Network analysis and in silico prediction of protein-protein interactions with applications in drug discovery. Curr Opin Struct Biol 2017; 44:134-142. [PMID: 28364585 DOI: 10.1016/j.sbi.2017.02.005] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 11/26/2016] [Revised: 02/05/2017] [Accepted: 02/23/2017] [Indexed: 11/29/2022]
Abstract
Protein-protein interactions (PPIs) are vital to maintaining cellular homeostasis. Several PPI dysregulations have been implicated in the etiology of various diseases and hence PPIs have emerged as promising targets for drug discovery. Surface residues and hotspot residues at the interface of PPIs form the core regions, which play a key role in modulating cellular processes such as signal transduction and are used as starting points for drug design. In this review, we briefly discuss how PPI networks (PPINs) inferred from experimentally characterized PPI data have been utilized for knowledge discovery and how in silico approaches to PPI characterization can contribute to PPIN-based biological research. Next, we describe the principles of in silico PPI prediction and survey the existing PPI and PPI site prediction servers that are useful for drug discovery. Finally, we discuss the potential of in silico PPI prediction in drug discovery.
Affiliation(s)
- Yoichi Murakami
- National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito Asagi, Ibaraki, Osaka 567-0085, Japan.
- Lokesh P Tripathi
- National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito Asagi, Ibaraki, Osaka 567-0085, Japan.
- Philip Prathipati
- National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito Asagi, Ibaraki, Osaka 567-0085, Japan
- Kenji Mizuguchi
- National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito Asagi, Ibaraki, Osaka 567-0085, Japan.
17
Ripoche H, Laine E, Ceres N, Carbone A. JET2 Viewer: a database of predicted multiple, possibly overlapping, protein-protein interaction sites for PDB structures. Nucleic Acids Res 2017; 45:D236-D242. [PMID: 27899675 PMCID: PMC5210541 DOI: 10.1093/nar/gkw1053] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 07/23/2016] [Revised: 10/18/2016] [Accepted: 10/20/2016] [Indexed: 11/13/2022]
Abstract
The database JET2 Viewer, openly accessible at http://www.jet2viewer.upmc.fr/, reports putative protein binding sites for all three-dimensional (3D) structures available in the Protein Data Bank (PDB). This knowledge base was generated by applying the computational method JET2 at large scale on more than 20 000 chains. The JET2 strategy yields very precise predictions of interacting surfaces and unravels their evolutionary process and complexity. JET2 Viewer provides an online intelligent display, including interactive 3D visualization of the binding sites mapped onto PDB structures and suitable files recording JET2 analyses. Predictions were evaluated on more than 15 000 experimentally characterized protein interfaces. This is, to our knowledge, the largest evaluation of a protein binding site prediction method. The overall performance of JET2 on all interfaces is: Sen = 52.52, PPV = 51.24, Spe = 80.05, Acc = 75.89. The data can be used to foster new strategies for protein-protein interaction modulation and interaction surface redesign.
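For readers unfamiliar with the four figures quoted in this abstract, the following is a minimal sketch of how sensitivity (Sen), positive predictive value (PPV), specificity (Spe) and accuracy (Acc) are conventionally computed from per-residue confusion counts, expressed as percentages. The class name `Confusion`, the function name `interface_metrics` and the example counts are illustrative assumptions; they are not taken from JET2 or its evaluation data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Per-residue confusion counts (hypothetical container, not part of JET2)."""
    tp: int  # interface residues predicted as interface
    fp: int  # non-interface residues predicted as interface
    tn: int  # non-interface residues predicted as non-interface
    fn: int  # interface residues predicted as non-interface


def interface_metrics(c: Confusion) -> dict:
    """Return the standard Sen, PPV, Spe and Acc definitions as percentages."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "Sen": 100.0 * c.tp / (c.tp + c.fn),
        "PPV": 100.0 * c.tp / (c.tp + c.fp),
        "Spe": 100.0 * c.tn / (c.tn + c.fp),
        "Acc": 100.0 * (c.tp + c.tn) / total,
    }


# Made-up counts for illustration only.
example = interface_metrics(Confusion(tp=50, fp=50, tn=200, fn=50))
```

Because non-interface residues usually outnumber interface residues, accuracy and specificity tend to sit well above sensitivity and PPV, which is the pattern visible in the reported values.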
Affiliation(s)
- Hugues Ripoche
- Sorbonne Universités, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Elodie Laine
- Sorbonne Universités, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Nicoletta Ceres
- CNRS UMR 5086/University Lyon I, Institut de Biologie et Chimie des Proteines, 69367 Lyon, France
- Alessandra Carbone
- Sorbonne Universités, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France; Institut Universitaire de France, 75005 Paris, France
18
Tonddast-Navaei S, Skolnick J. Are protein-protein interfaces special regions on a protein's surface? J Chem Phys 2016; 143:243149. [PMID: 26723634 DOI: 10.1063/1.4937428] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 01/09/2023]
Abstract
Protein-protein interactions (PPIs) are involved in many cellular processes. Experimentally obtained protein quaternary structures provide the location of protein-protein interfaces, the surface region of a given protein that interacts with another. These regions are termed half-interfaces (HIs). Canonical HIs cover roughly one third of a protein's surface and were found to have more hydrophobic residues than the non-interface surface region. In addition, the classical view of protein HIs was that there are a few (if not one) HIs per protein that are structurally and chemically unique. However, on average, a given protein interacts with at least a dozen others. This raises the question of whether they use the same or other HIs. By copying HIs from monomers with the same folds in solved quaternary structures, we introduce the concept of geometric HIs (HIs whose geometry has a significant match to other known interfaces) and show that on average they cover three quarters of a protein's surface. We then demonstrate that in some cases, these geometric HI could result in real physical interactions (which may or may not be biologically relevant). The composition of the new HIs is on average more charged compared to most known ones, suggesting that the current protein interface database is biased towards more hydrophobic, possibly more obligate, complexes. Finally, our results provide evidence for interface fuzziness and PPI promiscuity. Thus, the classical view of unique, well defined HIs needs to be revisited as HIs are another example of coarse-graining that is used by nature.
Affiliation(s)
- Sam Tonddast-Navaei
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street N.W., Atlanta, Georgia 30318, USA
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street N.W., Atlanta, Georgia 30318, USA
19
Esmaielbeiki R, Krawczyk K, Knapp B, Nebel JC, Deane CM. Progress and challenges in predicting protein interfaces. Brief Bioinform 2016; 17:117-31. [PMID: 25971595 PMCID: PMC4719070 DOI: 10.1093/bib/bbv027] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 01/29/2015] [Revised: 03/18/2015] [Indexed: 12/31/2022]
Abstract
The majority of biological processes are mediated via protein-protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming and thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We make a distinction between methods which address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically based on required input and prediction principles to provide an overview of the field.
20
Wierschin T, Wang K, Welter M, Waack S, Stanke M. Combining features in a graphical model to predict protein binding sites. Proteins 2015; 83:844-52. [PMID: 25663045 DOI: 10.1002/prot.24775] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/15/2014] [Revised: 01/16/2015] [Accepted: 01/26/2015] [Indexed: 11/08/2022]
Abstract
Large efforts have been made in classifying residues as binding sites in proteins using machine learning methods. The prediction task can be translated into the computational challenge of assigning each residue the label binding site or non-binding site. Observational data comes from various possibly highly correlated sources. It includes the structure of the protein but not the structure of the complex. The model class of conditional random fields (CRFs) has previously successfully been used for protein binding site prediction. Here, a new CRF-approach is presented that models the dependencies of residues using a general graphical structure defined as a neighborhood graph and thus our model makes fewer independence assumptions on the labels than sequential labeling approaches. A novel node feature "change in free energy" is introduced into the model, which is then denoted by ΔF-CRF. Parameters are trained with an online large-margin algorithm. Using the standard feature class relative accessible surface area alone, the general graph-structure CRF already achieves higher prediction accuracy than the linear chain CRF of Li et al. ΔF-CRF performs significantly better on a large range of false positive rates than the support-vector-machine-based program PresCont of Zellner et al. on a homodimer set containing 128 chains. ΔF-CRF has a broader scope than PresCont since it is not constrained to protein subgroups and requires no multiple sequence alignment. The improvement is attributed to the advantageous combination of the novel node feature with the standard feature and to the adopted parameter training method.
Affiliation(s)
- Torsten Wierschin
- Institute of Mathematics and Computer Science, University of Greifswald, 17487, Greifswald, Germany
21
Aumentado-Armstrong TT, Istrate B, Murgita RA. Algorithmic approaches to protein-protein interaction site prediction. Algorithms Mol Biol 2015; 10:7. [PMID: 25713596 PMCID: PMC4338852 DOI: 10.1186/s13015-015-0033-9] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 08/23/2014] [Accepted: 01/07/2015] [Indexed: 12/19/2022]
Abstract
Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most current methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented.
22
Topham CM, Smith JC. Tri-peptide reference structures for the calculation of relative solvent accessible surface area in protein amino acid residues. Comput Biol Chem 2014; 54:33-43. [PMID: 25544680 DOI: 10.1016/j.compbiolchem.2014.11.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/22/2014] [Revised: 11/23/2014] [Accepted: 11/30/2014] [Indexed: 10/24/2022]
Abstract
Relative amino acid residue solvent accessibility values allow the quantitative comparison of atomic solvent-accessible surface areas in different residue types and physical environments in proteins and in protein structural alignments. Geometry-optimised tri-peptide structures in extended solvent-exposed reference conformations have been obtained for 43 amino acid residue types at a high level of quantum chemical theory. Significant increases in side-chain solvent accessibility, offset by reductions in main-chain atom solvent exposure, were observed for standard residue types in partially geometry-optimised structures when compared to non-minimised models built from identical sets of proper dihedral angles abstracted from the literature. Optimisation of proper dihedral angles led most notably to marked increases of up to 54% in proline main-chain atom solvent accessibility compared to literature values. Similar effects were observed for fully-optimised tri-peptides in implicit solvent. The relief of internal strain energy was associated with systematic variation in N, C(α) and C(β) atom solvent accessibility across all standard residue types. The results underline the importance of optimisation of 'hard' degrees of freedom (bond lengths and valence bond angles) and improper dihedral angle values from force field or other context-independent reference values, and impact on the use of standardised fixed internal co-ordinate geometry in sampling approaches to the determination of absolute values of protein amino acid residue solvent accessibility. Quantum chemical methods provide a useful and accurate alternative to molecular mechanics methods to perform energy minimisation of peptides containing non-standard (chemically modified) amino acid residues frequently present in experimental protein structure data sets, for which force field parameters may not be available. Reference tri-peptide atomic co-ordinate sets including hydrogen atoms are made freely available.
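As a small illustration of the quantity this abstract is concerned with, relative solvent accessibility is conventionally the ratio of a residue's observed solvent-accessible surface area to the area of the same residue type in a fully exposed reference state, such as the tri-peptide conformations described above. The sketch below assumes that convention; the function name and the numbers in the usage line are hypothetical placeholders and are not the reference values published by the authors.

```python
def relative_accessibility(observed_asa: float, reference_asa: float) -> float:
    """Ratio of observed to reference solvent-accessible surface area.

    Both areas must use the same units (for example square angstroms).
    The reference area is that of the residue type in an extended,
    solvent-exposed reference conformation.
    """
    if reference_asa <= 0:
        raise ValueError("reference_asa must be positive")
    if observed_asa < 0:
        raise ValueError("observed_asa must be non-negative")
    return observed_asa / reference_asa


# Placeholder numbers for illustration only, not published reference values.
rsa = relative_accessibility(observed_asa=50.0, reference_asa=200.0)
```

Since the reference area is the denominator, any change in the reference geometry, such as the increases in side-chain exposure reported in the abstract, shifts every relative value computed for that residue type.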
Affiliation(s)
- Christopher M Topham
- Molecular Forces Consulting, 40 Rue Boyssonne, Toulouse 31400, France; Computational Molecular Biophysics, IWR der Universität Heidelberg, Im Neuenheimer Feld 368, Heidelberg D-69120, Germany; University of Tennessee/Oak Ridge National Laboratory, Center for Molecular Biophysics, P.O. Box 2008, Oak Ridge, TN 37831-6309, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, M407 Walters Life Sciences, 1414 Cumberland Avenue, Knoxville, TN 37996, USA.
- Jeremy C Smith
- Computational Molecular Biophysics, IWR der Universität Heidelberg, Im Neuenheimer Feld 368, Heidelberg D-69120, Germany; University of Tennessee/Oak Ridge National Laboratory, Center for Molecular Biophysics, P.O. Box 2008, Oak Ridge, TN 37831-6309, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, M407 Walters Life Sciences, 1414 Cumberland Avenue, Knoxville, TN 37996, USA
23
Dong Z, Wang K, Dang TKL, Gültas M, Welter M, Wierschin T, Stanke M, Waack S. CRF-based models of protein surfaces improve protein-protein interaction site predictions. BMC Bioinformatics 2014; 15:277. [PMID: 25124108 PMCID: PMC4150965 DOI: 10.1186/1471-2105-15-277] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 10/11/2013] [Accepted: 08/01/2014] [Indexed: 11/13/2022]
Abstract
Background The identification of protein-protein interaction sites is a computationally challenging task and important for understanding the biology of protein complexes. There is a rich literature in this field. A broad class of approaches assign to each candidate residue a real-valued score that measures how likely it is that the residue belongs to the interface. The prediction is obtained by thresholding this score. Some probabilistic models classify the residues on the basis of the posterior probabilities. In this paper, we introduce pairwise conditional random fields (pCRFs) in which edges are not restricted to the backbone as in the case of linear-chain CRFs utilized by Li et al. (2007). In fact, any 3D-neighborhood relation can be modeled. On grounds of a generalized Viterbi inference algorithm and a piecewise training process for pCRFs, we demonstrate how to utilize pCRFs to enhance a given residue-wise score-based protein-protein interface predictor on the surface of the protein under study. The features of the pCRF are solely based on the interface predictions scores of the predictor the performance of which shall be improved. Results We performed three sets of experiments with synthetic scores assigned to the surface residues of proteins taken from the data set PlaneDimers compiled by Zellner et al. (2011), from the list published by Keskin et al. (2004) and from the very recent data set due to Cukuroglu et al. (2014). That way we demonstrated that our pCRF-based enhancer is effective given the interface residue score distribution and the non-interface residue score are unimodal. Moreover, the pCRF-based enhancer is also successfully applicable, if the distributions are only unimodal over a certain sub-domain. The improvement is then restricted to that domain. Thus we were able to improve the prediction of the PresCont server devised by Zellner et al. (2011) on PlaneDimers. 
Conclusions Our results strongly suggest that pCRFs form a methodological framework to improve residue-wise score-based protein-protein interface predictors given the scores are appropriately distributed. A prototypical implementation of our method is accessible at http://ppicrf.informatik.uni-goettingen.de/index.html.
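The Background of this abstract describes a broad class of predictors that assign each candidate residue a real-valued score and obtain the prediction by thresholding it. A minimal sketch of that baseline step follows. It does not implement the pCRF enhancer itself, and the function name, residue labels and scores are hypothetical.

```python
from typing import Dict


def threshold_interface(scores: Dict[str, float], threshold: float) -> Dict[str, bool]:
    """Label a residue as interface when its score reaches the threshold.

    `scores` maps a residue identifier to the real-valued score produced
    by some residue-wise predictor (any identifier scheme will do).
    """
    return {residue: score >= threshold for residue, score in scores.items()}


# Hypothetical surface residues and scores, for illustration only.
surface_scores = {"A:12": 0.91, "A:13": 0.40, "A:47": 0.55}
labels = threshold_interface(surface_scores, threshold=0.5)
```

Each residue is labelled independently here, which is the limitation the abstract addresses by letting labels depend on 3D neighbours.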
Affiliation(s)
- Stephan Waack
- Institute of Computer Science, University of Göttingen, Goldschmidtstr, 7, 37077 Göttingen, Germany.
24
Andreani J, Guerois R. Evolution of protein interactions: From interactomes to interfaces. Arch Biochem Biophys 2014; 554:65-75. [DOI: 10.1016/j.abb.2014.05.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 02/10/2014] [Revised: 04/28/2014] [Accepted: 05/12/2014] [Indexed: 12/16/2022]
25
Peterhoff D, Beer B, Rajendran C, Kumpula EP, Kapetaniou E, Guldan H, Wierenga RK, Sterner R, Babinger P. A comprehensive analysis of the geranylgeranylglyceryl phosphate synthase enzyme family identifies novel members and reveals mechanisms of substrate specificity and quaternary structure organization. Mol Microbiol 2014; 92:885-99. [PMID: 24684232 DOI: 10.1111/mmi.12596] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Accepted: 03/26/2014] [Indexed: 12/13/2022]
Abstract
Geranylgeranylglyceryl phosphate synthase (GGGPS) family enzymes catalyse the formation of an ether bond between glycerol-1-phosphate and polyprenyl diphosphates. They are essential for the biosynthesis of archaeal membrane lipids, but also occur in bacterial species, albeit with unknown physiological function. It has been known that there exist two phylogenetic groups (I and II) of GGGPS family enzymes, but a comprehensive study has been missing. We therefore visualized the variability within the family by applying a sequence similarity network, and biochemically characterized 17 representative GGGPS family enzymes regarding their catalytic activities and substrate specificities. Moreover, we present the first crystal structures of group II archaeal and bacterial enzymes. Our analysis revealed that the previously uncharacterized bacterial enzymes from group II have GGGPS activity like the archaeal enzymes and differ from the bacterial group I enzymes that are heptaprenylglyceryl phosphate synthases. The length of the isoprenoid substrate is determined in group II GGGPS enzymes by 'limiter residues' that are different from those in group I enzymes, as shown by site-directed mutagenesis. Most of the group II enzymes form hexamers. We could disrupt these hexamers to stable and catalytically active dimers by mutating a single amino acid that acts as an 'aromatic anchor'.
Affiliation(s)
- David Peterhoff
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, 93040, Germany
26
Rodrigues JPGLM, Bonvin AMJJ. Integrative computational modeling of protein interactions. FEBS J 2014; 281:1988-2003. [DOI: 10.1111/febs.12771] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 10/18/2013] [Revised: 01/03/2014] [Accepted: 02/19/2014] [Indexed: 01/09/2023]
Affiliation(s)
- João P. G. L. M. Rodrigues
- Computational Structural Biology Group; Bijvoet Center for Biomolecular Research; Utrecht University; the Netherlands
- Alexandre M. J. J. Bonvin
- Computational Structural Biology Group; Bijvoet Center for Biomolecular Research; Utrecht University; the Netherlands
27
Bhaskara RM, Padhi A, Srinivasan N. Accurate prediction of interfacial residues in two-domain proteins using evolutionary information: implications for three-dimensional modeling. Proteins 2013; 82:1219-34. [PMID: 24375512 DOI: 10.1002/prot.24486] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/19/2013] [Revised: 11/04/2013] [Accepted: 11/19/2013] [Indexed: 01/08/2023]
Abstract
With the preponderance of multidomain proteins in eukaryotic genomes, it is essential to recognize the constituent domains and their functions. Often function involves communications across the domain interfaces, and the knowledge of the interacting sites is essential to our understanding of the structure-function relationship. Using evolutionary information extracted from homologous domains in at least two diverse domain architectures (single and multidomain), we predict the interface residues corresponding to domains from the two-domain proteins. We also use information from the three-dimensional structures of individual domains of two-domain proteins to train naïve Bayes classifier model to predict the interfacial residues. Our predictions are highly accurate (∼85%) and specific (∼95%) to the domain-domain interfaces. This method is specific to multidomain proteins which contain domains in at least more than one protein architectural context. Using predicted residues to constrain domain-domain interaction, rigid-body docking was able to provide us with accurate full-length protein structures with correct orientation of domains. We believe that these results can be of considerable interest toward rational protein and interaction design, apart from providing us with valuable information on the nature of interactions.
28
Zhou N, Zhang J, Feng L, Lu B, Wang Z, Sun R, Wu C, Bao J. IntApop: a web service for predicting apoptotic protein interactions in humans. Biosystems 2013; 114:238-44. [PMID: 24120734 DOI: 10.1016/j.biosystems.2013.09.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/11/2013] [Revised: 09/06/2013] [Accepted: 09/26/2013] [Indexed: 01/31/2023]
Abstract
Apoptosis, a type of cell death, is necessary for maintaining tissue homeostasis and removing malignant cells. Interrupted apoptosis process contributes to carcinogenesis, developmental defects, autoimmune diseases and neurological disorders. Due to the complexity of the process, the molecular dynamics and relative interactions of individual proteins responsible for the activation or inhibition of apoptosis should be researched systematically. In this study, we integrate known protein interactions from databases DIP, IntAct, MINT, HPRD and BioGRID by Naïve Bayes classifier. The receiver operation characteristic (ROC) curve with the area under the ROC curve (AUC) of 0.797 indicates it has a good performance in prediction. Then, we predict the global human apoptotic protein interactions network. Within it, we not only identify the already known interactions of caspases (caspase-8/-10, caspase-9, caspase-3/-6/-7) and Bcl-2 family, but also reveal that Bid can interact with casein kinases (CSK21/22/2B, KC1A, KC1E); both of B2LA1 and B2CL2 can interact with Bid, Bax and Bak; caspase-8 interacts with autophagic proteins (MLP3B, MLP3A and LRRk2). Consequently, we make an initial step to develop the web service IntApop that provides an appropriate platform for apoptosis researchers, systems biologists and translational clinician scientists to predict apoptotic protein interactions in human. In addition, the interaction network can be visualized online, making it a widely applicable systems biology tool for apoptosis and cancer researchers.
Affiliation(s)
- Nan Zhou
- School of Life Sciences & Key Laboratory of Bio-resources, Ministry of Education, Sichuan University, Chengdu 610064, China
29
Carugo O. Frequency of dipeptides and antidipeptides. Comput Struct Biotechnol J 2013; 8:e201308001. [PMID: 24688741 PMCID: PMC3962099 DOI: 10.5936/csbj.201308001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/26/2013] [Revised: 07/18/2013] [Accepted: 05/09/2013] [Indexed: 12/16/2022]
Abstract
Although it is reasonable to expect that the frequency of a generic dipeptide XY in proteins is the same as that of its counterpart YX, an accurate statistical analysis of a large number of protein sequences shows that some dipeptides XY are considerably more frequent than their mirror images YX, referred to as antidipeptides. Given that it has been verified that this unexpected anisotropic frequency of occurrence is unbiased by the type of protein sequences that are analyzed, it is possible to conclude that this is a genuine phenomenon. Nevertheless, it was impossible to find the mechanism underlying this unexpected phenomenon, which does not seem to be related to diverse conformational propensities, to the different conformational flexibility of the dipeptide/antidipeptide pair, to dissimilar accessibility to the solvent or to random gene mutations.
Affiliation(s)
- Oliviero Carugo
- Department of Chemistry, University of Pavia, viale Taramelli 12, I-27100 Pavia, Italy; Department of Structural and Computational Biology, Max F. Perutz Laboratories, Vienna University, Campus Vienna Biocenter 5, A-1030 Vienna, Austria
30
Andreani J, Faure G, Guerois R. InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution. Bioinformatics 2013; 29:1742-9. [PMID: 23652426 DOI: 10.1093/bioinformatics/btt260] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Indexed: 12/22/2022]
Abstract
MOTIVATION Structural prediction of protein interactions currently remains a challenging but fundamental goal. In particular, progress in scoring functions is critical for the efficient discrimination of near-native interfaces among large sets of decoys. Many functions have been developed using knowledge-based potentials, but few make use of multi-body interactions or evolutionary information, although multi-residue interactions are crucial for protein-protein binding and protein interfaces undergo significant selection pressure to maintain their interactions. RESULTS This article presents InterEvScore, a novel scoring function using a coarse-grained statistical potential including two- and three-body interactions, which provides each residue with the opportunity to contribute in its most favorable local structural environment. Combination of this potential with evolutionary information considerably improves scoring results on the 54 test cases from the widely used protein docking benchmark for which evolutionary information can be collected. We analyze how our way to include evolutionary information gradually increases the discriminative power of InterEvScore. Comparison with several previously published scoring functions (ZDOCK, ZRANK and SPIDER) shows the significant progress brought by InterEvScore. AVAILABILITY http://biodev.cea.fr/interevol/interevscore CONTACT guerois@cea.fr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jessica Andreani
- CEA, iBiTecS, Service de Bioenergetique Biologie Structurale et Mecanismes SB2SM, Laboratoire de Biologie Structurale et Radiobiologie LBSR, F-91191 Gif sur Yvette, France
31
Zybailov BL, Glazko GV, Jaiswal M, Raney KD. Large Scale Chemical Cross-linking Mass Spectrometry Perspectives. J Proteomics Bioinform 2013; 6:001. [PMID: 25045217 PMCID: PMC4101816 DOI: 10.4172/jpb.s2-001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
The spectacular heterogeneity of a complex protein mixture from biological samples becomes even more difficult to tackle when one’s attention is shifted towards different protein complex topologies, transient interactions, or localization of PPIs. Meticulous protein-by-protein affinity pull-downs and yeast-two-hybrid screens are the two approaches currently used to decipher proteome-wide interaction networks. Another method is to employ chemical cross-linking, which gives not only identities of interactors, but could also provide information on the sites of interactions and interaction interfaces. Despite significant advances in mass spectrometry instrumentation over the last decade, mapping Protein-Protein Interactions (PPIs) using chemical cross-linking remains time consuming and requires substantial expertise, even in the simplest of systems. While robust methodologies and software exist for the analysis of binary PPIs and also for the single protein structure refinement using cross-linking-derived constraints, undertaking a proteome-wide cross-linking study is highly complex. Difficulties include i) identifying cross-linkers of the right length and selectivity that could capture interactions of interest; ii) enrichment of the cross-linked species; iii) identification and validation of the cross-linked peptides and cross-linked sites. In this review we examine existing literature aimed at the large-scale protein cross-linking and discuss possible paths for improvement. We also discuss short-length cross-linkers of broad specificity such as formaldehyde and diazirine-based photo-cross-linkers. These cross-linkers could potentially capture many types of interactions, without a strict requirement for a particular amino acid to be present at a given protein-protein interface. How can these short-length, broad-specificity cross-linkers be applied to proteome-wide studies? We will suggest specific advances in methodology, instrumentation and software that are needed to make such a leap.
Affiliation(s)
- Boris L Zybailov
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Galina V Glazko
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Mihir Jaiswal
- UALR/UAMS Joint Bioinformatics Program, University of Arkansas Little Rock, Little Rock, AR, USA
- Kevin D Raney
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
32
Peterhoff D, Zellner H, Guldan H, Merkl R, Sterner R, Babinger P. Dimerization Determines Substrate Specificity of a Bacterial Prenyltransferase. Chembiochem 2012; 13:1297-303. [DOI: 10.1002/cbic.201200127] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/21/2012] [Indexed: 01/19/2023]
|
33
|
Janda JO, Busch M, Kück F, Porfenenko M, Merkl R. CLIPS-1D: analysis of multiple sequence alignments to deduce for residue-positions a role in catalysis, ligand-binding, or protein structure. BMC Bioinformatics 2012; 13:55. [PMID: 22480135 PMCID: PMC3391178 DOI: 10.1186/1471-2105-13-55] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 12/22/2011] [Accepted: 04/05/2012] [Indexed: 11/12/2022] Open
Abstract
Background: One aim of the in silico characterization of proteins is to identify all residue-positions that are crucial for function or structure. Several sequence-based algorithms exist that predict functionally important sites. However, on the basis of sequence information alone, many functionally and structurally important sites are hard to distinguish, and consequently a large number of incorrectly predicted functional sites must be expected. We therefore set out to design a new classifier that differentiates between functionally and structurally important sites and to assess its performance on representative datasets. Results: We have implemented CLIPS-1D, which predicts a role in catalysis, ligand-binding, or protein structure for residue-positions in a mutually exclusive manner. By analyzing a multiple sequence alignment, the algorithm scores conservation as well as abundance of residues at individual sites and in their local neighborhood, and categorizes each site by means of a multiclass support vector machine. A cross-validation confirmed that residue-positions involved in catalysis were identified with state-of-the-art quality; the mean MCC-value was 0.34. For structurally important sites, prediction quality was considerably higher (mean MCC = 0.67). For ligand-binding sites, prediction quality was lower (mean MCC = 0.12), because binding sites and structurally important residue-positions share conservation and abundance values, which makes their separation difficult. We show that classification success varies for residues in a class-specific manner. For this reason, our algorithm computes residue-specific p-values, which allow for the statistical assessment of each individual prediction. CLIPS-1D is available as a Web service at http://www-bioinf.uni-regensburg.de/. Conclusions: CLIPS-1D is a classifier whose prediction quality has been determined separately for catalytic sites, ligand-binding sites, and structurally important sites. It generates hypotheses about residue-positions important for a set of homologous proteins and focuses on conservation and abundance signals. Thus, the algorithm can be applied in cases where function cannot be transferred from well-characterized proteins by means of sequence comparison.
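The MCC values quoted in this abstract (0.34, 0.67 and 0.12) are Matthews correlation coefficients. The following is a minimal sketch of how such a value can be computed from a binary confusion matrix. It assumes a one-vs-rest reading for each class, which the abstract does not specify, and the function name `mcc_from_counts` and the example counts are purely illustrative; this is not the CLIPS-1D implementation.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives (hypothetical inputs).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal sum is zero
    # (the coefficient is undefined in that case).
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative counts only; not data from the paper.
    print(mcc_from_counts(tp=50, tn=50, fp=0, fn=0))    # perfect prediction
    print(mcc_from_counts(tp=25, tn=25, fp=25, fn=25))  # chance-level prediction
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect agreement). Because it uses all four cells of the confusion matrix, it remains informative when one class is much rarer than the other, as is typically the case when only a few residue-positions in a protein are catalytic.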
Affiliation(s)
- Jan-Oliver Janda
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, 93040 Regensburg, Germany.
|