1
Ridha F, Gromiha MM. MPA-Pred: A machine learning approach for predicting the binding affinity of membrane protein-protein complexes. Proteins 2024; 92:499-508. [PMID: 37949651 DOI: 10.1002/prot.26633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 10/05/2023] [Accepted: 10/25/2023] [Indexed: 11/12/2023]
Abstract
Membrane protein-protein interactions are essential for several functions including cell signaling, ion transport, and enzymatic activity. These interactions are mainly dictated by their binding affinities. Although several methods are available for predicting the binding affinity of protein-protein complexes, there exists no specific method for membrane protein-protein complexes. In this work, we collected the experimental binding affinity data for a set of 114 membrane protein-protein complexes and derived several structure and sequence-based features. Our analysis of the relationship between binding affinity and the features revealed that the important factors mainly depend on the type of membrane protein and the functional class of the protein. Specifically, aromatic and charged residues at the interface, and aromatic-aromatic and electrostatic interactions, were found to be important for understanding the binding affinity. Further, we developed a method, MPA-Pred, for predicting the binding affinity of membrane protein-protein complexes using a machine learning approach. It showed an average correlation and mean absolute error of 0.83 and 0.91 kcal/mol, respectively, using the jack-knife test on a set of 114 complexes. We have also developed a web server, available at https://web.iitm.ac.in/bioinfo2/MPA-Pred/. This method can be used for predicting the affinity of membrane protein-protein complexes at a large scale and can aid in improving drug design strategies.
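The jack-knife (leave-one-out) test mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the MPA-Pred model or features, so the example below substitutes a single-feature least-squares line as a stand-in; the function names and any input values are hypothetical and are not taken from the paper.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def jackknife_eval(xs, ys):
    """Leave each complex out once, refit on the rest, predict the held-out one.

    Returns (correlation, mean absolute error) over the held-out predictions.
    The linear model is an assumption standing in for the real predictor.
    """
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a * xs[i] + b)
    mae = sum(abs(p - y) for p, y in zip(preds, ys)) / len(ys)
    return pearson(preds, ys), mae
```

Because every prediction comes from a model that never saw the held-out complex, the reported correlation and error estimate performance on unseen data rather than fit quality.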
Affiliation(s)
- Fathima Ridha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
  - Department of Computer Science, National University of Singapore, Singapore, Singapore
2
Kumar R, Srivastava Y, Muthuramalingam P, Singh SK, Verma G, Tiwari S, Tandel N, Beura SK, Panigrahi AR, Maji S, Sharma P, Rai PK, Prajapati DK, Shin H, Tyagi RK. Understanding Mutations in Human SARS-CoV-2 Spike Glycoprotein: A Systematic Review & Meta-Analysis. Viruses 2023; 15:856. [PMID: 37112836 PMCID: PMC10142771 DOI: 10.3390/v15040856] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/07/2023] [Revised: 03/19/2023] [Accepted: 03/20/2023] [Indexed: 03/30/2023] Open
Abstract
Genetic variant(s) of concern (VoC) of SARS-CoV-2 have been emerging worldwide due to mutations in the gene encoding spike glycoprotein. We performed comprehensive analyses of spike protein mutations in the significant variant clade of SARS-CoV-2, using the data available on the Nextstrain server. We selected various mutations, namely, A222V, N439K, N501Y, L452R, Y453F, E484K, K417N, T478K, L981F, L212I, N856K, T547K, G496S, and Y369C for this study. These mutations were chosen based on their global entropic score, emergence, spread, transmission, and their location in the spike receptor binding domain (RBD). The relative abundance of these mutations was mapped with global mutation D614G as a reference. Our analyses suggest the rapid emergence of newer global mutations alongside D614G, as reported during the recent waves of COVID-19 in various parts of the world. These mutations could be instrumental in the transmission, infectivity, virulence, and host immune evasion of SARS-CoV-2. The probable impact of these mutations on vaccine effectiveness, antigenic diversity, antibody interactions, protein stability, RBD flexibility, and accessibility to human cell receptor ACE2 was studied in silico. Overall, the present study can help researchers to design the next generation of vaccines and biotherapeutics to combat COVID-19 infection.
Affiliation(s)
- Reetesh Kumar
  - Faculty of Agricultural Sciences, Institute of Applied Sciences & Humanities, GLA University, Mathura 281406, India
  - Department of Biotherapeutics, CSIR-Institute of Microbial Technology (IMTECH), Chandigarh 160036, India
- Yogesh Srivastava
  - Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Pandiyan Muthuramalingam
  - Division of Horticultural Science, Gyeongsang National University, Jinju 52725, Republic of Korea
- Sunil Kumar Singh
  - Department of Zoology, School of Biological Sciences, Central University of Punjab, Ghudda, Bathinda 151401, India
- Geetika Verma
  - Department of Biotherapeutics, CSIR-Institute of Microbial Technology (IMTECH), Chandigarh 160036, India
- Savitri Tiwari
  - Division of Life Sciences, Department of Biosciences, School of Basic and Applied Sciences, Galgotias University, Gautam Buddha Nagar, Greater Noida 201310, India
- Nikunj Tandel
  - Institute of Science, Nirma University, SG Highway, Gujarat 382481, India
- Samir Kumar Beura
  - Department of Zoology, School of Biological Sciences, Central University of Punjab, Ghudda, Bathinda 151401, India
- Somnath Maji
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Prakriti Sharma
  - Biomedical Parasitology and Translational-Immunology Lab, CSIR-Institute of Microbial Technology (IMTECH), Chandigarh 160036, India
- Pankaj Kumar Rai
  - Department of Biotechnology, IIET, Invertis University, Bareilly 243001, India
- Hyunsuk Shin
  - Division of Horticultural Science, Gyeongsang National University, Jinju 52725, Republic of Korea
- Rajeev K. Tyagi
  - Biomedical Parasitology and Translational-Immunology Lab, CSIR-Institute of Microbial Technology (IMTECH), Chandigarh 160036, India
3
Matos-Filipe P, Preto AJ, Koukos PI, Mourão J, Bonvin AMJJ, Moreira IS. MENSAdb: a thorough structural analysis of membrane protein dimers. Database (Oxford) 2021; 2021:baab013. [PMID: 33822911 PMCID: PMC8023553 DOI: 10.1093/database/baab013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2020] [Revised: 01/19/2021] [Accepted: 03/01/2021] [Indexed: 11/14/2022]
Abstract
Membrane proteins (MPs) are key players in a variety of different cellular processes and constitute the target of around 60% of all Food and Drug Administration-approved drugs. Despite their importance, there is still a massive lack of relevant structural, biochemical and mechanistic information, mainly due to their localization within the lipid bilayer. To help fill this gap, we developed the MEmbrane protein dimer Novel Structure Analyser database (MENSAdb). This interactive web application summarizes the evolutionary and physicochemical properties of dimeric MPs to expand the available knowledge on the fundamental principles underlying their formation. Currently, MENSAdb contains features of 167 unique MPs (63% homo- and 37% heterodimers) and brings insights into the conservation of residues, accessible solvent area descriptors, average B-factors, intermolecular contacts at 2.5 Å and 4.0 Å distance cut-offs, hydrophobic contacts, hydrogen bonds, salt bridges, π-π stacking, T-stacking and cation-π interactions. The regular update and organization of all these data into a unique platform will allow a broad community of researchers to collect and analyse a large number of features efficiently, thus facilitating their use in the development of prediction models associated with MPs. Database URL: http://www.moreiralab.com/resources/mensadb.
Affiliation(s)
- Pedro Matos-Filipe
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra 3005-504, Portugal
- António J Preto
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra 3005-504, Portugal
  - PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research, University of Coimbra, Coimbra, 3030-789, Portugal
- Panagiotis I Koukos
  - Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Utrecht, 3584, CH, Netherlands
- Joana Mourão
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra 3005-504, Portugal
- Alexandre M J J Bonvin
  - Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Utrecht, 3584, CH, Netherlands
- Irina S Moreira
  - Department of Life Sciences, University of Coimbra, Coimbra, 3000-456, Portugal
  - Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
4
Deng A, Zhang H, Wang W, Zhang J, Fan D, Chen P, Wang B. Developing Computational Model to Predict Protein-Protein Interaction Sites Based on the XGBoost Algorithm. Int J Mol Sci 2020; 21:E2274. [PMID: 32218345 PMCID: PMC7178137 DOI: 10.3390/ijms21072274] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 02/03/2020] [Revised: 03/10/2020] [Accepted: 03/23/2020] [Indexed: 12/27/2022] Open
Abstract
The study of protein-protein interaction is of great biological significance, and the prediction of protein-protein interaction sites can promote the understanding of cell biological activity and will be helpful for drug development. However, uneven distribution between interaction and non-interaction sites is common because only a small number of protein interactions have been confirmed by experimental techniques, which greatly affects the predictive capability of computational methods. In this work, two imbalanced data processing strategies based on the XGBoost algorithm were proposed to re-balance the original dataset using the inherent relationship between positive and negative samples for the prediction of protein-protein interaction sites. Herein, a feature extraction method was applied to represent the protein interaction sites based on evolutionary conservatism of proteins, and the influence of overlapping regions of positive and negative samples on prediction performance was considered. Our method showed good prediction performance, with a prediction accuracy of 0.807 and an MCC of 0.614, on an original dataset with 10,455 surface residues but only 2297 interface residues. Experimental results demonstrated the effectiveness of our XGBoost-based method.
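The abstract reports both accuracy and the Matthews correlation coefficient (MCC) because accuracy alone is misleading on imbalanced data. The sketch below uses the standard definitions of both metrics; the function names are illustrative, and the example counts assume the 2297 interface residues are counted among the 10,455 surface residues, which the abstract does not state explicitly.

```python
from math import sqrt

def accuracy(tp, tn, fp, fn):
    """Fraction of all residues that were labelled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when a marginal total is zero (the usual convention),
    which is the case for a classifier that predicts only one class.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# A trivial classifier that labels every residue "non-interface":
# 8158 true negatives, 2297 false negatives, no positive predictions.
trivial_acc = accuracy(0, 8158, 0, 2297)   # about 0.78
trivial_mcc = mcc(0, 8158, 0, 2297)        # 0.0
```

Under that assumption, a predictor that never reports an interface residue still reaches roughly 0.78 accuracy while its MCC is zero, which is why the MCC is the more informative figure here.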
Affiliation(s)
- Aijun Deng
  - Key Laboratory of Metallurgical Emission Reduction & Resources Recycling (Anhui University of Technology), Ministry of Education, Ma'anshan 243002, China
  - School of Metallurgical Engineering, Anhui University of Technology, Ma'anshan 243032, China
  - Department of Engineering, University of Leicester, Leicester LE1 7RH, UK
- Huan Zhang
  - School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China
- Wenyan Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China
- Jun Zhang
  - Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei 230032, China
- Dingdong Fan
  - School of Metallurgical Engineering, Anhui University of Technology, Ma'anshan 243032, China
- Peng Chen
  - Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei 230032, China
- Bing Wang
  - Key Laboratory of Metallurgical Emission Reduction & Resources Recycling (Anhui University of Technology), Ministry of Education, Ma'anshan 243002, China
  - School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China
  - Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei 230032, China
5
Carugo O. Atomic displacement parameters in structural biology. Amino Acids 2018; 50:775-786. [DOI: 10.1007/s00726-018-2574-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 11/29/2017] [Accepted: 04/19/2018] [Indexed: 01/14/2023]
6
Qiu Z, Zhou B, Yuan J. Protein–protein interaction site predictions with minimum covariance determinant and Mahalanobis distance. J Theor Biol 2017; 433:57-63. [DOI: 10.1016/j.jtbi.2017.08.026] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/26/2017] [Revised: 08/26/2017] [Accepted: 08/30/2017] [Indexed: 10/18/2022]
7
Kuo TH, Li KB. Predicting Protein-Protein Interaction Sites Using Sequence Descriptors and Site Propensity of Neighboring Amino Acids. Int J Mol Sci 2016; 17:ijms17111788. [PMID: 27792167 PMCID: PMC5133789 DOI: 10.3390/ijms17111788] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/08/2016] [Revised: 10/14/2016] [Accepted: 10/18/2016] [Indexed: 12/17/2022] Open
Abstract
Information about the interface sites of Protein–Protein Interactions (PPIs) is useful for many areas of biological research. However, despite the advancement of experimental techniques, the identification of PPI sites remains a challenging task. Using a statistical learning technique, we proposed a computational tool for predicting PPI sites. As an alternative to similar approaches requiring structural information, the proposed method takes all of its input from protein sequences. In addition to typical sequence features, our method takes into consideration that interaction sites are not randomly distributed over the protein sequence. We characterized this positional preference using protein complexes with known structures, proposed a numerical index to estimate the propensity and then incorporated the index into a learning system. The resulting predictor, without using structural information, yields an area under the ROC curve (AUC) of 0.675, recall of 0.597, precision of 0.311 and accuracy of 0.583 in a ten-fold cross-validation experiment. This performance is comparable to the previous approach in which structural information was used. Upon introducing the B-factor data to our predictor, we demonstrated that the AUC can be further improved to 0.750. The tool is accessible at http://bsaltools.ym.edu.tw/predppis.
Affiliation(s)
- Tzu-Hao Kuo
  - Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan.
- Kuo-Bin Li
  - Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan.
  - Office of Information Management, National Yang-Ming University Hospital, Yilan 260, Taiwan.
8
Chakravarty D, Janin J, Robert CH, Chakrabarti P. Changes in protein structure at the interface accompanying complex formation. IUCRJ 2015; 2:643-52. [PMID: 26594372 PMCID: PMC4645109 DOI: 10.1107/s2052252515015250] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 06/12/2015] [Accepted: 08/16/2015] [Indexed: 06/05/2023]
Abstract
Protein interactions are essential in all biological processes. The changes brought about in the structure when a free component forms a complex with another molecule need to be characterized for a proper understanding of molecular recognition as well as for the successful implementation of docking algorithms. Here, unbound (U) and bound (B) forms of protein structures from the Protein-Protein Interaction Affinity Database are compared in order to enumerate the changes that occur at the interface atoms/residues in terms of the solvent-accessible surface area (ASA), secondary structure, temperature factors (B factors) and disorder-to-order transitions. It is found that the interface atoms optimize contacts with the atoms in the partner protein, which leads to an increase in their ASA in the bound interface in the majority (69%) of the proteins when compared with the unbound interface, and this is independent of the root-mean-square deviation between the U and B forms. Changes in secondary structure during the transition indicate a likely extension of helices and strands at the expense of turns and coils. A reduction in flexibility during complex formation is reflected in the decrease in B factors of the interface residues on going from the U form to the B form. There is, however, no distinction in flexibility between the interface and the surface in the monomeric structure, thereby highlighting the potential problem of using B factors for the prediction of binding sites in the unbound form for docking another protein. 16% of the proteins have missing (disordered) residues in the U form which are observed (ordered) in the B form, mostly with an irregular conformation; the data set also shows differences in the composition of interface and non-interface residues in the disordered polypeptide segments as well as differences in their surface burial.
Affiliation(s)
- Devlina Chakravarty
  - Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata 700 054, India
- Joël Janin
  - IBBMC, CNRS UMR 8619, Universite Paris-Sud 11, Orsay, France
- Charles H Robert
  - CNRS Laboratoire de Biochimie Theorique, Institut de Biologie Physico-Chimique (IBPC), Universite Paris Diderot, Sorbonne Paris Cité, 13 Rue Pierre et Marie Curie, 75005 Paris, France
- Pinak Chakrabarti
  - Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata 700 054, India
9
Aumentado-Armstrong TT, Istrate B, Murgita RA. Algorithmic approaches to protein-protein interaction site prediction. Algorithms Mol Biol 2015; 10:7. [PMID: 25713596 PMCID: PMC4338852 DOI: 10.1186/s13015-015-0033-9] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 08/23/2014] [Accepted: 01/07/2015] [Indexed: 12/19/2022] Open
Abstract
Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most current methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented.
10
Sriwastava BK, Basu S, Maulik U, Plewczynski D. PPIcons: identification of protein-protein interaction sites in selected organisms. J Mol Model 2013; 19:4059-70. [PMID: 23729008 PMCID: PMC3744667 DOI: 10.1007/s00894-013-1886-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/28/2013] [Accepted: 05/06/2013] [Indexed: 01/08/2023]
Abstract
The physico-chemical properties of interaction interfaces have a crucial role in the characterization of protein-protein interactions (PPI). In silico prediction of participating amino acids helps to identify interface residues for further experimental verification using mutational analysis, or inhibition studies by screening a library of ligands against a given protein. Given the unbound structure of a protein and the fact that it forms a complex with another known protein, the objective of this work is to identify the residues that are involved in the interaction. We attempt to predict interaction sites in protein complexes using the local composition of amino acids together with their physico-chemical characteristics. The local sequence segments (LSS) are dissected from the protein sequences using a sliding window of 21 amino acids. The list of LSSs is passed to the support vector machine (SVM) predictor, which identifies interacting residue pairs considering their inter-atom distances. We have analyzed three model organisms, Escherichia coli, Saccharomyces cerevisiae and Homo sapiens, where the numbers of considered hetero-complexes are 40, 123 and 33, respectively. Moreover, a unified multi-organism PPI meta-predictor is also developed in the current work by combining the training databases of the above organisms. The PPIcons interface residue prediction method achieves an area under the ROC curve (AUC) of 0.82, 0.75, 0.72 and 0.76 for the aforementioned organisms and the meta-predictor, respectively.
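The sliding-window step described in this abstract can be sketched as follows. Only the window length of 21 comes from the abstract; the handling of residues near the chain termini (padding with a placeholder character) and the function name are assumptions made for illustration, and the actual PPIcons procedure may differ.

```python
def local_sequence_segments(sequence, window=21, pad="X"):
    """Return one window-length segment centred on each residue.

    The sequence is padded on both sides with `pad` (an assumption) so that
    terminal residues also receive a full-length segment.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + window] for i in range(len(sequence))]

# Usage: each segment's middle character is the residue it describes.
segments = local_sequence_segments("MKTAYIAKQRQISFVKSHFSRQ")
```

Each segment can then be encoded (for example by amino acid composition) and passed to a classifier as the feature vector for its central residue.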
Affiliation(s)
- Brijesh K. Sriwastava
  - Department of Computer Science and Engineering, Government College of Engineering and Leather Technology, Kolkata, 700098 India
- Subhadip Basu
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032 India
- Ujjwal Maulik
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032 India
- Dariusz Plewczynski
  - Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, 02-106 Warsaw, Poland
  - Department of Physical Chemistry, Faculty of Pharmacy, Medical University of Warsaw, 02-097 Warsaw, Poland
11
La D, Kong M, Hoffman W, Choi YI, Kihara D. Predicting permanent and transient protein-protein interfaces. Proteins 2013; 81:805-18. [PMID: 23239312 PMCID: PMC4084939 DOI: 10.1002/prot.24235] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 09/13/2012] [Revised: 11/19/2012] [Accepted: 11/28/2012] [Indexed: 11/11/2022]
Abstract
Protein-protein interactions (PPIs) are involved in diverse functions in a cell. To optimize functional roles of interactions, proteins interact with a spectrum of binding affinities. Interactions are conventionally classified into permanent and transient, where the former denotes tight binding between proteins that results in strong complexes, whereas the latter comprises relatively weak interactions that can dissociate after binding to regulate functional activity at specific time points. Knowing the type of interactions has significant implications for understanding the nature and function of PPIs. In this study, we constructed amino acid substitution models that capture mutation patterns at permanent and transient types of protein interfaces, which were found to be different with statistical significance. Using the substitution models, we developed a novel computational method that predicts permanent and transient protein binding interfaces (PBIs) in protein surfaces. Without knowledge of the interacting partner, the method uses a single query protein structure and a multiple sequence alignment of the sequence family. Using a large dataset of permanent and transient proteins, we show that our method, BindML+, performs very well in protein interface classification. A very high area under the curve (AUC) value of 0.957 was observed when predicted protein binding sites were classified. Remarkably, near perfect accuracy was achieved with an AUC of 0.991 when actual binding sites were classified. The developed method will also be useful for protein design of permanent and transient PBIs.
Affiliation(s)
- David La
  - Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN, 47907, USA
  - Markey Center for Structural Biology, Purdue University, West Lafayette, IN, 47907, USA
- Misun Kong
  - Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- William Hoffman
  - Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Youn Im Choi
  - Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN, 47907, USA
  - Markey Center for Structural Biology, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
  - Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN, 47907, USA
  - Department of Computer Science, College of Science, Purdue University, West Lafayette, IN, 47907, USA
  - Markey Center for Structural Biology, Purdue University, West Lafayette, IN, 47907, USA
12
Xiong Y, Liu J, Zhang W, Zeng T. Prediction of heme binding residues from protein sequences with integrative sequence profiles. Proteome Sci 2012; 10 Suppl 1:S20. [PMID: 22759579 PMCID: PMC3380730 DOI: 10.1186/1477-5956-10-s1-s20] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/23/2022] Open
Abstract
Background: Heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction and the control of gene expression. The knowledge of heme binding residues can provide crucial clues to understand these activities and aid in functional annotation; however, insufficient work has been done on predicting heme binding residues from protein sequence information. Methods: We propose a sequence-based approach for accurate prediction of heme binding residues by a novel integrative sequence profile coupling position specific scoring matrices with heme specific physicochemical properties. In order to select the informative physicochemical properties, we design an intuitive feature selection scheme by combining a greedy strategy with correlation analysis. Results: Our integrative sequence profile approach for prediction of heme binding residues outperforms the conventional methods using amino acid and evolutionary information on the 5-fold cross validation and the independent tests. Conclusions: The novel feature of an integrative sequence profile achieves good performance using a reduced set of feature vector elements.
Affiliation(s)
- Yi Xiong
  - School of Computer, Wuhan University, Wuhan 430072, China.
13
Jordan RA, EL-Manzalawy Y, Dobbs D, Honavar V. Predicting protein-protein interface residues using local surface structural similarity. BMC Bioinformatics 2012; 13:41. [PMID: 22424103 PMCID: PMC3386866 DOI: 10.1186/1471-2105-13-41] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Received: 09/12/2011] [Accepted: 03/18/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Identification of the residues in protein-protein interaction sites has a significant impact in problems such as drug discovery. Motivated by the observation that the set of interface residues of a protein tends to be conserved even among remote structural homologs, we introduce PrISE, a family of local structural similarity-based computational methods for predicting protein-protein interface residues. RESULTS: We present a novel representation of the surface residues of a protein in the form of structural elements. Each structural element consists of a central residue and its surface neighbors. The PrISE family of interface prediction methods uses a representation of structural elements that captures the atomic composition and accessible surface area of the residues that make up each structural element. Each member of the PrISE family identifies, for each structural element in the query protein, a collection of similar structural elements in its repository of structural elements and weights them according to their similarity with the structural element of the query protein. PrISEL relies on the similarity between structural elements (i.e. local structural similarity). PrISEG relies on the similarity between protein surfaces (i.e. general structural similarity). PrISEC combines local structural similarity and general structural similarity to predict interface residues. These predictors label the central residue of a structural element in a query protein as an interface residue if a weighted majority of the structural elements that are similar to it are interface residues, and as a non-interface residue otherwise. The results of our experiments using three representative benchmark datasets show that PrISEC outperforms PrISEL and PrISEG, and that PrISEC is highly competitive with state-of-the-art structure-based methods for predicting protein-protein interface residues.
Our comparison of PrISEC with PredUs, a recently developed method for predicting interface residues of a query protein based on the known interface residues of its (global) structural homologs, shows that performance superior or comparable to that of PredUs can be obtained using only local surface structural similarity. PrISEC is available as a Web server at http://prise.cs.iastate.edu/. CONCLUSIONS: Local surface structural similarity based methods offer a simple, efficient, and effective approach to predict protein-protein interface residues.
Affiliation(s)
- Rafael A Jordan
  - Department of Computer Science, Iowa State University, Ames, IA 50011, USA
  - Department of Systems and Computer Engineering, Pontificia Universidad Javeriana, Cali, Colombia
- Yasser EL-Manzalawy
  - Department of Computer Science, Iowa State University, Ames, IA 50011, USA
  - Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
- Drena Dobbs
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Vasant Honavar
  - Department of Computer Science, Iowa State University, Ames, IA 50011, USA
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
14
Zellner H, Staudigel M, Trenner T, Bittkowski M, Wolowski V, Icking C, Merkl R. Prescont: Predicting protein-protein interfaces utilizing four residue properties. Proteins 2011; 80:154-68. [DOI: 10.1002/prot.23172] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 07/11/2011] [Revised: 08/18/2011] [Accepted: 08/29/2011] [Indexed: 12/26/2022]
|
15
|
Li P, Pok G, Jung KS, Shon HS, Ryu KH. QSE: A new 3-D solvent exposure measure for the analysis of protein structure. Proteomics 2011; 11:3793-801. [PMID: 21761564 DOI: 10.1002/pmic.201100189] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/13/2011] [Revised: 06/29/2011] [Accepted: 07/05/2011] [Indexed: 11/05/2022]
Abstract
Solvent exposure of amino acids measures how deeply residues are buried in the tertiary structure of proteins, and hence it provides important information for analyzing and predicting protein structure and function. Existing methods of calculating solvent exposure, such as accessible surface area, relative accessible surface area, residue depth, contact number, and half-sphere exposure, still have some limitations. In this article, we propose a novel solvent exposure measure named quadrant-sphere exposure (QSE), based on eight quadrants derived from a spherical neighborhood. The proposed measure forms a microenvironment around a Cα atom as a sphere with a radius of 13 Å, and subdivides it into eight quadrants according to a rectangular coordinate system constructed from the geometric relationships of backbone atoms. The number of neighboring Cα atoms whose labels are the same is given as the QSE value of the center Cα atom at hand. As evidenced by histograms that show very different distributions for different structure configurations, the proposed measure captures local properties that are characteristic of a residue's eight-directional neighborhood within a sphere. Compared with other measures, QSE provides a different view of solvent exposure, and provides information that is specific to different tertiary structures. As the experimental results show, the QSE measure can potentially be used in protein structure analysis and prediction.
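The neighbor-counting idea in the abstract can be sketched as follows. Only the 13 Å radius and the eight-way subdivision come from the abstract. The abstract does not say how the coordinate system is built from backbone atoms, so the sketch takes the local frame as a ready-made input. The names `qse_counts` and `octant_label`, and the convention that a zero projection counts as positive, are assumptions for illustration.

```python
import math
from collections import Counter

RADIUS = 13.0  # sphere radius in angstroms, as stated in the abstract

# Hypothetical local frame: three orthonormal axes. The real method derives
# them from backbone geometry, which is not specified here.
IDENTITY_FRAME = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def octant_label(vector, frame):
    """Return a 3-tuple of booleans: the sign of the projection on each axis."""
    return tuple(
        sum(v * a for v, a in zip(vector, axis)) >= 0.0 for axis in frame
    )


def qse_counts(center, other_ca_atoms, frame=IDENTITY_FRAME, radius=RADIUS):
    """Count neighboring C-alpha atoms per octant within the sphere."""
    counts = Counter()
    for atom in other_ca_atoms:
        offset = tuple(p - c for p, c in zip(atom, center))
        distance = math.sqrt(sum(d * d for d in offset))
        if 0.0 < distance <= radius:
            counts[octant_label(offset, frame)] += 1
    return counts


neighbors = [(1.0, 1.0, 1.0), (2.0, 3.0, 4.0), (-1.0, 1.0, 1.0), (20.0, 0.0, 0.0)]
counts = qse_counts((0.0, 0.0, 0.0), neighbors)
for label, count in sorted(counts.items()):
    print(label, count)
```

The atom at 20 Å falls outside the sphere and is ignored. The remaining three atoms are split between two octants, giving one count per occupied octant.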
Affiliation(s)
- Peipei Li
- College of Electrical and Computer Engineering, Chungbuk National University, Chungbuk, Korea
|
16
|
Chen R, Chen W, Yang S, Wu D, Wang Y, Tian Y, Shi Y. Rigorous assessment and integration of the sequence and structure based features to predict hot spots. BMC Bioinformatics 2011; 12:311. [PMID: 21798070 PMCID: PMC3176265 DOI: 10.1186/1471-2105-12-311] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/03/2010] [Accepted: 07/29/2011] [Indexed: 12/02/2022] Open
Abstract
Background Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Hot spot prediction is therefore increasingly important for understanding the essence of protein interactions and for narrowing the search space in drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed. Results In this study, we first comprehensively collect the features used to discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relASA and a larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), whereas in the Ab- dataset (antigen-antibody complexes excluded) these two features differ significantly between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predicting hot spots of two protein complexes.
Conclusion Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integrating features with machine learning methods can remarkably improve the predictive performance for hot spots.
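As a quick consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the two values quoted in the abstract. The function name `f1_score` is a local helper written for this illustration, not a call into any library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract for the independent test set.
reported_precision = 0.69
reported_recall = 0.68

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.68
```

The recomputed value, about 0.685 before rounding, rounds to the 0.68 stated in the abstract.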
Affiliation(s)
- Ruoying Chen
- College of Life Sciences, Graduate University of Chinese Academy of Sciences, Beijing 100049, China
|
17
|
Xiong Y, Liu J, Wei DQ. An accurate feature-based method for identifying DNA-binding residues on protein surfaces. Proteins 2011; 79:509-17. [PMID: 21069866 DOI: 10.1002/prot.22898] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.4] [Indexed: 01/29/2023]
Abstract
Proteins that interact with DNA play vital roles in all mechanisms of gene expression and regulation. In order to understand these activities, it is crucial to analyze and identify DNA-binding residues on DNA-binding protein surfaces. Here, we proposed two novel features, B-factor and packing density, in combination with several conventional features to characterize the DNA-binding residues in a well-constructed representative dataset of 119 protein-DNA complexes from the Protein Data Bank (PDB). Based on the selected features, a prediction model for DNA-binding residues was constructed using a support vector machine (SVM). The predictor was evaluated using 5-fold cross-validation on the above dataset of 123 DNA-binding proteins. Moreover, two independent datasets of 83 DNA-bound protein structures and their corresponding DNA-free forms were compiled. The B-factor and packing density features were statistically analyzed on these 83 pairs of holo-apo protein structures. Finally, we developed the SVM model to accurately predict DNA-binding residues on the protein surface, given the DNA-free structure of a protein. The results shown here indicate that our method represents a significant improvement over existing approaches such as DISPLAR. This observation suggests that our method will be useful in studying protein-DNA interactions and in guiding subsequent work such as site-directed mutagenesis and protein-DNA docking.
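The 5-fold cross-validation protocol mentioned in the abstract can be sketched with the standard library alone. The abstract does not say whether folds were drawn at the protein level or the residue level, so splitting the 123 proteins into folds is an assumption here. The generator name `k_fold_indices` and the fixed seed are also illustrative choices, and no SVM training is shown.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Assumption: one sample per DNA-binding protein in the 123-protein dataset.
splits = list(k_fold_indices(123, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Each protein appears in exactly one test fold, so every sample is predicted once by a model that did not see it during training.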
Affiliation(s)
- Yi Xiong
- School of Computer, Wuhan University, Wuhan 430072, People's Republic of China
|
18
|
Perkins JR, Diboun I, Dessailly BH, Lees JG, Orengo C. Transient protein-protein interactions: structural, functional, and network properties. Structure 2010; 18:1233-43. [PMID: 20947012 DOI: 10.1016/j.str.2010.08.007] [Citation(s) in RCA: 382] [Impact Index Per Article: 27.3] [Received: 04/08/2010] [Revised: 07/13/2010] [Accepted: 08/02/2010] [Indexed: 11/28/2022]
Abstract
Transient interactions, which involve protein interactions that are formed and broken easily, are important in many aspects of cellular function. Here we describe structural and functional properties of transient interactions between globular domains and between globular domains, short peptides, and disordered regions. The importance of posttranslational modifications in transient interactions is also considered. We review techniques used in the detection of the different types of transient protein-protein interactions. We also look at the role of transient interactions within protein-protein interaction networks and consider their contribution to different aspects of these networks.
Affiliation(s)
- James R Perkins
- Department of Structural and Molecular Biology, University College of London, Gower Street, WC1E 6BT London, UK.
|
19
|
Using Support Vector Machine Combined with Post-processing Procedure to Improve Prediction of Interface Residues in Transient Complexes. Protein J 2009; 28:369-74. [DOI: 10.1007/s10930-009-9203-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
|