1
Phung DK, Pilotto S, Matelska D, Blombach F, Pinotsis N, Hovan L, Gervasio FL, Werner F. Archaeal NusA2 is the ancestor of ribosomal protein eS7 in eukaryotes. Structure 2025; 33:149-159.e6. [PMID: 39504966 DOI: 10.1016/j.str.2024.10.019] [Received: 07/05/2024] [Revised: 09/06/2024] [Accepted: 10/10/2024]
Abstract
N-utilization substance A (NusA) is a regulatory factor with pleiotropic functions in gene expression in bacteria. Archaea encode two conserved small proteins, NusA1 and NusA2, with domains orthologous to the two RNA binding K Homology (KH) domains of NusA. Here, we report the crystal structures of NusA2 from Sulfolobus acidocaldarius and Saccharolobus solfataricus obtained at 3.1 Å and 1.68 Å, respectively. NusA2 comprises an N-terminal zinc finger followed by two KH-like domains lacking the GXXG signature. Despite the loss of the GXXG motif, NusA2 binds single-stranded RNA. Mutations in the zinc finger domain compromise the structural integrity of NusA2 at high temperatures, and molecular dynamics simulations indicate that zinc binding provides an energy barrier preventing the domain from reaching unfolded states. A structure-guided phylogenetic analysis of the KH-like domains supports the notion that the NusA2 clade is ancestral to the ribosomal protein eS7 in eukaryotes, implying a potential role of NusA2 in translation.
Affiliation(s)
- Duy Khanh Phung
- RNAP Laboratory, Institute for Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Simona Pilotto
- RNAP Laboratory, Institute for Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Dorota Matelska
- RNAP Laboratory, Institute for Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Fabian Blombach
- RNAP Laboratory, Institute for Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Nikos Pinotsis
- Institute for Structural and Molecular Biology, Birkbeck College, London WC1E 7HX, UK
- Ladislav Hovan
- Pharmaceutical Sciences, University of Geneva, 1206 Genève, Switzerland
- Francesco Luigi Gervasio
- Pharmaceutical Sciences, University of Geneva, 1206 Genève, Switzerland; Institute of Pharmaceutical Sciences of Western Switzerland (ISPSO), University of Geneva, 1206 Genève, Switzerland; Department of Chemistry, University College London, London WC1E 6BT, UK
- Finn Werner
- RNAP Laboratory, Institute for Structural and Molecular Biology, University College London, London WC1E 6BT, UK

2
Zhu C, Zhang C, Shang T, Zhang C, Zhai S, Cao L, Xu Z, Su Z, Song Y, Su A, Li C, Duan H. GAPS: a geometric attention-based network for peptide binding site identification by the transfer learning approach. Brief Bioinform 2024; 25:bbae297. [PMID: 38990514 PMCID: PMC11238429 DOI: 10.1093/bib/bbae297] [Received: 02/11/2024] [Revised: 04/28/2024] [Accepted: 06/07/2024]
Abstract
Protein-peptide interactions (PPepIs) are vital to understanding cellular functions, and this understanding can facilitate the design of novel drugs. As an essential component in forming a PPepI, protein-peptide binding sites are the basis for understanding the mechanisms involved in PPepIs. Therefore, accurately identifying protein-peptide binding sites is a critical task. Traditional experimental methods for studying these binding sites are labor-intensive and time-consuming, and several computational tools have been developed to supplement them. However, these computational tools have limitations in generality or accuracy due to the need for ligand information, complex feature construction, or their reliance on modeling based on amino acid residues. To address these drawbacks, we describe a geometric attention-based network for peptide binding site identification (GAPS) in this work. The proposed model uses geometric feature engineering to construct atom representations and incorporates multiple attention mechanisms to update relevant biological features. In addition, a transfer learning strategy is implemented to leverage protein-protein binding site information and enhance protein-peptide binding site recognition, taking into account the common structure and biological bias between proteins and peptides. Consequently, GAPS demonstrates state-of-the-art performance and excellent robustness in this task. Moreover, our model performs exceptionally well across several extended experiments, including predicting apo protein-peptide, protein-cyclic peptide, and AlphaFold-predicted protein-peptide binding sites. These results confirm that GAPS is a powerful, versatile, and stable method suitable for diverse binding site predictions.
Affiliation(s)
- Cheng Zhu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Chengyun Zhang
- AI Department, Shanghai Highslab Therapeutics, Inc., Zhangheng Road, Pudong New Area, Shanghai 201203, China
- Tianfeng Shang
- AI Department, Shanghai Highslab Therapeutics, Inc., Zhangheng Road, Pudong New Area, Shanghai 201203, China
- Chenhao Zhang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Silong Zhai
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Lujing Cao
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Zhenyu Xu
- AI Department, Shanghai Highslab Therapeutics, Inc., Zhangheng Road, Pudong New Area, Shanghai 201203, China
- Zhihao Su
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Ying Song
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- An Su
- College of Chemical Engineering, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Chengxi Li
- College of Chemical and Biological Engineering, Zhejiang University, Yuhangtang Road, Xihu District, Hangzhou 310027, China
- Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China

3
Villanueva-Cañas JL, Fernandez-Fuentes N, Saul D, Kosinsky RL, Teyssier C, Rogalska ME, Pérez FP, Oliva B, Notredame C, Beato M, Sharma P. Evolutionary analysis reveals the role of a non-catalytic domain of peptidyl arginine deiminase 2 in transcriptional regulation. iScience 2024; 27:109584. [PMID: 38623337 PMCID: PMC11016909 DOI: 10.1016/j.isci.2024.109584] [Received: 12/21/2023] [Revised: 02/13/2024] [Accepted: 03/25/2024]
Abstract
Peptidyl arginine deiminases (PADIs) catalyze protein citrullination, a post-translational conversion of arginine to citrulline. The most widely expressed member of this family, PADI2, regulates cellular processes that impact several diseases. We hypothesized that we could gain new insights into PADI2 function through a systematic evolutionary and structural analysis. Here, we identify 20 positively selected PADI2 residues, 16 of which are structurally exposed and maintain PADI2 interactions with cognate proteins. Many of these selected residues reside in non-catalytic regions of PADI2. We validate the importance of a prominent loop in the middle domain that encompasses PADI2 L162, a residue under positive selection. This site is essential for interaction with the positive transcription elongation factor b (P-TEFb), mediates the active transcription of the oncogenes c-MYC and CCNB1, and affects cellular proliferation. These insights could be key to understanding and addressing the role of the PADI2/c-MYC axis in cancer progression.
Affiliation(s)
- José Luis Villanueva-Cañas
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Spain
- Narcis Fernandez-Fuentes
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, Ceredigion, United Kingdom
- Dominik Saul
- Division of Endocrinology, Mayo Clinic, Rochester, MN 55905, USA; Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN 55905, USA
- Department of Trauma and Reconstructive Surgery, BG Clinic, University of Tübingen, Tübingen, Germany
- Catherine Teyssier
- Institut de Recherche en Cancérologie de Montpellier (IRCM), INSERM U1194, Université de Montpellier, Institut Du Cancer de Montpellier (ICM), F-34298 Montpellier, France
- Malgorzata Ewa Rogalska
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Spain
- Ferran Pegenaute Pérez
- Live-Cell Structural Biology Laboratory, Department of Medicine and Life Sciences, E-08005 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Baldomero Oliva
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Structural Bioinformatics Laboratory (GRIB-IMIM), Department of Medicine and Life Sciences, E-08003 Barcelona, Spain
- Cedric Notredame
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Miguel Beato
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Priyanka Sharma
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France

4
BIPSPI+: Mining Type-Specific Datasets of Protein Complexes to Improve Protein Binding Site Prediction. J Mol Biol 2022; 434:167556. [DOI: 10.1016/j.jmb.2022.167556] [Received: 11/27/2021] [Revised: 03/12/2022] [Accepted: 03/16/2022]

5
Kozlovskii I, Popov P. Protein-Peptide Binding Site Detection Using 3D Convolutional Neural Networks. J Chem Inf Model 2021; 61:3814-3823. [PMID: 34292750 DOI: 10.1021/acs.jcim.1c00475]
Abstract
Peptides and peptide-based molecules represent a promising therapeutic modality for targeting intracellular protein-protein interactions, potentially combining the beneficial properties of biologics and small-molecule drugs. Protein-peptide complexes occupy a unique niche of interaction interfaces relative to protein-protein and protein-small molecule complexes. Protein-peptide binding site identification resembles image object detection, a field that has been revolutionized by computer vision techniques. We present a new protein-peptide binding site detection method, BiteNetPp, that harnesses the power of 3D convolutional neural networks. Our method employs a tensor-based representation of spatial protein structures, which is fed to a 3D convolutional neural network that outputs probability scores and coordinates of the binding "hot spots" in the input structures. We used domain adaptation to fine-tune a model trained on protein-small molecule complexes using a manually curated set of protein-peptide structures. BiteNetPp consistently outperforms existing state-of-the-art methods on the independent test benchmark. It takes less than a second to analyze a single protein structure, making BiteNetPp suitable for large-scale analysis of protein-peptide binding sites.
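The "tensor-based representation of spatial protein structures" mentioned above can be illustrated with a minimal voxelization sketch. This is not BiteNetPp's code: it assumes a plain occupancy-count grid with one channel per atom type inside a cubic box, whereas the channel definitions, box size and any density smoothing actually used by BiteNetPp are not described here. The function name `voxelize`, the `Atom` tuple layout and all parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float, int]  # (x, y, z, channel index); hypothetical layout


def voxelize(atoms: Sequence[Atom], origin: Tuple[float, float, float],
             grid_size: int, resolution: float) -> List[List[List[List[int]]]]:
    """Bin atoms into grid[channel][i][j][k] holding per-voxel atom counts.

    origin is the lower corner of a cubic box (Angstrom), grid_size the number
    of voxels per axis and resolution the voxel edge length (Angstrom).
    Atoms falling outside the box are ignored.
    """
    n_channels = max((atom[3] for atom in atoms), default=-1) + 1
    grid = [[[[0] * grid_size for _ in range(grid_size)]
             for _ in range(grid_size)]
            for _ in range(n_channels)]
    for x, y, z, channel in atoms:
        # Voxel index along each axis: floor((coordinate - origin) / resolution)
        idx = [math.floor((c - o) / resolution) for c, o in zip((x, y, z), origin)]
        if all(0 <= v < grid_size for v in idx):
            i, j, k = idx
            grid[channel][i][j][k] += 1
    return grid


# Toy input: two atoms inside a 4 x 4 x 4 box of 1 Angstrom voxels, one outside it
atoms = [(0.5, 0.5, 0.5, 0), (1.5, 0.5, 0.5, 1), (10.0, 10.0, 10.0, 0)]
grid = voxelize(atoms, origin=(0.0, 0.0, 0.0), grid_size=4, resolution=1.0)
```

A 3D CNN would consume such a grid as a (channels, X, Y, Z) tensor. Dropping atoms outside the box is only one design choice; covering a whole protein would require a larger box or several overlapping boxes.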
Affiliation(s)
- Igor Kozlovskii
- iMolecule, Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- Petr Popov
- iMolecule, Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow 121205, Russia

6
Slater O, Miller B, Kontoyianni M. Decoding Protein-protein Interactions: An Overview. Curr Top Med Chem 2021; 20:855-882. [PMID: 32101126 DOI: 10.2174/1568026620666200226105312] [Received: 04/13/2019] [Revised: 11/27/2019] [Accepted: 11/27/2019]
Abstract
Drug discovery has focused on the paradigm "one drug, one target" for a long time. However, small molecules can act at multiple macromolecular targets, which serves as the basis for drug repurposing. In an effort to expand the target space, and given advances in X-ray crystallography, protein-protein interactions have become an emerging focus area of drug discovery enterprises. Proteins interact with other biomolecules and it is this intricate network of interactions that determines the behavior of the system and its biological processes. In this review, we briefly discuss networks in disease, followed by computational methods for protein-protein complex prediction. Computational methodologies and techniques employed towards objectives such as protein-protein docking, protein-protein interactions, and interface predictions are described extensively. Docking aims at producing a complex between proteins, while interface predictions identify a subset of residues on one protein that could interact with a partner, and protein-protein interaction sites address whether two proteins interact. In addition, approaches to predict hot spots and binding sites are presented along with a representative example of our internal project on the chemokine CXC receptor 3 B-isoform and predictive modeling with IP10 and PF4.
Affiliation(s)
- Olivia Slater
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
- Bethany Miller
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
- Maria Kontoyianni
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States

7
Galaxy InteractoMIX: An Integrated Computational Platform for the Study of Protein-Protein Interaction Data. J Mol Biol 2020; 433:166656. [PMID: 32976910 DOI: 10.1016/j.jmb.2020.09.015] [Received: 05/29/2020] [Revised: 08/30/2020] [Accepted: 09/16/2020]
Abstract
Protein interactions play a crucial role in the different functions of a cell and are central to our understanding of cellular processes in both health and disease. Here we present Galaxy InteractoMIX (http://galaxy.interactomix.com), a platform composed of 13 computational tools, each addressing specific aspects of the study of protein-protein interactions, ranging from large-scale cross-species proteome-wide interactomes to the atomic-resolution level of protein complexes. Galaxy InteractoMIX provides an intuitive interface where users can retrieve consolidated interactomics data distributed across several databases or uncover links between diseases and genes by analyzing the interactomes underlying these diseases. The platform enables large-scale prediction and curation of protein interactions using the conservation of motifs, interology, or the presence or absence of key sequence signatures. The structure-based tools include modeling and analysis of protein complexes, delineation of interfaces, and modeling of peptides acting as inhibitors of protein-protein interactions. Galaxy InteractoMIX includes a range of ready-to-use workflows to run complex analyses with minimal user intervention. The platform's potential applications span life science, biomedicine, biotechnology and drug discovery, wherever protein associations are studied.

8
Multiple protein-DNA interfaces unravelled by evolutionary information, physico-chemical and geometrical properties. PLoS Comput Biol 2020; 16:e1007624. [PMID: 32012150 PMCID: PMC7018136 DOI: 10.1371/journal.pcbi.1007624] [Received: 09/13/2019] [Revised: 02/13/2020] [Accepted: 12/20/2019]
Abstract
Interactions between proteins and nucleic acids are at the heart of many essential biological processes. Despite increasing structural information about how these interactions may take place, our understanding of how nucleic acids use protein surfaces is still very limited. This is partly due to the inherent complexity associated with protein surface deformability and evolution. In this work, we present a method that helps decipher this complexity by predicting protein-DNA interfaces and characterizing their properties. It relies on three biologically and physically meaningful descriptors, namely evolutionary conservation, physico-chemical properties and surface geometry. We carefully assessed its performance on several hundred protein structures and compared it to several state-of-the-art machine-learning methods. Our approach achieves higher sensitivity than the other methods, with similar precision. Importantly, we show that it is able to unravel ‘hidden’ binding sites by applying it to unbound protein structures and to proteins binding to DNA via multiple sites and in different conformations. It is also applicable to the detection of RNA-binding sites, without significant loss of performance. This confirms that DNA- and RNA-binding sites share similar properties. Our method is implemented as a fully automated tool, JET2DNA, freely accessible at: http://www.lcqb.upmc.fr/JET2DNA. We also provide a new dataset of 187 protein-DNA complex structures, along with a subset of 82 associated unbound structures. The set represents the largest body of high-resolution crystallographic structures of protein-DNA complexes, uses biological protein assemblies as DNA-binding units, and covers all major types of protein-DNA interactions. It is available at: http://www.lcqb.upmc.fr/PDNAbenchmarks.
Protein-DNA interactions are essential to living organisms, and their impairment is associated with many diseases. For these reasons, they have become increasingly important therapeutic targets. Experimental structure determination has revealed different binding motifs and modes, associated with different functions. Yet the available structural data give us only a glimpse of the multiplicity and complexity of protein surface usage by DNA. In this work, we use a three-layer model to describe and predict DNA-binding sites at protein surfaces. Given a protein, we consider how its residues are conserved through evolution, their physico-chemical properties and their geometrical shapes to decrypt its surface. We are able to detect a large portion of interacting residues with good precision, even when they are ‘hidden’ by conformational changes. We highlight cases where one protein binds DNA via distinct regions to perform different functions. We are able to uncover the alternative binding sites and relate their properties to their specific roles. Our work can help guide mutagenesis experiments and the development of new drugs specifically targeting one site while limiting possible side effects.

9
Sanchez-Garcia R, Sorzano COS, Carazo JM, Segura J. BIPSPI: a method for the prediction of partner-specific protein-protein interfaces. Bioinformatics 2019; 35:470-477. [PMID: 30020406 PMCID: PMC6361243 DOI: 10.1093/bioinformatics/bty647] [Received: 01/15/2018] [Accepted: 07/17/2018]
Abstract
Motivation: Protein-protein interactions (PPIs) are essential for most cellular processes; thus, unveiling how proteins interact is a crucial question that can be better understood by identifying which residues are responsible for the interaction. Computational approaches are orders of magnitude cheaper and faster than experimental ones, which has led to the proliferation of methods aimed at predicting which residues belong to the interface of an interaction. Results: We present BIPSPI, a new machine learning-based method for the prediction of partner-specific PPI sites. Unlike most binding site prediction methods, the proposed approach considers a pair of interacting proteins rather than a single one in order to predict partner-specific binding sites. BIPSPI was trained on sequence-based and structural features from both protein partners of each complex compiled in the Protein-Protein Docking Benchmark version 5.0 and in an additional, independently compiled set. A version trained only on sequences has also been developed. The performance of our approach was assessed by leave-one-out cross-validation over different benchmarks, in which it outperformed state-of-the-art methods. Availability and implementation: The BIPSPI web server is freely available at http://bipspi.cnb.csic.es. BIPSPI code is available at https://github.com/bioinsilico/BIPSPI. A Docker image is available at https://hub.docker.com/r/bioinsilico/bipspi/. Supplementary information: Supplementary data are available at Bioinformatics online.
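The evaluation protocol named above, leave-one-out cross-validation, holds out one complex at a time, trains on all the others and scores the held-out complex. The generic sketch below is not BIPSPI's training code: `leave_one_out`, `train`, `evaluate` and the complex identifiers are hypothetical placeholders for whatever model and per-complex metric one uses, and identifiers are assumed to be unique.

```python
from typing import Callable, Dict, List, TypeVar

M = TypeVar("M")  # whatever object the hypothetical train() returns


def leave_one_out(complexes: List[str],
                  train: Callable[[List[str]], M],
                  evaluate: Callable[[M, str], float]) -> Dict[str, float]:
    """Train on all complexes but one, score the held-out one, repeat for each."""
    scores: Dict[str, float] = {}
    for i, held_out in enumerate(complexes):
        training_set = complexes[:i] + complexes[i + 1:]  # every other complex
        model = train(training_set)
        scores[held_out] = evaluate(model, held_out)
    return scores


# Toy usage with hypothetical complex identifiers: the "model" is just the set of
# training IDs, and the score is 1.0 if the held-out complex was not seen in training.
ids = ["cplx_A", "cplx_B", "cplx_C"]
scores = leave_one_out(ids,
                       train=lambda training_set: set(training_set),
                       evaluate=lambda model, held_out: float(held_out not in model))
```

The toy `evaluate` acts as a leakage check: every score is 1.0 only if no complex ever appears in its own training fold.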
Affiliation(s)
- Ruben Sanchez-Garcia
- GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid, Spain
- C O S Sorzano
- GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid, Spain
- J M Carazo
- GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid, Spain
- Joan Segura
- GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid, Spain

10
Dequeker C, Laine E, Carbone A. Decrypting protein surfaces by combining evolution, geometry, and molecular docking. Proteins 2019; 87:952-965. [PMID: 31199528 PMCID: PMC6852240 DOI: 10.1002/prot.25757] [Received: 01/17/2019] [Revised: 05/09/2019] [Accepted: 06/07/2019]
Abstract
The growing body of experimental and computational data describing how proteins interact with each other has emphasized the multiplicity of protein interactions and the complexity underlying protein surface usage and deformability. In this work, we propose new concepts and methods toward deciphering such complexity. We introduce the notion of interacting region to account for the multiple usage of a protein's surface residues by several partners and for the variability of protein interfaces arising from molecular flexibility. We predict interacting patches by crossing evolutionary, physicochemical and geometrical properties of the protein surface with information from complete cross-docking (CC-D) simulations. We show that our predictions match interacting regions well and that the different sources of information are complementary. We further propose an indicator of whether a protein has few or many partners. Our prediction strategies are implemented in the dynJET2 algorithm and assessed on a new dataset of 262 proteins on which we performed CC-D. The code and the data are available at: http://www.lcqb.upmc.fr/dynJET2/.
Affiliation(s)
- Chloé Dequeker
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
- Elodie Laine
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
- Alessandra Carbone
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France; Institut Universitaire de France (IUF), Paris, France

11
Wong ETC, Gsponer J. Predicting Protein-Protein Interfaces that Bind Intrinsically Disordered Protein Regions. J Mol Biol 2019; 431:3157-3178. [PMID: 31207240 DOI: 10.1016/j.jmb.2019.06.010] [Received: 03/08/2019] [Revised: 06/01/2019] [Accepted: 06/04/2019]
Abstract
A long-standing goal in biology is the complete annotation of function and structure for all protein-protein interactions, a large fraction of which are mediated by intrinsically disordered protein regions (IDRs). However, knowledge derived from experimental structures of such protein complexes is disproportionately small, due in part to the challenges of studying interactions of IDRs. Here, we introduce IDRBind, a computational method that combines gradient boosted trees and conditional random field models to predict binding sites of IDRs with performance approaching state-of-the-art globular interface predictions, making it suitable for proteome-wide applications. Although designed and trained with a focus on molecular recognition features, which are long interaction-mediating elements in IDRs, IDRBind also predicts the binding sites of short peptides more accurately than existing specialized predictors. Consistent with IDRBind's specificity, a comparison of protein interface categories uncovered uniform trends in multiple physicochemical properties, positioning molecular recognition feature interfaces between peptide and globular interfaces.
Affiliation(s)
- Eric T C Wong
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada; Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada; Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada

12
Garcia-Garcia J, Valls-Comamala V, Guney E, Andreu D, Muñoz FJ, Fernandez-Fuentes N, Oliva B. iFrag: A Protein–Protein Interface Prediction Server Based on Sequence Fragments. J Mol Biol 2017; 429:382-389. [DOI: 10.1016/j.jmb.2016.11.034] [Received: 06/29/2016] [Revised: 11/27/2016] [Accepted: 11/30/2016]

13
Ripoche H, Laine E, Ceres N, Carbone A. JET2 Viewer: a database of predicted multiple, possibly overlapping, protein-protein interaction sites for PDB structures. Nucleic Acids Res 2017; 45:D236-D242. [PMID: 27899675 PMCID: PMC5210541 DOI: 10.1093/nar/gkw1053] [Received: 07/23/2016] [Revised: 10/18/2016] [Accepted: 10/20/2016]
Abstract
The database JET2 Viewer, openly accessible at http://www.jet2viewer.upmc.fr/, reports putative protein binding sites for all three-dimensional (3D) structures available in the Protein Data Bank (PDB). This knowledge base was generated by applying the computational method JET2 at large scale to more than 20 000 chains. The JET2 strategy yields very precise predictions of interacting surfaces and unravels their evolutionary process and complexity. JET2 Viewer provides an intelligent online display, including interactive 3D visualization of the binding sites mapped onto PDB structures and downloadable files recording JET2 analyses. Predictions were evaluated on more than 15 000 experimentally characterized protein interfaces. This is, to our knowledge, the largest evaluation of a protein binding site prediction method. The overall performance of JET2 on all interfaces is: Sen = 52.52, PPV = 51.24, Spe = 80.05, Acc = 75.89. The data can be used to foster new strategies for modulating protein-protein interactions and redesigning interaction surfaces.
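Sen, PPV, Spe and Acc are conventional per-residue classification metrics. As a hedged illustration, the standard confusion-matrix definitions (assumed here, since the abstract does not spell them out) can be computed as below. The function name `binding_site_metrics` is hypothetical, the counts in the usage line are invented, and the published figures are aggregated over many interfaces, so this sketch does not reproduce them.

```python
from typing import Dict


def binding_site_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Percent sensitivity, PPV, specificity and accuracy from residue counts.

    tp / fn: interface residues predicted as interface / missed.
    fp / tn: non-interface residues predicted as interface / correctly rejected.
    """
    def pct(num: int, den: int) -> float:
        # Return 0.0 instead of raising ZeroDivisionError when a class is empty
        return 100.0 * num / den if den else 0.0

    return {
        "Sen": pct(tp, tp + fn),                 # recall of true interface residues
        "PPV": pct(tp, tp + fp),                 # precision of predicted residues
        "Spe": pct(tn, tn + fp),                 # recall of non-interface residues
        "Acc": pct(tp + tn, tp + fp + tn + fn),  # overall fraction classified correctly
    }


# Invented counts for one protein chain (not JET2 Viewer data)
metrics = binding_site_metrics(tp=50, fp=50, tn=200, fn=50)
```

Because non-interface residues usually outnumber interface residues, Spe and Acc tend to be higher than Sen and PPV, which is consistent with the pattern of the figures reported above.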
Affiliation(s)
- Hugues Ripoche
- Sorbonne Universités, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Elodie Laine
- Sorbonne Universités, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Nicoletta Ceres
- CNRS UMR 5086/University Lyon I, Institut de Biologie et Chimie des Proteines, 69367 Lyon, France
- Alessandra Carbone
- Sorbonne Universités, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France; Institut Universitaire de France, 75005 Paris, France

14
InteractoMIX: a suite of computational tools to exploit interactomes in biological and clinical research. Biochem Soc Trans 2016; 44:917-24. [DOI: 10.1042/bst20150001] [Received: 12/11/2015]
Abstract
Virtually all the biological processes that occur inside or outside cells are mediated by protein–protein interactions (PPIs). Hence, charting and describing the PPI network, initially for whole organisms (the interactome) but more recently for specific tissues, is essential to fully understand cellular processes in both health and disease. The study of PPIs is also at the heart of renewed efforts in the medical and biotechnological arena in the quest for new therapeutic targets and drugs. Here, we present a mini review of 11 computational tools and resources developed by us to address different aspects of PPIs, from the interactome level to their atomic 3D structural details. We provide details on each resource's aims and purpose and compare each with equivalent tools in the literature. All the tools are presented in a centralized, one-stop website: InteractoMIX (http://interactomix.com).
15
Esmaielbeiki R, Krawczyk K, Knapp B, Nebel JC, Deane CM. Progress and challenges in predicting protein interfaces. Brief Bioinform 2016; 17:117-31. [PMID: 25971595 PMCID: PMC4719070 DOI: 10.1093/bib/bbv027] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 01/29/2015] [Revised: 03/18/2015] [Indexed: 12/31/2022]
Abstract
The majority of biological processes are mediated via protein-protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming; thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We distinguish between methods that address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically, based on required input and prediction principles, to provide an overview of the field.
16
Local Geometry and Evolutionary Conservation of Protein Surfaces Reveal the Multiple Recognition Patches in Protein-Protein Interactions. PLoS Comput Biol 2015; 11:e1004580. [PMID: 26690684 PMCID: PMC4686965 DOI: 10.1371/journal.pcbi.1004580] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 03/10/2015] [Accepted: 10/04/2015] [Indexed: 11/19/2022]
Abstract
Protein-protein interactions (PPIs) are essential to all biological processes and they represent increasingly important therapeutic targets. Here, we present a new method for accurately predicting protein-protein interfaces and for understanding their properties, their origins and their binding to multiple partners. Unlike machine learning approaches, our method combines three sequence- and structure-based descriptors of protein residues in a rational and very straightforward way: evolutionary conservation, physico-chemical properties and local geometry. The implemented strategy yields very precise predictions for a wide range of protein-protein interfaces and discriminates them from small-molecule binding sites. Beyond its predictive power, the approach makes it possible to dissect interaction surfaces and unravel their complexity. We show how the analysis of the predicted patches can foster new strategies for modulating PPIs and redesigning interaction surfaces. The approach is implemented in JET2, an automated tool based on the Joint Evolutionary Trees (JET) method for sequence-based protein interface prediction. JET2 is freely available at www.lcqb.upmc.fr/JET2.
17
Fox NK, Brenner SE, Chandonia JM. The value of protein structure classification information-Surveying the scientific literature. Proteins 2015; 83:2025-38. [PMID: 26313554 PMCID: PMC4609302 DOI: 10.1002/prot.24915] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/08/2015] [Revised: 08/06/2015] [Accepted: 08/18/2015] [Indexed: 11/08/2022]
Abstract
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein or a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
Affiliation(s)
- Naomi K Fox
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
- Steven E Brenner
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720; Department of Plant and Microbial Biology, University of California, Berkeley, California, 94720
- John-Marc Chandonia
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
18
Kara A, Vickers M, Swain M, Whitworth DE, Fernandez-Fuentes N. Genome-wide prediction of prokaryotic two-component system networks using a sequence-based meta-predictor. BMC Bioinformatics 2015; 16:297. [PMID: 26384938 PMCID: PMC4575426 DOI: 10.1186/s12859-015-0741-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/29/2015] [Accepted: 09/16/2015] [Indexed: 12/28/2022]
Abstract
BACKGROUND Two-component systems (TCSs) are signalling complexes consisting of a histidine kinase (receptor) and a response regulator (effector). They are the most abundant signalling pathways in prokaryotes and control a wide range of biological processes. The pairing of these two components is highly specific, and establishing it often requires costly and time-consuming experimental characterisation. Therefore, there is considerable interest in developing accurate prediction tools to lessen the burden of experimental work and cope with the ever-increasing amount of genomic information. RESULTS We present a novel meta-predictor, MetaPred2CS, which is based on a support vector machine. MetaPred2CS integrates six sequence-based prediction methods: in-silico two-hybrid, mirror-tree, gene fusion, phylogenetic profiling, gene neighbourhood, and gene operon. To benchmark MetaPred2CS, we also compiled a novel high-quality training dataset of experimentally deduced TCS protein pairs for k-fold cross-validation, to act as a gold standard for TCS partnership predictions. Combining individual predictions using MetaPred2CS improved performance compared with the individual methods and with a current state-of-the-art meta-predictor. CONCLUSION We have developed MetaPred2CS, a support vector machine-based meta-predictor for prokaryotic TCS protein pairings. Central to the success of MetaPred2CS is a strategy of integrating individual predictors that improves the overall prediction accuracy, with the in-silico two-hybrid method contributing most to performance. MetaPred2CS outperformed other available systems in our benchmark tests, and is available online at http://metapred2cs.ibers.aber.ac.uk, along with our gold standard dataset of TCS interaction pairs.
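The abstract describes MetaPred2CS as a support vector machine that integrates the scores of six sequence-based predictors. Purely to illustrate that meta-prediction idea, the sketch below treats each candidate kinase/regulator pair as a six-element score vector (one score per method) and trains a linear SVM by Pegasos-style sub-gradient descent on the regularised hinge loss. The linear kernel, the training procedure, the score scaling, the function names and the toy data are all assumptions for illustration, not the published implementation.

```python
from typing import Sequence

# Column order of the hypothetical per-method scores in each feature vector.
METHODS = ("in_silico_two_hybrid", "mirror_tree", "gene_fusion",
           "phylogenetic_profiling", "gene_neighbourhood", "gene_operon")


def _augment(x: Sequence[float]) -> list[float]:
    """Append a constant 1.0 so the last weight acts as a bias term."""
    return list(x) + [1.0]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200) -> list[float]:
    """Linear SVM trained with Pegasos-style sub-gradient steps (illustrative).

    X holds one score vector per candidate TCS pair; y holds +1 (true pair)
    or -1 (non-pair). Examples are visited in a fixed cyclic order, so the
    result is deterministic.
    """
    if any(len(x) != len(METHODS) for x in X):
        raise ValueError("each score vector needs one entry per method")
    Xa = [_augment(x) for x in X]
    w = [0.0] * len(Xa[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(Xa, y):
            t += 1
            eta = 1.0 / (lam * t)                       # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]     # regularisation shrink
            if margin < 1.0:                             # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 if the pair is predicted to interact, else -1."""
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score > 0 else -1


if __name__ == "__main__":
    # Hypothetical method scores (one column per entry in METHODS), not real data.
    X = [[0.9, 0.8, 0.7, 0.9, 0.8, 0.6],
         [0.8, 0.9, 0.8, 0.7, 0.9, 0.7],
         [0.1, 0.2, 0.3, 0.1, 0.2, 0.1],
         [0.2, 0.1, 0.2, 0.3, 0.1, 0.2]]
    y = [1, 1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])
```

A linear model keeps the sketch dependency-free and makes the learned weights directly comparable across methods, which loosely mirrors the abstract's observation that one method (in-silico two-hybrid) contributes most; a real meta-predictor might instead use a non-linear kernel and a dedicated SVM library.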
Affiliation(s)
- Altan Kara
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK.
- Martin Vickers
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK.
- Martin Swain
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK.
- David E Whitworth
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK.
- Narcis Fernandez-Fuentes
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK.
19
Abstract
Background Protein-DNA interactions play important roles in many biological processes. Computational methods that can accurately predict DNA-binding sites on proteins will greatly expedite research on problems involving protein-DNA interactions. Results This paper presents a method for predicting DNA-binding sites on protein structures. The method represents protein surface patches using labeled graphs and uses a graph kernel method to calculate the similarities between graphs. A new surface patch is predicted to be an interface or non-interface patch based on its similarities to known DNA-binding patches and non-DNA-binding patches. The proposed method achieved high accuracy when tested on a representative set of 146 protein-DNA complexes using leave-one-out cross-validation. The method was then applied to identify DNA-binding sites on 13 unbound structures of DNA-binding proteins. In each of the unbound structures, the top-ranked patch predicted by the proposed method precisely indicated the location of the DNA-binding site. Comparisons with other methods showed that the proposed method was competitive in predicting DNA-binding sites on unbound proteins. Conclusions The proposed method uses graphs to encode the distribution of features in three-dimensional (3D) space. Thus, compared with other vector-based methods, it has the advantage of taking into account the spatial distribution of features on the proteins. By using an efficient kernel method to compare graphs, the proposed method also avoids the demanding computations required for comparing 3D objects. It provides a competitive method for predicting DNA-binding sites without requiring structure alignment.
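The abstract outlines the approach: encode each surface patch as a labeled graph, measure similarity with a graph kernel, and call a new patch an interface patch according to its similarity to known DNA-binding and non-DNA-binding patches. The abstract does not specify the vertex labels, the edge definition or the kernel, so the sketch below assumes a simple edge-label-histogram kernel and a mean-similarity decision rule. Both, along with the function names and the residue-class labels ("pos", "pol", "hyd"), are hypothetical stand-ins rather than the published method.

```python
import math
from collections import Counter

# A surface patch as a labeled graph: vertex id -> label, plus undirected edges.
# The labels used in examples ("pos", "pol", "hyd") are hypothetical residue classes.
Graph = tuple[dict[int, str], list[tuple[int, int]]]


def edge_label_histogram(g: Graph) -> Counter:
    """Count each unordered pair of vertex labels joined by an edge."""
    labels, edges = g
    hist: Counter = Counter()
    for u, v in edges:
        hist[tuple(sorted((labels[u], labels[v])))] += 1
    return hist


def kernel(g1: Graph, g2: Graph) -> float:
    """Dot product of edge-label histograms (a simple positive semi-definite graph kernel)."""
    h1, h2 = edge_label_histogram(g1), edge_label_histogram(g2)
    return float(sum(h1[k] * h2[k] for k in h1.keys() & h2.keys()))


def normalized_kernel(g1: Graph, g2: Graph) -> float:
    """Cosine-normalised kernel so that patch size does not dominate similarity."""
    denom = math.sqrt(kernel(g1, g1) * kernel(g2, g2))
    return kernel(g1, g2) / denom if denom else 0.0


def is_dna_binding_patch(patch: Graph, binding: list[Graph],
                         non_binding: list[Graph]) -> bool:
    """Call a patch an interface if it is, on average, more similar to known DNA-binding patches."""
    s_bind = sum(normalized_kernel(patch, g) for g in binding) / len(binding)
    s_non = sum(normalized_kernel(patch, g) for g in non_binding) / len(non_binding)
    return s_bind > s_non
```

Because the kernel works on label counts rather than coordinates, comparing two patches needs no structural superposition, which is the property the abstract highlights; a richer kernel (for example one based on random walks or shortest paths) would capture more of the spatial arrangement at a higher computational cost.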
20
Gao L, Jiang X, Fu S, Gong H. In silico identification of potential virulence genes in 1,3-propanediol producer Klebsiella pneumonia. J Biotechnol 2014; 189:9-14. [DOI: 10.1016/j.jbiotec.2014.08.027] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/22/2014] [Revised: 08/19/2014] [Accepted: 08/20/2014] [Indexed: 11/24/2022]
21
Accurate in silico identification of species-specific acetylation sites by integrating protein sequence-derived and functional features. Sci Rep 2014; 4:5765. [PMID: 25042424 PMCID: PMC4104576 DOI: 10.1038/srep05765] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Received: 02/24/2014] [Accepted: 07/03/2014] [Indexed: 11/08/2022]
Abstract
Lysine acetylation is a reversible post-translational modification that plays an important role in cytokine signaling, transcriptional regulation, and apoptosis. To fully understand acetylation mechanisms, identification of substrates and specific acetylation sites is crucial. Experimental identification is often time-consuming and expensive. Alternative bioinformatics methods are cost-effective and can be used in a high-throughput manner to generate relatively precise predictions. Here we develop a method termed SSPKA for species-specific lysine acetylation prediction, using random forest classifiers that combine sequence-derived and functional features with two-step feature selection. Feature importance analysis indicates that functional features, applied to lysine acetylation site prediction for the first time, significantly improve the predictive performance. We apply the SSPKA model to screen the entire human proteome and identify many high-confidence putative substrates that have not been previously identified. The results, along with the implemented Java tool, serve as useful resources to elucidate the mechanism of lysine acetylation and facilitate hypothesis-driven experimental design and validation.
22
Andrabi M, Mizuguchi K, Ahmad S. Conformational changes in DNA-binding proteins: relationships with precomplex features and contributions to specificity and stability. Proteins 2013; 82:841-57. [PMID: 24265157 DOI: 10.1002/prot.24462] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 07/09/2013] [Revised: 10/02/2013] [Accepted: 10/21/2013] [Indexed: 12/22/2022]
Abstract
Both proteins and DNA undergo conformational changes in order to form functional complexes and also to facilitate interactions with other molecules. These changes have direct implications for the stability and specificity of the complex, as well as the cooperativity of interactions between multiple entities. In this work, we have extensively analyzed conformational changes in DNA-binding proteins by superimposing DNA-bound and unbound pairs of protein structures in a curated database of 90 proteins. We manually examined each of these pairs, unified the authors' annotations, and summarized our observations by classifying conformational changes into six structural categories. We explored the relationships between conformational changes and functional classes, binding motifs, target specificity, biophysical features of unbound proteins, and stability of the complex. In addition, we investigated the degree to which intrinsic flexibility can explain conformational changes in a subset of 52 proteins with high-quality coordinate data. Our results indicate that conformational changes in DNA-binding proteins contribute significantly to both the stability of the complex and the specificity of targets recognized by them. We also conclude that most conformational changes occur in proteins interacting with specific DNA targets, even though unbound protein structures may have sufficient information to interact with DNA in a nonspecific manner.
Affiliation(s)
- Kenji Mizuguchi
- Bioinformatics project, National Institute of Biomedical Innovation, 7-6-8 Saito-Asagi, Ibaraki City, Osaka 567-0085, Japan
23
Rakshambikai R, Srinivasan N, Nishant KT. Structural insights into Saccharomyces cerevisiae Msh4-Msh5 complex function using homology modeling. PLoS One 2013; 8:e78753. [PMID: 24244354 PMCID: PMC3828297 DOI: 10.1371/journal.pone.0078753] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 06/30/2013] [Accepted: 09/20/2013] [Indexed: 11/18/2022]
Abstract
The Msh4–Msh5 protein complex in eukaryotes is involved in stabilizing Holliday junctions and their precursors to facilitate crossing over during meiosis I. These functions of the Msh4–Msh5 complex are essential for proper chromosome segregation during the first meiotic division. The Msh4/5 proteins are homologous to the bacterial mismatch repair protein MutS and other MutS homologs (Msh2, Msh3, Msh6). Saccharomyces cerevisiae msh4/5 point mutants were recently identified that show a two-fold reduction in crossing over compared to wild type, without affecting chromosome segregation. Three distinct classes of msh4/5 point mutations could be distinguished based on their meiotic phenotypes. These include msh4/5 mutations that have a) crossover and viability defects similar to msh4/5 null mutants; b) intermediate defects in crossing over and viability; and c) defects only in crossing over. The absence of a crystal structure for the Msh4–Msh5 complex has hindered an understanding of the structural aspects of Msh4–Msh5 function, as well as a molecular explanation for the meiotic defects observed in msh4/5 mutants. To address this problem, we generated a structural model of the S. cerevisiae Msh4–Msh5 complex using homology modeling. Further, structural analysis combined with evolutionary information is used to predict sites with potentially critical roles in Msh4–Msh5 complex formation and DNA binding, and to explain asymmetry within the Msh4–Msh5 complex. We also provide a structural rationale for the meiotic defects observed in the msh4/5 point mutants. The mutations are likely to affect the stability of the Msh4/5 proteins and/or their interactions with DNA. The Msh4–Msh5 model will facilitate the design and interpretation of new mutational data as well as structural studies of this important complex involved in meiotic chromosome segregation.
Affiliation(s)
- Koodali Thazath Nishant
- School of Biology, Indian Institute of Science Education and Research, Thiruvananthapuram, India
24
Abstract
Because many protein sequences share only limited similarity with proteins of known structure, computational methods for predicting their structures have become indispensable. Structural bioinformatics (SB) has become integral to elucidating the sequence-structure-function relationship of a protein. This report focuses on the applications of SB in protein engineering, including their limitations and future challenges.
Affiliation(s)
- Yee Siew Choong
- Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, 11800, Minden, Pulau Pinang, Malaysia
25
Niu B, Zhang Y, Ding J, Lu Y, Wang M, Lu W, Yuan X, Yin J. Predicting network of drug-enzyme interaction based on machine learning method. Biochim Biophys Acta 2013; 1844:214-23. [PMID: 23907006 DOI: 10.1016/j.bbapap.2013.07.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/01/2012] [Revised: 07/16/2013] [Accepted: 07/18/2013] [Indexed: 12/11/2022]
Abstract
It is important to map drugs and enzymes correctly and efficiently onto their possible interaction network in modern drug research. In this work, a novel approach was introduced to encode drug and enzyme molecules with physicochemical molecular descriptors and pseudo amino acid composition, respectively. Based on this encoding method, Random Forest was adopted to build the drug-enzyme interaction network. After selecting the optimal features that represent the main factors of drug-enzyme interaction in our prediction, a total of 129 features were obtained, which can be clustered into nine categories: Elemental Analysis, Geometry, Chemistry, Amino Acid Composition, Secondary Structure, Polarity, Molecular Volume, Codon Diversity and Electrostatic Charge. Geometry features were further found to be the most important of all the features. Overall, our prediction model achieved an MCC of 0.915 and a sensitivity of 87.9% at a specificity of 99.8% in the 10-fold cross-validation test, and an MCC of 0.895 and a sensitivity of 95.7% at a specificity of 95.4% on the independent test set. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.
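The results above are reported as sensitivity, specificity and the Matthews correlation coefficient (MCC). For reference, all three follow from the four counts of a binary confusion matrix, as in the sketch below; the function name and the example counts are hypothetical and are not meant to reproduce the reported figures.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity and Matthews correlation coefficient (MCC)."""
    sensitivity = tp / (tp + fn)   # fraction of true interactions recovered
    specificity = tn / (tn + fp)   # fraction of non-interactions correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


# Hypothetical counts, not taken from the paper.
print(binary_metrics(tp=90, fp=5, tn=95, fn=10))
```

These example counts give a sensitivity of 0.90, a specificity of 0.95 and an MCC of about 0.85. Unlike sensitivity or specificity alone, MCC uses all four cells of the matrix, which is why it is often quoted alongside them for imbalanced interaction datasets.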
Affiliation(s)
- Bing Niu
- College of Life Science, Shanghai University, 99 Shang-Da Road, Shanghai 200072, China
26
Abstract
Mathematical and computational models are increasingly used to help interpret biomedical data produced by high-throughput genomics and proteomics projects. The application of advanced computer models enabling the simulation of complex biological processes generates hypotheses and suggests experiments. Appropriately interfaced with biomedical databases, models are necessary for rapid access to, and sharing of, knowledge through data mining and knowledge discovery approaches.
Affiliation(s)
- Santo Motta
- Department of Mathematics and Computer Science, University of Catania, V.le A. Doria, 6, 95125 Catania, Italy.