1
Sirugue L, Langenfeld F, Lagarde N, Montes M. PLO3S: Protein LOcal Surficial Similarity Screening. Comput Struct Biotechnol J 2024; 26:1-10. [PMID: 38189058 PMCID: PMC10770625 DOI: 10.1016/j.csbj.2023.12.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 12/01/2023] [Accepted: 12/03/2023] [Indexed: 01/09/2024]
Abstract
The study of protein molecular surfaces enables a better understanding and prediction of protein interactions. Different methods developed in computer vision to compare surfaces can be applied to protein molecular surfaces. The present work proposes a method based on the Wave Kernel Signature: Protein LOcal Surficial Similarity Screening (PLO3S). The PLO3S descriptor, called Surface Wave Interpolated Maps (SWIM), is a local surface shape descriptor projected onto a unit sphere that is then mapped onto a 2D plane. PLO3S allows rapid comparison of protein surface shapes through local comparisons, in order to filter large protein surface datasets in virtual screening protocols for protein structures.
Affiliation(s)
- Léa Sirugue
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Florent Langenfeld
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Nathalie Lagarde
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Matthieu Montes
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
2
Schweke H, Mucchielli MH, Chevrollier N, Gosset S, Lopes A. SURFMAP: A Software for Mapping in Two Dimensions Protein Surface Features. J Chem Inf Model 2022; 62:1595-1601. [DOI: 10.1021/acs.jcim.1c01269] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
Affiliation(s)
- Hugo Schweke
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette 91198, France
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Marie-Hélène Mucchielli
- Université Paris-Saclay, CNRS, INRAE, Université Evry, Institute of Plant Sciences Paris-Saclay (IPS2), Gif-sur-Yvette 91190, France
- Université de Paris, Institute of Plant Sciences Paris-Saclay (IPS2), Gif-sur-Yvette 91190, France
- Simon Gosset
- Université Paris-Saclay, CNRS, INRAE, Université Evry, Institute of Plant Sciences Paris-Saclay (IPS2), Gif-sur-Yvette 91190, France
- Université de Paris, Institute of Plant Sciences Paris-Saclay (IPS2), Gif-sur-Yvette 91190, France
- Anne Lopes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette 91198, France
3
Machat M, Langenfeld F, Craciun D, Sirugue L, Labib T, Lagarde N, Maria M, Montes M. Comparative evaluation of shape retrieval methods on macromolecular surfaces: an application of computer vision methods in structural bioinformatics. Bioinformatics 2021; 37:4375-4382. [PMID: 34247232 PMCID: PMC8652110 DOI: 10.1093/bioinformatics/btab511] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/29/2020] [Revised: 05/18/2021] [Accepted: 07/08/2021] [Indexed: 11/24/2022]
Abstract
MOTIVATION The investigation of the structure of biological systems at the molecular level gives insights into their functions and dynamics. The shape and surface of biomolecules are fundamental to molecular recognition events, and characterizing their geometry can lead to better predictions of their interactions. In the present work, we assess the performance of reference shape retrieval methods from the computer vision community on protein shapes. RESULTS Shape retrieval methods are efficient in identifying orthologous proteins and tracking large conformational changes. This work illustrates the value of the protein surface shape as a higher-level representation of the protein structure that (i) abstracts the underlying protein sequence, structure or fold, (ii) allows the use of shape retrieval methods to screen large databases of protein structures to identify surficial homologs and possible interacting partners and (iii) opens the way to extending the protein structure-function paradigm toward a protein structure-surface(s)-function paradigm. AVAILABILITY AND IMPLEMENTATION All data are available online at http://datasetmachat.drugdesign.fr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mohamed Machat
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Florent Langenfeld
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Daniela Craciun
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Léa Sirugue
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Taoufik Labib
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Nathalie Lagarde
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Maxime Maria
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Laboratoire XLIM, UMR CNRS 7252, Université de Limoges, Limoges 87000, France
- Matthieu Montes
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
4
Schweke H, Mucchielli MH, Sacquin-Mora S, Bei W, Lopes A. Protein Interaction Energy Landscapes are Shaped by Functional and also Non-functional Partners. J Mol Biol 2020; 432:1183-1198. [DOI: 10.1016/j.jmb.2019.12.047] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/03/2019] [Revised: 12/19/2019] [Accepted: 12/30/2019] [Indexed: 10/25/2022]
5
Kontopoulos DG, Vlachakis D, Tsiliki G, Kossida S. Structuprint: a scalable and extensible tool for two-dimensional representation of protein surfaces. BMC Struct Biol 2016; 16:4. [PMID: 26911476 PMCID: PMC4765231 DOI: 10.1186/s12900-016-0055-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 11/10/2015] [Accepted: 02/02/2016] [Indexed: 11/26/2022]
Abstract
BACKGROUND The term 'molecular cartography' encompasses a family of computational methods for the two-dimensional transformation of protein structures and the analysis of their physicochemical properties. The underlying algorithms comprise multiple manual steps, whereas the few existing implementations typically restrict the user to a very limited set of molecular descriptors. RESULTS We present Structuprint, a free standalone tool that fully automates the rendering of protein surface maps, given, at the very least, a directory with a PDB file and an amino acid property. The tool comes with a default database of 328 descriptors, which can be extended or substituted by user-provided ones. The core algorithm comprises the generation of a mould of the protein surface, which is subsequently converted to a sphere and mapped to two dimensions using the Miller cylindrical projection. Structuprint is partly optimized for multicore computers, making the rendering of animations of entire molecular dynamics simulations feasible. CONCLUSIONS Structuprint is an efficient application implementing a molecular cartography algorithm for protein surfaces. According to the results of a benchmark, its memory requirements and execution time are reasonable, allowing it to run even on low-end personal computers. We believe that it will be of use, primarily but not exclusively, to structural biologists and computational biochemists.
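The abstract names the Miller cylindrical projection as the step that flattens the sphere into a map. A minimal sketch of that step only, assuming each surface point has already been centred on the protein centroid and can be reduced to latitude and longitude; the mould construction is not reproduced, and `to_lat_lon` and `miller_projection` are hypothetical helper names, not Structuprint's code.

```python
import math


def to_lat_lon(x, y, z):
    """Latitude and longitude (radians) of a point seen from the origin.

    The origin is assumed to be the protein centroid; the point must not
    coincide with it (r > 0).
    """
    r = math.sqrt(x * x + y * y + z * z)
    lat = math.asin(z / r)   # range -pi/2 .. pi/2
    lon = math.atan2(y, x)   # range -pi .. pi
    return lat, lon


def miller_projection(lat, lon):
    """Miller cylindrical projection of a unit sphere, central meridian at lon = 0."""
    x = lon
    y = 1.25 * math.log(math.tan(math.pi / 4 + 0.4 * lat))
    return x, y


# Hypothetical surface points, already translated so the centroid is at (0, 0, 0).
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
for p in points:
    print(p, "->", miller_projection(*to_lat_lon(*p)))
```

Unlike the Mercator projection, the Miller projection maps the poles to a finite height (about ±2.30 for a unit sphere), so the whole surface fits on one bounded map.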
Affiliation(s)
- Dimitrios Vlachakis
- Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Athens, Greece.
- Georgia Tsiliki
- School of Chemical Engineering, National Technical University of Athens, Athens, Greece.
- Sofia Kossida
- IMGT®, The International ImMunoGeneTics Information System®, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine, Montpellier, France.
6
Affiliation(s)
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa 31905, Israel
- Leonid Pereyaslavets
- Department of Structural Biology, Stanford University, Stanford, California 94305
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, California 94305
7
Yang H, Qureshi R, Sacan A. Protein surface representation and analysis by dimension reduction. Proteome Sci 2012; 10 Suppl 1:S1. [PMID: 22759567 PMCID: PMC3380731 DOI: 10.1186/1477-5956-10-s1-s1] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Background Protein structures are better conserved than protein sequences, and consequently more functional information is available in structures than in sequences. However, proteins generally interact with other proteins and molecules via their surface regions and a backbone-only analysis of protein structures may miss many of the functional and evolutionary features. Surface information can help better elucidate proteins' functions and their interactions with other proteins. Computational analysis and comparison of protein surfaces is an important challenge to overcome to enable efficient and accurate functional characterization of proteins. Methods In this study we present a new method for representation and comparison of protein surface features. Our method is based on mapping the 3-D protein surfaces onto 2-D maps using various dimension reduction methods. We have proposed area and neighbor based metrics in order to evaluate the accuracy of this surface representation. In order to capture functionally relevant information, we encode geometric and biochemical features of the protein, such as hydrophobicity, electrostatic potential, and curvature, into separate color channels in the 2-D map. The resulting images can then be compared using efficient 2-D image registration methods to identify surface regions and features shared by proteins. Results We demonstrate the utility of our method and characterize its performance using both synthetic and real data. Among the dimension reduction methods investigated, SNE, LandmarkIsomap, Isomap, and Sammon's mapping provide the best performance in preserving the area and neighborhood properties of the original 3-D surface. The enriched 2-D representation is shown to be useful in characterizing the functional site of chymotrypsin and able to detect structural similarities in heat shock proteins. A texture mapping using the 2-D representation is also proposed as an interesting application to structure visualization.
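The abstract reports that Sammon's mapping is among the methods that best preserve the original 3-D surface. Sammon's mapping works by minimising Sammon's stress, which weights each pairwise distance error by the inverse of the original distance. The sketch below computes that stress as one generic way to score how faithfully a 2-D map keeps pairwise distances; it is not the area- and neighbor-based metrics proposed in the paper, and the example coordinates are hypothetical.

```python
import math
from itertools import combinations


def sammon_stress(high, low):
    """Sammon's stress between points in the original space and their 2-D images.

    high, low: index-aligned sequences of coordinate tuples.
    Pairs whose original distance is zero are skipped to avoid dividing by zero.
    """
    weighted_error = 0.0
    total_distance = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_orig = math.dist(high[i], high[j])
        if d_orig == 0.0:
            continue
        d_map = math.dist(low[i], low[j])
        total_distance += d_orig
        weighted_error += (d_orig - d_map) ** 2 / d_orig
    return weighted_error / total_distance


# Hypothetical example: three surface points and two candidate 2-D layouts.
surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faithful = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
stretched = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
print(sammon_stress(surface, faithful))
print(sammon_stress(surface, stretched))
```

A stress of 0 means every pairwise distance is preserved; doubling all distances, as in `stretched`, gives a stress of 1.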
Affiliation(s)
- Heng Yang
- Center for Integrated Bioinformatics, School of Biomedical Engineering, Science and Health System, Drexel University, 3120 Market Street, Philadelphia, PA 19104, USA.
8
Gerloff DL, Woods NT, Farago AA, Monteiro ANA. BRCT domains: A little more than kin, and less than kind. FEBS Lett 2012; 586:2711-6. [PMID: 22584059 DOI: 10.1016/j.febslet.2012.05.005] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 04/21/2012] [Accepted: 05/01/2012] [Indexed: 01/08/2023]
Abstract
BRCT domains are versatile protein modular domains found as single units or as multiple copies in more than 20 different proteins in the human genome. Interestingly, most BRCT-containing proteins function in the same biological process, the DNA damage response network, but show specificity in their molecular interactions. BRCT domains have been found to bind a wide array of ligands from proteins, phosphorylated linear motifs, and DNA. Here we discuss the biology of BRCT domains and how a domain-centric analysis can aid in the understanding of signal transduction events in the DNA damage response network.
Affiliation(s)
- Dietlind L Gerloff
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA
9
Singh R. Learning and Prediction of Complex Molecular Structure-Property Relationships. Mach Learn 2012. [DOI: 10.4018/978-1-60960-818-7.ch518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The problem of modeling and predicting complex structure-property relationships, such as the absorption, distribution, metabolism, and excretion of putative drug molecules, is a fundamental one in contemporary drug discovery. An accurate model can be used not only to predict the behavior of a molecule and understand how structural variations may influence molecular properties, but also to identify regions of molecular space that hold promise in the context of a specific investigation. However, a variety of factors contribute to the difficulty of constructing robust structure-activity models for such complex properties. These include conceptual issues related to how well the true biochemical property is accounted for by the formulation of the specific learning strategy, algorithmic issues associated with determining the proper molecular descriptors, access to only small quantities of data (possibly on only tens of molecules) due to the high cost and complexity of the experimental process, and the complex nature of the biochemical phenomena underlying the data. This chapter attempts to address this problem from the basics: the authors first identify and discuss the salient computational issues that span (and complicate) structure-property modeling formulations and present a brief review of the state of the art. The authors then consider a specific problem: that of modeling intestinal drug absorption, where many of the aforementioned factors play a role. In addressing them, their solution uses a novel characterization of molecular space based on the notion of surface-based molecular similarity. This is followed by the identification of a statistically relevant set of molecular descriptors, which, along with an appropriate machine learning technique, is used to build the structure-property model. The authors propose the simultaneous use of both ratio and ordinal error measures for model construction and validation. The applicability of the approach is demonstrated in a real-world case study.
10
Gaston D, Susko E, Roger AJ. A phylogenetic mixture model for the identification of functionally divergent protein residues. Bioinformatics 2011; 27:2655-63. [PMID: 21840876 DOI: 10.1093/bioinformatics/btr470] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION To understand the evolution of molecular function within protein families, it is important to identify those amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, both from phylogenetic and information-theoretic angles. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples. RESULTS We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions. AVAILABILITY http://rogerlab.biochem.dal.ca/Software CONTACT andrew.roger@dal.ca SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Daniel Gaston
- Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Canada, B3H 1X5
11
Page MJ, Di Cera E. Combinatorial enzyme design probes allostery and cooperativity in the trypsin fold. J Mol Biol 2010; 399:306-19. [PMID: 20399789 PMCID: PMC2908009 DOI: 10.1016/j.jmb.2010.04.024] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/02/2010] [Revised: 04/12/2010] [Accepted: 04/13/2010] [Indexed: 01/05/2023]
Abstract
Converting one enzyme into another is challenging due to the uneven distribution of important amino acids for function in both protein sequence and structure. We report a strategy for protein engineering allowing an organized mixing and matching of genetic material that leverages lower throughput with increased quality of screens. Our approach successfully tested the contribution of each surface-exposed loop in the trypsin fold alone and the cooperativity of their combinations towards building the substrate selectivity and Na(+)-dependent allosteric activation of the protease domain of human coagulation factor Xa into a bacterial trypsin. As the created proteases lack additional protein domains and protein co-factor activation mechanism requisite for the complexity of blood coagulation, they are stepping-stones towards further understanding and engineering of artificial clotting factors.
Affiliation(s)
- Michael J. Page
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
- Enrico Di Cera
- Department of Biochemistry and Molecular Biology, Saint Louis University, St. Louis, Missouri, USA
12
Zhang Q, Zmasek CM, Godzik A. Domain architecture evolution of pattern-recognition receptors. Immunogenetics 2010; 62:263-72. [PMID: 20195594 PMCID: PMC2858798 DOI: 10.1007/s00251-010-0428-1] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Received: 09/18/2009] [Accepted: 02/03/2010] [Indexed: 12/11/2022]
Abstract
In animals, the innate immune system is the first line of defense against invading microorganisms, and the pattern-recognition receptors (PRRs) are the key components of this system, detecting microbial invasion and initiating innate immune defenses. Two families of PRRs, the intracellular NOD-like receptors (NLRs) and the transmembrane Toll-like receptors (TLRs), are of particular interest because of their roles in a number of diseases. Understanding the evolutionary history of these families and their pattern of evolutionary changes may lead to new insights into the functioning of this critical system. We found that the evolution of both NLR and TLR families included massive species-specific expansions and domain shuffling in various lineages, which resulted in the same domain architectures evolving independently within different lineages in a process that fits the definition of parallel evolution. This observation illustrates both the dynamics of the innate immune system and the effects of "combinatorially constrained" evolution, where existence of the limited numbers of functionally relevant domains constrains the choices of domain architectures for new members in the family, resulting in the emergence of independently evolved proteins with identical domain architectures, often mistaken for orthologs.
Affiliation(s)
- Qing Zhang
- Burnham Institute for Medical Research, 10901 North Torrey Pines Road, La Jolla, CA 92037 USA
- Christian M. Zmasek
- Burnham Institute for Medical Research, 10901 North Torrey Pines Road, La Jolla, CA 92037 USA
- Adam Godzik
- Burnham Institute for Medical Research, 10901 North Torrey Pines Road, La Jolla, CA 92037 USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
13
Abstract
One of the major challenges in the post-genomic era with hundreds of genomes sequenced is the annotation of protein structure and function. Computational predictions of subcellular localization are an important step toward this end. The development of computational tools that predict targeting and localization has, therefore, been a very active area of research, in particular since the first release of the groundbreaking program PSORT in 1991. The most reliable means of annotating protein structure and function remains homology-based inference, i.e. the transfer of experimental annotations from one protein to its homologs. However, annotations about localization demonstrate how much can be gained from advanced machine learning: more proteins can be annotated more reliably. Contemporary computational tools for the annotation of protein targeting include automatic methods that mine the textual information from the biological literature and molecular biology databases. Some machine learning-based methods that accurately predict features of sorting signals and that use sequence-derived features to predict localization have reached remarkable levels of performance. Sustained prediction accuracy has increased by more than 30 percentage points over the last decade. Here, we review some of the most recent methods for the prediction of subcellular localization and protein targeting that contributed toward this breakthrough.
Affiliation(s)
- Shruti Rastogi
- Department of Biochemistry and Molecular Biophysics, Columbia University and Columbia University Center for Computational Biology and Bioinformatics (C2B2), New York, NY, USA
14
Abstract
The mapping of physicochemical characteristics onto the surface of a protein provides crucial insights into its function and evolution. This information can be further used in the characterization and identification of similarities within protein surface regions. We propose a novel method that quantitatively compares global and local properties on the protein surface. We have tested the method on comparisons of electrostatic potential and hydrophobicity. The method is based on 3D Zernike descriptors, which provide a compact representation of a given property defined on a protein surface. The compactness and rotational invariance of this descriptor enable fast comparisons suitable for database searches. The usefulness of this method is exemplified by studying several protein families, including globins, thermophilic and mesophilic proteins, and active sites of TIM beta/alpha barrel proteins. In all the cases studied, the descriptor is able to cluster proteins into functionally relevant groups. The proposed approach can also be easily extended to other surface properties. This protein surface-based approach will add a new way of viewing and comparing proteins to conventional methods, which compare proteins in terms of their primary sequence or tertiary structure.
Affiliation(s)
- Lee Sael
- Department of Computer Science, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- David La
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Bin Li
- Department of Computer Science, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Raif Rustamov
- Department of Mathematics, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
- Department of Computer Science, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Markey Center for Structural Biology, Purdue University, West Lafayette, IN, 47907, USA
- The Bindley Bioscience Center, Purdue University, West Lafayette, IN, 47907, USA
15
Lee D, Redfern O, Orengo C. Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol 2007; 8:995-1005. [PMID: 18037900 DOI: 10.1038/nrm2281] [Citation(s) in RCA: 371] [Impact Index Per Article: 20.6] [Indexed: 11/09/2022]
16
Benoit V, Mucchielli-Giorgi MH, Dumont B, Durosay P, Reymond N, Delacroix H. PPIDD: an extraction and visualisation method of biological protein-protein interfaces. Biochimie 2007; 90:640-7. [PMID: 18086573 DOI: 10.1016/j.biochi.2007.11.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/20/2007] [Accepted: 11/22/2007] [Indexed: 10/22/2022]
Abstract
Today, the information for generating reliable protein-protein complex datasets is not directly accessible from PDB structures. Moreover, in X-ray protein structures, different types of contacts can be observed between proteins: contacts in homodimers or inside heterocomplexes, considered to be specific, and contacts induced by crystallogenesis processes, considered to be non-specific. However, none of the databases giving access to protein-protein complexes allows crystallographic interfaces to be distinguished from biological interfaces. For this reason, we developed PPIDD (Protein-Protein Interface Description Database), an innovative tool that allows the extraction and visualisation of biological protein-protein interfaces from an annotated subset of crystallographic protein structures. This tool is focused on the description of protein-protein interfaces corresponding to well-identified classes of protein assemblies. It permits the representation of any of these protein-protein assemblies (duplex) and their interfaces, as well as the export of the corresponding molecular structures in a flexible format, which is an extension of PDBML. Moreover, PPIDD facilitates the construction of subsets of interfaces presenting user-specified common characteristics, to enhance the understanding of the determinants of specific protein-protein interactions.
Affiliation(s)
- Vincent Benoit
- Centre de Génétique Moléculaire, UPR2167, Gif/Orsay DNA MicroArray Platform, F-91198 Gif-sur-Yvette, France
17
Abstract
Large-scale genome sequencing and structural genomics projects generate numerous sequences and structures for 'hypothetical' proteins without functional characterizations. Detection of homology to experimentally characterized proteins can provide functional clues, but the accuracy of homology-based predictions is limited by the paucity of tools for the quantitative comparison of diverging residues responsible for functional divergence. SURF'S UP! is a web server for the analysis of functional relationships in protein families, as inferred from protein surface maps comparison according to the algorithm. It assigns a numerical score to the similarity between patterns of physicochemical features (charge, hydrophobicity) on compared protein surfaces. It allows the recognition of clusters of proteins that have similar surfaces, hence presumably similar functions. The server takes as input a set of protein coordinates and returns files with "spherical coordinates" of proteins in PDB format and their graphical presentation, a matrix with values of mutual similarities between the surfaces, and the unrooted tree that represents the clustering of similar surfaces, calculated by the neighbor-joining method. SURF'S UP! facilitates the comparative analysis of physicochemical features of the surface, which are the key determinants of protein function. By concentrating on coarse surface features, SURF'S UP! can work with models obtained from comparative modelling. Although it is designed to analyse the conservation among homologs, it can also be used to compare surfaces of non-homologous proteins with different three-dimensional folds, as long as a functionally meaningful structural superposition is supplied by the user. Another valuable characteristic of our method is the lack of initial assumptions about the functional features to be compared. SURF'S UP! is freely available for academic researchers at http://asia.genesilico.pl/surfs_up/.
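The abstract states that the tree of similar surfaces is built with the neighbor-joining method. The sketch below shows one join step of standard neighbor joining on a generic distance matrix; how SURF'S UP! turns its similarity matrix into distances is not stated, so the small integer matrix here is a hypothetical textbook-style example, and `nj_join_step` is a hypothetical name, not the server's code.

```python
def nj_join_step(names, d):
    """One neighbor-joining step on a symmetric distance matrix d (list of lists).

    Picks the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j),
    where r(k) is the row sum of d, and returns the pair with each member's
    branch length to the new internal node. Requires at least three taxa.
    """
    n = len(names)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    r = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    branch_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    branch_j = d[i][j] - branch_i
    return names[i], names[j], branch_i, branch_j


# Hypothetical distance matrix for five proteins a-e.
names = ["a", "b", "c", "d", "e"]
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_join_step(names, d))
```

The full algorithm repeats this step, replacing the joined pair by a new node u with d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2, until only three nodes remain, which yields an unrooted tree.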
Affiliation(s)
- Joanna M Sasin
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland.
18
Rossi A, Marti-Renom MA, Sali A. Localization of binding sites in protein structures by optimization of a composite scoring function. Protein Sci 2006; 15:2366-80. [PMID: 16963645 PMCID: PMC2242385 DOI: 10.1110/ps.062247506] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
Abstract
The rise in the number of functionally uncharacterized protein structures is increasing the demand for structure-based methods for functional annotation. Here, we describe a method for predicting the location of a binding site of a given type on a target protein structure. The method begins by constructing a scoring function, followed by a Monte Carlo optimization, to find a good scoring patch on the protein surface. The scoring function is a weighted linear combination of the z-scores of various properties of protein structure and sequence, including amino acid residue conservation, compactness, protrusion, convexity, rigidity, hydrophobicity, and charge density; the weights are calculated from a set of previously identified instances of the binding-site type on known protein structures. The scoring function can easily incorporate different types of information useful in localization, thus increasing the applicability and accuracy of the approach. To test the method, 1008 known protein structures were split into 20 different groups according to the type of the bound ligand. For nonsugar ligands, such as various nucleotides, binding sites were correctly identified in 55%-73% of the cases. The method is completely automated (http://salilab.org/patcher) and can be applied on a large scale in a structural genomics setting.
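The abstract describes the score of a surface patch p as a weighted linear combination of property z-scores, S(p) = Σ_k w_k z_k(p), with z = (x - μ) / σ. A minimal sketch of that combination only, assuming the z-scores are taken across the candidate patches being compared (the abstract does not say against which background they are computed); the Monte Carlo patch search and the fitting of weights from known binding sites are not shown, and the property names, values, and weights are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardise raw values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # a constant property carries no signal
    return [(v - mean) / sd for v in values]


def composite_scores(properties, weights):
    """Weighted linear combination of per-property z-scores.

    properties: dict mapping a property name to raw values, one per candidate patch.
    weights: dict mapping the same names to weights.
    Returns one score per patch; higher is better under this sign convention.
    """
    n_patches = len(next(iter(properties.values())))
    scores = [0.0] * n_patches
    for name, values in properties.items():
        for k, z in enumerate(z_scores(values)):
            scores[k] += weights[name] * z
    return scores


# Hypothetical values for three candidate patches and hypothetical weights.
props = {
    "conservation": [0.2, 0.5, 0.8],
    "hydrophobicity": [1.0, 1.0, 4.0],
}
w = {"conservation": 1.0, "hydrophobicity": 0.5}
scores = composite_scores(props, w)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, "best patch:", best)
```

Standardising each property first puts quantities with different units (conservation, charge density, convexity) on a common scale, so the weights alone decide their relative influence.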
Affiliation(s)
- Andrea Rossi
- Department of Biopharmaceutical Sciences and Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, University of California, San Francisco, California 94143-2552, USA.
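Rossi et al. describe their scoring function as a weighted linear combination of z-scores of residue properties, optimized by a Monte Carlo search for a high-scoring surface patch. The Python sketch below illustrates that idea under simplifying assumptions that do not come from the paper: a patch is the `size` residues nearest to a centre residue, the patch score is the weighted sum of the per-property mean z-scores, and the search uses Metropolis acceptance over patch centres. The weights are supplied by hand here, whereas the paper derives them from known binding sites. All function names are hypothetical, and this is not the code behind the salilab.org/patcher server.

```python
import math
import random
from statistics import mean, pstdev


def z_scores(values):
    """Standardise one per-residue property to mean 0 and SD 1 over the protein."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def patch_around(center, coords, size):
    """Hypothetical patch definition: the `size` residues nearest to `center`."""
    order = sorted(range(len(coords)), key=lambda i: math.dist(coords[center], coords[i]))
    return order[:size]


def patch_score(patch, zprops, weights):
    """Weighted linear combination of the patch-averaged z-score of each property."""
    return sum(w * mean(z[i] for i in patch) for w, z in zip(weights, zprops))


def monte_carlo_patch(coords, props, weights, size=5, steps=2000, temp=0.1, seed=0):
    """Metropolis Monte Carlo search over patch centres; returns the best patch found."""
    rng = random.Random(seed)
    zprops = [z_scores(p) for p in props]
    center = rng.randrange(len(coords))
    score = patch_score(patch_around(center, coords, size), zprops, weights)
    best_center, best_score = center, score
    for _ in range(steps):
        # Mostly local moves (shift the centre within the current patch),
        # with occasional random jumps to escape local optima.
        if rng.random() < 0.8:
            candidate = rng.choice(patch_around(center, coords, size))
        else:
            candidate = rng.randrange(len(coords))
        new_score = patch_score(patch_around(candidate, coords, size), zprops, weights)
        delta = new_score - score
        # Metropolis criterion: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            center, score = candidate, new_score
            if score > best_score:
                best_center, best_score = center, score
    return sorted(patch_around(best_center, coords, size)), best_score


if __name__ == "__main__":
    # Toy "protein": 20 residues on a line, hydrophobic only at residues 15-19.
    coords = [(float(i), 0.0, 0.0) for i in range(20)]
    hydrophobicity = [1.0 if i >= 15 else 0.0 for i in range(20)]
    patch, score = monte_carlo_patch(coords, [hydrophobicity], weights=[1.0])
    print(patch)  # [15, 16, 17, 18, 19]
```

Adding another property (for example conservation) only means appending its per-residue values to `props` and a matching entry to `weights`, which mirrors the abstract's point that the function can easily incorporate new types of information.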
19
Lee WP, Tzou WS. Molecular surface directionality of the DNA-binding protein surface on the earth map. Genet Mol Biol 2006. [DOI: 10.1590/s1415-47572006000200033] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022] Open
Affiliation(s)
- Wei-Po Lee
- National University of Kaohsiung, Taiwan
- Wen-Shyong Tzou
- National Taiwan Ocean University, Taiwan
20
Soares DC, Gerloff DL, Syme NR, Coulson AFW, Parkinson J, Barlow PN. Large-scale modelling as a route to multiple surface comparisons of the CCP module family. Protein Eng Des Sel 2005; 18:379-88. [PMID: 15976010 DOI: 10.1093/protein/gzi039] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022] Open
Abstract
Numerous mammalian proteins are constructed from a limited repertoire of module types. Proteins belonging to the regulators of complement activation family, which are crucial for ensuring that a complement-mediated immune response is targeted against infectious agents, are composed solely of complement control protein (CCP) modules. In the current study, CCP module sequences were grouped to allow selection of the most appropriate experimentally determined structures to serve as templates in an automated large-scale structure modelling procedure. The resulting 135 individual CCP module models, valuable in their own right, are available at the online database http://www.bru.ed.ac.uk/~dinesh/ccp-db.html. Comparisons of surface properties within a particular family of modules should be more informative than sequence alignments alone. A comparison of surface electrostatic features was undertaken for the first 28 CCP modules of complement receptor type 1 (CR1). Assignments to clusters based on surface properties differ from assignments to clusters based on sequences. This observation might reflect adaptive evolution of surface-exposed residues involved in protein-protein interactions. This illustrative example of a multiple surface comparison was indeed able to pinpoint functional sites in CR1.
Affiliation(s)
- Dinesh C Soares
- Biocomputing Research Unit, Michael Swann Building, University of Edinburgh, The King's Buildings, Edinburgh EH9 3JJ, UK
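The Soares et al. abstract clusters CCP modules by their surface electrostatic features. The abstract does not name the comparison measure, so the Python sketch below swaps in the Hodgkin similarity index, SI = 2 p·q / (|p|² + |q|²), a common measure for potentials sampled at matching points, followed by single-linkage grouping at a chosen threshold. The potential vectors, the threshold, and the function names are illustrative assumptions, not the study's procedure.

```python
def hodgkin_index(p, q):
    """Hodgkin similarity of two potentials sampled at the same points (range -1..1)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sum(a * a for a in p) + sum(b * b for b in q)
    return 2.0 * dot / norm if norm else 0.0


def similarity_matrix(potentials):
    """All-against-all Hodgkin indices between modules."""
    n = len(potentials)
    return [[hodgkin_index(potentials[i], potentials[j]) for j in range(n)] for i in range(n)]


def single_linkage_clusters(sim, threshold):
    """Group modules linked by any chain of similarities >= threshold (union-find)."""
    parent = list(range(len(sim)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(sim)):
        for j in range(i + 1, len(sim)):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(sim)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy potentials at three matched surface points for three hypothetical modules.
    potentials = [[1.0, 1.0, -1.0], [1.0, 0.9, -1.0], [-1.0, -1.0, 1.0]]
    print(single_linkage_clusters(similarity_matrix(potentials), threshold=0.5))
    # [[0, 1], [2]]
```

Unlike sequence identity, the index is sensitive to sign, so two modules with similar folds but opposite surface charge patterns land in different clusters, which is the kind of discrepancy between surface-based and sequence-based clusters that the abstract reports.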
21
Laurine E, Manival X, Montgelard C, Bideau C, Bergé-Lefranc JL, Erard M, Verdier JM. PAP IB, a new member of the Reg gene family: cloning, expression, structural properties, and evolution by gene duplication. Biochim Biophys Acta 2005; 1727:177-87. [PMID: 15777617 DOI: 10.1016/j.bbaexp.2005.01.011] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 01/06/2004] [Revised: 01/20/2005] [Accepted: 01/20/2005] [Indexed: 11/26/2022]
Abstract
Reg proteins are expressed in various organs and are involved in cancers and neurodegenerative diseases. They display a typical C-type lectin-like domain but possess additional highly conserved amino acids. By studying human databases and an Expressed Sequence Tag library, we identified a new member called PAP IB. Using probabilistic approaches, we established a phylogenetic tree of eighteen Reg proteins. The dendrogram showed that they constitute a superfamily composed of three distinct families (FI to FIII) of paralogues that resulted from duplication. We therefore focused on two proteins, REG Ialpha and PAP IB, belonging to the more closely related FI and FII families, respectively. REG Ialpha and PAP IB share 50% sequence identity. After cloning PAP IB, however, we found that it was expressed almost exclusively in the pancreas, unlike REG Ialpha, whose expression is ubiquitous. In addition, by building a model of the structure of PAP IB based on the X-ray structure of REG Ialpha, we observed that the two proteins display distinctive surface charge distributions, which may lead to the binding of different ligands. In spite of their common fold, which should result in closely related functions, REG Ialpha and PAP IB are a good example of duplication and divergence, probably with the acquisition of new functions, thus participating in the evolution of the protein repertoire.
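The Laurine et al. abstract reports 50% sequence identity between REG Ialpha and PAP IB. Percent identity is computed from a pairwise alignment, but the denominator differs between tools and the abstract does not say which one was used. The Python sketch below assumes one common convention: identical pairs divided by the aligned columns in which neither sequence has a gap. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Other conventions divide by the full alignment length or by the length
    of the shorter sequence, which gives different numbers for gapped alignments.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Toy alignment: 4 identical residues out of 5 gap-free columns.
    print(percent_identity("ACDE-G", "ACFEHG"))  # 80.0
```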
22
Lett D, Hsing M, Pio F. Interaction profile-based protein classification of death domain. BMC Bioinformatics 2004; 5:75. [PMID: 15189571 PMCID: PMC459208 DOI: 10.1186/1471-2105-5-75] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Received: 02/07/2004] [Accepted: 06/09/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The increasing number of protein sequences and 3D structures obtained from genomic initiatives is leading many of us to focus on proteomics, and to dedicate our experimental and computational efforts to the creation and analysis of information derived from 3D structure. In particular, the high-throughput generation of protein-protein interaction data from a few organisms makes such an approach very important for understanding the molecular recognition that makes up the entire protein-protein interaction network. Since the generation of sequences and experimental protein-protein interaction data increases faster than the 3D structure determination of protein complexes, there is tremendous interest in developing in silico methods that generate such structures for prediction and classification purposes. In this study we focused on classifying protein family members based on their protein-protein interaction distinctiveness. Structure-based classification of protein-protein interfaces was described initially by Ponstingl et al. [1] and more recently by Valdar et al. [2] and Mintseris et al. [3], from complex structures that have been solved experimentally. However, little has been done on protein classification based on the prediction of protein-protein complexes obtained from homology modeling and docking simulation. RESULTS We have developed an in silico classification system entitled HODOCO (Homology modeling, Docking and Classification Oracle), in which protein Residue Potential Interaction Profiles (RPIPS) are used to summarize protein-protein interaction characteristics. This system, applied to a dataset of 64 proteins of the death domain superfamily, was used to classify each member into its proper subfamily. Two classification methods were attempted: a heuristic approach and support vector machine learning. Both methods were tested with 5-fold cross-validation. The heuristic approach yielded a 61% average accuracy, while the machine learning approach yielded an 89% average accuracy. CONCLUSION We have confirmed the reliability and potential value of classifying proteins via their predicted interactions. Our results are in the same range of accuracy as other studies that classify protein-protein interactions from experimentally determined 3D complex structures. Although our classification scheme does not directly take sequence information into account, our results agree with the functional and sequence-based classification of death domain family members.
Affiliation(s)
- Drew Lett
- Department of Computer Science, Simon Fraser University, 8888 University Drive, Burnaby, B.C. Canada, V5A 1S6
- Michael Hsing
- Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, B.C. Canada, V5A 1S6
- Frederic Pio
- Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, B.C. Canada, V5A 1S6
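The HODOCO abstract evaluates both of its classifiers by 5-fold cross-validation and reports their average accuracy. The Python sketch below shows that evaluation loop. Because the abstract does not describe the heuristic classifier, a nearest-centroid classifier stands in for it; the feature vectors play the role of Residue Potential Interaction Profiles, and all names and data are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    """Stand-in classifier: one mean profile (centroid) per subfamily label."""
    by_label = {}
    for x, label in zip(X, y):
        by_label.setdefault(label, []).append(x)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in by_label.items()}


def nearest_centroid_predict(centroids, x):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def k_fold_accuracy(X, y, k=5, seed=0):
    """Average test accuracy over k folds; every sample is tested exactly once."""
    if not 2 <= k <= len(X):
        raise ValueError("k must be between 2 and the number of samples")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    accuracies = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return mean(accuracies)


if __name__ == "__main__":
    # Two well-separated toy "subfamilies" of 2-D profiles.
    X = [[i * 0.1, 0.0] for i in range(10)] + [[10.0 + i * 0.1, 10.0] for i in range(10)]
    y = ["A"] * 10 + ["B"] * 10
    print(k_fold_accuracy(X, y, k=5))  # 1.0
```

Swapping in a support vector machine would only change the fit and predict functions; the fold loop that produces the averaged accuracy stays the same.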
23
Abstract
The classification of a newly identified protein as a member of a superfamily is important for focusing experiments on its most likely functions. Such classification, often performed by hand, has now been fully automated. This sophisticated new approach takes into account not only alignment scores but also a number of other computable attributes, such as functional sites deduced from sequence conservation patterns.
Affiliation(s)
- Sabine Dietmann
- Structural Genomics Group, EMBL-EBI Research Programme, Cambridge CB10 1SD, UK
24
Rigden DJ, Mello LV, Setlow P, Jedrzejas MJ. Structure and mechanism of action of a cofactor-dependent phosphoglycerate mutase homolog from Bacillus stearothermophilus with broad specificity phosphatase activity. J Mol Biol 2002; 315:1129-43. [PMID: 11827481 DOI: 10.1006/jmbi.2001.5290] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
The crystal structure of Bacillus stearothermophilus PhoE (originally termed YhfR), a broad specificity monomeric phosphatase with a molecular mass of approximately 24 kDa, has been solved at 2.3 Å resolution in order to investigate its structure and function. PhoE, already identified as a homolog of a cofactor-dependent phosphoglycerate mutase, shares with the latter an alpha/beta/alpha sandwich structure spanning, as a structural excursion, a smaller subdomain composed of two alpha-helices and one short beta-strand. The active site contains residues from both the alpha/beta/alpha sandwich and the subdomain. With the exception of the hydrophilic catalytic machinery conserved throughout the cofactor-dependent phosphoglycerate mutase family, the active-site cleft is strikingly hydrophobic. Docking studies with two diverse, favored substrates show that 3-phosphoglycerate may bind to the catalytic core, while alpha-naphthylphosphate binding also involves the hydrophobic portion of the active-site cleft. Combining a highly favorable phospho group binding site common to these substrate binding modes with data from related enzymes, a catalytic mechanism can be proposed that involves formation of a phosphohistidine intermediate on His10 and likely acid-base behavior of Glu83. Other structural factors contributing to the broad substrate specificity of PhoE can be identified. The dynamic independence of the subdomain may enable the active-site cleft to accommodate substrates of different sizes, although similar motions are present in simulations of cofactor-dependent phosphoglycerate mutases, perhaps favoring a more general functional role. A significant number of entries in protein sequence databases, particularly from unfinished microbial genomes, are more similar to PhoE than to cofactor-dependent phosphoglycerate mutases or to fructose-2,6-bisphosphatases. This PhoE structure will therefore serve as a valuable basis for inference of structural and functional characteristics of these proteins.
Affiliation(s)
- Daniel J Rigden
- National Centre of Genetic Resources and Biotechnology, Cenargen/Embrapa, S.A.I.N. Parque Rural, Final W5, Asa Norte, Brasília, 70770-900, Brazil