1
Machat M, Langenfeld F, Craciun D, Sirugue L, Labib T, Lagarde N, Maria M, Montes M. Comparative evaluation of shape retrieval methods on macromolecular surfaces: an application of computer vision methods in structural bioinformatics. Bioinformatics 2021; 37:4375-4382. [PMID: 34247232 PMCID: PMC8652110 DOI: 10.1093/bioinformatics/btab511] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/29/2020] [Revised: 05/18/2021] [Accepted: 07/08/2021] [Indexed: 11/24/2022]
Abstract
MOTIVATION: The investigation of the structure of biological systems at the molecular level gives insights about their functions and dynamics. Shape and surface of biomolecules are fundamental to molecular recognition events. Characterizing their geometry can lead to more adequate predictions of their interactions. In the present work, we assess the performance of reference shape retrieval methods from the computer vision community on protein shapes. RESULTS: Shape retrieval methods are efficient in identifying orthologous proteins and tracking large conformational changes. This work illustrates the interest for the protein surface shape as a higher-level representation of the protein structure that (i) abstracts the underlying protein sequence, structure or fold, (ii) allows the use of shape retrieval methods to screen large databases of protein structures to identify surficial homologs and possible interacting partners and (iii) opens an extension of the protein structure-function paradigm toward a protein structure-surface(s)-function paradigm. AVAILABILITY AND IMPLEMENTATION: All data are available online at http://datasetmachat.drugdesign.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mohamed Machat
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Florent Langenfeld
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Daniela Craciun
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Léa Sirugue
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Taoufik Labib
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Nathalie Lagarde
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Maxime Maria
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
  - Laboratoire XLIM, UMR CNRS 7252, Université de Limoges, Limoges 87000, France
- Matthieu Montes
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
2
Budowski-Tal I, Kolodny R, Mandel-Gutfreund Y. A Novel Geometry-Based Approach to Infer Protein Interface Similarity. Sci Rep 2018; 8:8192. [PMID: 29844500 PMCID: PMC5974305 DOI: 10.1038/s41598-018-26497-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/24/2017] [Accepted: 05/10/2018] [Indexed: 11/21/2022]
Abstract
The protein interface is key to understanding protein function, providing vital insight into how proteins interact with each other and with other molecules. Over the years, many computational methods for comparing protein structures have been developed, yet evaluating interface similarity remains a very difficult task. Here, we present PatchBag, a geometry-based method for efficient comparison of protein surfaces and interfaces. PatchBag is a Bag-of-Words approach that represents complex objects as vectors, enabling highly efficient searches for similar interfaces. Using a novel framework for evaluating interface similarity, we show that PatchBag's performance is comparable to that of state-of-the-art alignment-based structural comparison methods. The great advantage of PatchBag is that it does not rely on sequence or fold information, which allows it to detect similarities between interfaces in unrelated proteins. We propose that PatchBag can help reveal novel evolutionary and functional relationships between protein interfaces.
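The idea of representing a surface as a vector and comparing vectors instead of aligning structures can be illustrated with a minimal sketch. This is not PatchBag's implementation: the integer patch labels, the vocabulary size, the function names and the choice of cosine similarity are all assumptions made for the example, since the abstract does not specify them.

```python
import math
from collections import Counter


def bag_of_words_vector(patch_labels, vocabulary_size):
    """Count how often each library patch (hypothetical integer label) occurs."""
    counts = Counter(patch_labels)
    return [counts.get(label, 0) for label in range(vocabulary_size)]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty bag is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


# Two hypothetical interfaces, each described by the labels of its surface patches.
interface_a = bag_of_words_vector([0, 2, 2, 3], vocabulary_size=4)
interface_b = bag_of_words_vector([2, 2, 3, 3], vocabulary_size=4)
similarity = cosine_similarity(interface_a, interface_b)
```

Because each interface is reduced to a fixed-length vector, a database search becomes a series of cheap vector comparisons, with no dependence on sequence or fold.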
Affiliation(s)
- Inbal Budowski-Tal
  - Faculty of Biology, Technion, Israel Institute of Technology, Haifa, 3200003, Israel
  - Department of Computer Science, University of Haifa, Mount Carmel, Haifa, 3498838, Israel
- Rachel Kolodny
  - Department of Computer Science, University of Haifa, Mount Carmel, Haifa, 3498838, Israel
- Yael Mandel-Gutfreund
  - Faculty of Biology, Technion, Israel Institute of Technology, Haifa, 3200003, Israel
3
Kontopoulos DG, Vlachakis D, Tsiliki G, Kossida S. Structuprint: a scalable and extensible tool for two-dimensional representation of protein surfaces. BMC Struct Biol 2016; 16:4. [PMID: 26911476 PMCID: PMC4765231 DOI: 10.1186/s12900-016-0055-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 11/10/2015] [Accepted: 02/02/2016] [Indexed: 11/26/2022]
Abstract
BACKGROUND: The term 'molecular cartography' encompasses a family of computational methods for two-dimensional transformation of protein structures and analysis of their physicochemical properties. The underlying algorithms comprise multiple manual steps, whereas the few existing implementations typically restrict the user to a very limited set of molecular descriptors. RESULTS: We present Structuprint, a free standalone software that fully automates the rendering of protein surface maps, given - at the very least - a directory with a PDB file and an amino acid property. The tool comes with a default database of 328 descriptors, which can be extended or substituted by user-provided ones. The core algorithm comprises the generation of a mould of the protein surface, which is subsequently converted to a sphere and mapped to two dimensions, using the Miller cylindrical projection. Structuprint is partly optimized for multicore computers, making the rendering of animations of entire molecular dynamics simulations feasible. CONCLUSIONS: Structuprint is an efficient application, implementing a molecular cartography algorithm for protein surfaces. According to the results of a benchmark, its memory requirements and execution time are reasonable, allowing it to run even on low-end personal computers. We believe that it will be of use - primarily but not exclusively - to structural biologists and computational biochemists.
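The final mapping step named in the abstract, the Miller cylindrical projection, has a standard closed form: x = λ and y = (5/4) ln tan(π/4 + 2φ/5), where λ is longitude and φ is latitude in radians. The sketch below applies it to points on a sphere. It is an illustration only, not Structuprint's code; the function names and the Cartesian-to-spherical helper are hypothetical.

```python
import math


def cartesian_to_lonlat(x, y, z):
    """Convert a 3D point to (longitude, latitude) in radians on its own sphere."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0:
        raise ValueError("the origin has no direction on the sphere")
    lon = math.atan2(y, x)
    # Clamp to guard against rounding pushing the ratio just outside [-1, 1].
    lat = math.asin(max(-1.0, min(1.0, z / r)))
    return lon, lat


def miller_project(lon, lat):
    """Standard Miller cylindrical projection of a (lon, lat) pair in radians."""
    x = lon
    y = 1.25 * math.log(math.tan(math.pi / 4 + 0.4 * lat))
    return x, y


def project_point(x, y, z):
    """Map a 3D point on a sphere to 2D map coordinates."""
    return miller_project(*cartesian_to_lonlat(x, y, z))
```

Unlike the Mercator projection, the Miller projection stays finite at the poles (|y| is about 2.303 at φ = ±π/2), so every point of the sphere lands on a bounded map.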
Affiliation(s)
- Dimitrios Vlachakis
  - Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Athens, Greece
- Georgia Tsiliki
  - School of Chemical Engineering, National Technical University of Athens, Athens, Greece
- Sofia Kossida
  - IMGT®, The International ImMunoGeneTics Information System®, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine, Montpellier, France
4
Zhang Q, Zmasek CM, Godzik A. Domain architecture evolution of pattern-recognition receptors. Immunogenetics 2010; 62:263-72. [PMID: 20195594 PMCID: PMC2858798 DOI: 10.1007/s00251-010-0428-1] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Received: 09/18/2009] [Accepted: 02/03/2010] [Indexed: 12/11/2022]
Abstract
In animals, the innate immune system is the first line of defense against invading microorganisms, and the pattern-recognition receptors (PRRs) are the key components of this system, detecting microbial invasion and initiating innate immune defenses. Two families of PRRs, the intracellular NOD-like receptors (NLRs) and the transmembrane Toll-like receptors (TLRs), are of particular interest because of their roles in a number of diseases. Understanding the evolutionary history of these families and their pattern of evolutionary changes may lead to new insights into the functioning of this critical system. We found that the evolution of both NLR and TLR families included massive species-specific expansions and domain shuffling in various lineages, which resulted in the same domain architectures evolving independently within different lineages in a process that fits the definition of parallel evolution. This observation illustrates both the dynamics of the innate immune system and the effects of "combinatorially constrained" evolution, where existence of the limited numbers of functionally relevant domains constrains the choices of domain architectures for new members in the family, resulting in the emergence of independently evolved proteins with identical domain architectures, often mistaken for orthologs.
Affiliation(s)
- Qing Zhang
  - Burnham Institute for Medical Research, 10901 North Torrey Pines Road, La Jolla, CA 92037 USA
- Christian M. Zmasek
  - Burnham Institute for Medical Research, 10901 North Torrey Pines Road, La Jolla, CA 92037 USA
- Adam Godzik
  - Burnham Institute for Medical Research, 10901 North Torrey Pines Road, La Jolla, CA 92037 USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
5
Redfern OC, Dessailly BH, Dallman TJ, Sillitoe I, Orengo CA. FLORA: a novel method to predict protein function from structure in diverse superfamilies. PLoS Comput Biol 2009; 5:e1000485. [PMID: 19714201 PMCID: PMC2721411 DOI: 10.1371/journal.pcbi.1000485] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 11/27/2008] [Accepted: 07/23/2009] [Indexed: 11/18/2022]
Abstract
Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues.
Understanding how the three-dimensional (3D) molecular structure of proteins influences their function can provide insights into the workings of biological systems. Structural Genomics Initiatives have been set up to investigate these structures on a large scale and make the data available to the wider biological research community. However, in a significant number of cases, there is little known about the functions of the structures that are solved. To address this, computational methods can be used as a predictive tool to guide future experimental investigations. One such approach is to exploit global structural comparison to assign the protein in question to an evolutionary family, which has already been functionally characterised. However, this is problematic in some large evolutionary families, which contain a number of different functional sub-families. We have developed a new method (FLORA) which is able to calculate 3D “motifs” which are specific to each of these sub-families. Any new protein structure can then be compared against these motifs to make a more accurate prediction of its function. Our paper shows that FLORA substantially outperforms other standard approaches for predicting function from structure. We use our method to make confident functional predictions for a set of proteins solved by the structural genomics projects, which could not have been assigned reliably by global structure comparison.
Affiliation(s)
- Oliver C. Redfern
  - Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Benoît H. Dessailly
  - Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Timothy J. Dallman
  - Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Ian Sillitoe
  - Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Christine A. Orengo
  - Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
6
Loewenstein Y, Raimondo D, Redfern OC, Watson J, Frishman D, Linial M, Orengo C, Thornton J, Tramontano A. Protein function annotation by homology-based inference. Genome Biol 2009; 10:207. [PMID: 19226439 PMCID: PMC2688287 DOI: 10.1186/gb-2009-10-2-207] [Citation(s) in RCA: 154] [Impact Index Per Article: 9.6] [Indexed: 11/10/2022]
Abstract
Where information on homologous proteins is available, progress is being made in automated prediction of protein function from sequence and structure.
With many genomes now sequenced, computational annotation methods to characterize genes and proteins from their sequence are increasingly important. The BioSapiens Network has developed tools to address all stages of this process, and here we review progress in the automated prediction of protein function based on protein sequence and structure.
Affiliation(s)
- Yaniv Loewenstein
  - Department of Biological Chemistry, The Hebrew University of Jerusalem, Sudarsky Center, Jerusalem 91904, Israel
7
Abstract
The mapping of physicochemical characteristics onto the surface of a protein provides crucial insights into its function and evolution. This information can be further used in the characterization and identification of similarities within protein surface regions. We propose a novel method which quantitatively compares global and local properties on the protein surface. We have tested the method on comparison of electrostatic potentials and hydrophobicity. The method is based on 3D Zernike descriptors, which provide a compact representation of a given property defined on a protein surface. Compactness and rotational invariance of this descriptor enable fast comparison suitable for database searches. The usefulness of this method is exemplified by studying several protein families including globins, thermophilic and mesophilic proteins, and active sites of TIM beta/alpha barrel proteins. In all the cases studied, the descriptor is able to cluster proteins into functionally relevant groups. The proposed approach can also be easily extended to other surface properties. This protein surface-based approach will add a new way of viewing and comparing proteins to conventional methods, which compare proteins in terms of their primary sequence or tertiary structure.
Affiliation(s)
- Lee Sael
  - Department of Computer Science, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- David La
  - Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Bin Li
  - Department of Computer Science, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Raif Rustamov
  - Department of Mathematics, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
  - Department of Computer Science, College of Science, Purdue University, West Lafayette, IN, 47907, USA
  - Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN, 47907, USA
  - Markey Center for Structural Biology, Purdue University, West Lafayette, IN, 47907, USA
  - The Bindley Bioscience Center, Purdue University, West Lafayette, IN, 47907, USA
8
Redfern OC, Dessailly B, Orengo CA. Exploring the structure and function paradigm. Curr Opin Struct Biol 2008; 18:394-402. [PMID: 18554899 PMCID: PMC2561214 DOI: 10.1016/j.sbi.2008.05.007] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.6] [Received: 03/18/2008] [Revised: 04/16/2008] [Accepted: 05/07/2008] [Indexed: 11/29/2022]
Abstract
Advances in protein structure determination, led by the structural genomics initiatives, have increased the proportion of novel folds deposited in the Protein Data Bank. However, these structures are often not accompanied by functional annotations with experimental confirmation. In this review, we reassess the meaning of structural novelty and examine its relevance to the complexity of the structure-function paradigm. Recent advances in the prediction of protein function from structure are discussed, as well as new sequence-based methods for partitioning large, diverse superfamilies into biologically meaningful clusters. Obtaining structural data for these functionally coherent groups of proteins will allow us to better understand the relationship between structure and function.
Affiliation(s)
- Oliver C Redfern
  - Department of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom