1
Riziotis IG, Ribeiro AJM, Borkakoti N, Thornton JM. The 3D Modules of Enzyme Catalysis: Deconstructing Active Sites into Distinct Functional Entities. J Mol Biol 2023; 435:168254. [PMID: 37652131 DOI: 10.1016/j.jmb.2023.168254] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/01/2023] [Revised: 08/20/2023] [Accepted: 08/22/2023] [Indexed: 09/02/2023]
Abstract
Enzyme catalysis is governed by a limited toolkit of residues and organic or inorganic co-factors. Therefore, recurring residue arrangements that perform a defined catalytic function, are structurally similar and occur in unrelated enzymes are expected to be found across the enzyme space. Leveraging the integrated information in the Mechanism and Catalytic Site Atlas (M-CSA) (enzyme structure, sequence, catalytic residue annotations, catalysed reaction, detailed mechanism description), we derived 3D templates to represent compact groups of catalytic residues. A fuzzy template-template search allowed us to identify recurring motifs, either conserved or convergent, which we define as the "modules of enzyme catalysis". We show that a large fraction of these modules facilitate binding of metal ions, co-factors and substrates, and are frequently the result of convergent evolution. A smaller number of convergent modules perform a well-defined catalytic role, such as the variants of the catalytic triad (i.e. Ser-His-Asp/Cys-His-Asp) and the saccharide-cleaving Asp/Glu triad. We also show that enzymes whose functions have diverged during evolution preserve regions of their active site unaltered, as evidenced by modules performing similar or identical steps of the catalytic mechanism. We have compiled a comprehensive library of catalytic modules that characterise a broad spectrum of enzymes. These modules can be used as templates in enzyme design and for better understanding catalysis in 3D.
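The fuzzy template-template comparison can be illustrated with a minimal sketch. It assumes a template is a list of (residue type, (x, y, z)) pairs and that "fuzzy" means two things: chemically similar residue types may substitute for each other (as in the Ser-His-Asp/Cys-His-Asp triad variants), and inter-residue distances may deviate within a tolerance. The equivalence groups, the 1.5 Å tolerance and the function names are hypothetical; this is not the M-CSA pipeline.

```python
from itertools import permutations
from math import dist

# Hypothetical equivalence groups: residue types in one group may substitute for each other.
EQUIVALENT = [{"SER", "CYS", "THR"}, {"ASP", "GLU"}, {"HIS"}, {"LYS", "ARG"}]

def same_group(a: str, b: str) -> bool:
    """True if residue types a and b are identical or belong to the same group."""
    return a == b or any(a in g and b in g for g in EQUIVALENT)

def fuzzy_match(t1, t2, tol=1.5):
    """Return a mapping (list: t1 index -> t2 index) if residue types are equivalent
    and every pairwise distance agrees within tol (Å); otherwise None."""
    if len(t1) != len(t2):
        return None
    n = len(t1)
    for perm in permutations(range(n)):
        # Skip assignments that pair chemically incompatible residues.
        if not all(same_group(t1[i][0], t2[perm[i]][0]) for i in range(n)):
            continue
        # Compare intra-template distances, which do not depend on orientation.
        if all(
            abs(dist(t1[i][1], t1[j][1]) - dist(t2[perm[i]][1], t2[perm[j]][1])) <= tol
            for i in range(n)
            for j in range(i + 1, n)
        ):
            return list(perm)
    return None

# Toy Ser-His-Asp and Cys-His-Asp triads (made-up coordinates).
triad_a = [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.0, 0.0, 0.0)), ("ASP", (6.0, 1.0, 0.0))]
triad_b = [("HIS", (3.2, 0.0, 0.0)), ("CYS", (0.0, 0.3, 0.0)), ("ASP", (6.1, 1.2, 0.0))]
print(fuzzy_match(triad_a, triad_b))  # [1, 0, 2]: Ser->Cys, His->His, Asp->Asp
```

The exhaustive loop over permutations is only practical for templates of a few residues; a library-scale search would need pruning or indexing.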
Affiliation(s)
- Ioannis G Riziotis
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
- António J M Ribeiro
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
- Neera Borkakoti
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
- Janet M Thornton
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
2
Riziotis IG, Thornton JM. Capturing the geometry, function, and evolution of enzymes with 3D templates. Protein Sci 2022; 31:e4363. [PMID: 35762726 PMCID: PMC9207746 DOI: 10.1002/pro.4363] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/04/2022] [Revised: 05/06/2022] [Accepted: 05/14/2022] [Indexed: 11/05/2022]
Abstract
Structural templates are 3D signatures representing protein functional sites, such as ligand binding cavities, metal coordination motifs, or catalytic sites. Here we explore methods to generate template libraries and algorithms to query structures for conserved 3D motifs. Applications of templates are discussed, as well as some exemplar cases for examining evolutionary links in enzymes. We also introduce the concept of using more than one template per structure to represent flexible sites, as an approach to better understand catalysis through snapshots captured in enzyme structures. Functional annotation from structure is an important topic that has recently resurfaced due to new, more accurate methods of protein structure prediction. Therefore, we anticipate that template-based functional site detection will be a powerful tool in the task of characterizing a vast number of new protein models.
3
Barnsley KK, Ondrechen MJ. Enzyme active sites: Identification and prediction of function using computational chemistry. Curr Opin Struct Biol 2022; 74:102384. [DOI: 10.1016/j.sbi.2022.102384] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/08/2022] [Revised: 03/20/2022] [Accepted: 03/28/2022] [Indexed: 11/03/2022]
4
Riziotis IG, Ribeiro AJ, Borkakoti N, Thornton JM. Conformational variation in enzyme catalysis: A structural study on catalytic residues. J Mol Biol 2022; 434:167517. [PMID: 35240125 PMCID: PMC9005782 DOI: 10.1016/j.jmb.2022.167517] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/12/2021] [Revised: 02/21/2022] [Accepted: 02/23/2022] [Indexed: 11/26/2022]
Abstract
We introduce a pipeline to compare and contrast active sites from homologous enzymes in 3D. A comprehensive structural study covering enzymes from a large functional space. High heterogeneity in the magnitude of active-site flexibility between enzyme families. Different catalytic residue types and functions relate to different degrees of flexibility. Four paradigms classify enzymes according to their structural behaviour during catalysis.
Conformational variation in catalytic residues can be captured as alternative snapshots in enzyme crystal structures. Addressing the question of whether active site flexibility is an intrinsic and essential property of enzymes for catalysis, we present a comprehensive study on the 3D variation of active sites of 925 enzyme families, using explicit catalytic residue annotations from the Mechanism and Catalytic Site Atlas and structural data from the Protein Data Bank. Through weighted pairwise superposition of the functional atoms of active sites, we captured structural variability at the single-residue level and examined the geometrical changes as ligands bind or as mutations occur. We demonstrate that catalytic centres of enzymes can be inherently rigid or flexible to various degrees according to the function they perform, and that structural variability most often involves a subset of the catalytic residues, usually those not directly involved in the formation or cleavage of bonds. Moreover, the data suggest that two-thirds of active sites are flexible, and in half of those, flexibility is observed only in the side chain. The goal of this work is to characterise our current knowledge of the extent of flexibility at the heart of catalysis and ultimately place our findings in the context of the evolution of catalysis as enzymes evolve new functions and bind different substrates.
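The study quantifies active-site variation by weighted superposition of functional atoms. A simpler, superposition-free proxy is sketched below: it swaps in a weighted distance-matrix RMSD (dRMSD), which compares intra-site distances and therefore needs no rotation fitting, plus a per-atom score that flags which atom moves between two snapshots. This is not the authors' method; the weighting scheme (pair weight = product of per-atom weights) and all names are assumptions.

```python
from math import dist, sqrt

def weighted_drmsd(a, b, w):
    """Weighted distance-matrix RMSD between two conformations a and b of the same atoms.

    Pair (i, j) is weighted by w[i] * w[j]. Intra-site distances are invariant to
    rotation and translation, so no superposition step is needed."""
    num = den = 0.0
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            wij = w[i] * w[j]
            diff = dist(a[i], a[j]) - dist(b[i], b[j])
            num += wij * diff * diff
            den += wij
    return sqrt(num / den)

def per_atom_shift(a, b):
    """RMS change of each atom's distances to all other atoms; large values flag moving atoms."""
    n = len(a)
    shifts = []
    for i in range(n):
        diffs = [dist(a[i], a[j]) - dist(b[i], b[j]) for j in range(n) if j != i]
        shifts.append(sqrt(sum(d * d for d in diffs) / len(diffs)))
    return shifts

# Two made-up snapshots of a three-atom site; only the third atom moves.
site_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
site_b = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
print(weighted_drmsd(site_a, site_b, [1.0, 1.0, 1.0]))
print(per_atom_shift(site_a, site_b))  # the third atom gets the largest value
```

Note that distance-based scores spread a single moving atom's displacement over its partners, which is one reason a superposition-based, per-residue analysis is more informative in practice.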
5
Bittrich S, Burley SK, Rose AS. Real-time structural motif searching in proteins using an inverted index strategy. PLoS Comput Biol 2020; 16:e1008502. [PMID: 33284792 PMCID: PMC7746303 DOI: 10.1371/journal.pcbi.1008502] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/30/2020] [Revised: 12/17/2020] [Accepted: 11/09/2020] [Indexed: 12/30/2022]
Abstract
Biochemical and biological functions of proteins are the product of both the overall fold of the polypeptide chain, and, typically, structural motifs made up of smaller numbers of amino acids constituting a catalytic center or a binding site that may be remote from one another in amino acid sequence. Detection of such structural motifs can provide valuable insights into the function(s) of previously uncharacterized proteins. Technically, this remains an extremely challenging problem because of the size of the Protein Data Bank (PDB) archive. Existing methods depend on clustering by sequence similarity and can be computationally slow. We have developed a new approach that uses an inverted index strategy capable of analyzing >170,000 PDB structures with unmatched speed. The efficiency of the inverted index method depends critically on identifying the small number of structures containing the query motif and ignoring most of the structures that are irrelevant. Our approach (implemented at motif.rcsb.org) enables real-time retrieval and superposition of structural motifs, either extracted from a reference structure or uploaded by the user. Herein, we describe the method and present five case studies that exemplify its efficacy and speed for analyzing 3D structures of both proteins and nucleic acids.
The Protein Data Bank (PDB) provides open access to more than 170,000 three-dimensional structures of proteins, nucleic acids, and biological complexes. Similarities between PDB structures give valuable functional and evolutionary insights, but such resemblance may not be evident at the sequence or global structure level. Throughout the database, there are recurring structural motifs: small groups of residues in spatial proximity that, for example, support catalytic activity.
Identification of common structural motifs can reveal similarities between proteins and serve as fingerprints for spatial configurations of amino acids, such as the His-Asp-Ser catalytic triad found in serine proteases or the zinc coordination site found in Zinc Finger DNA-binding domains. We present a highly efficient yet flexible strategy that allows users for the first time to search for arbitrary structural motifs across the entire PDB archive in real-time. Our approach scales favorably with the increasing number and complexity of deposited structures, and, also, has the potential to be adapted for other applications in a macromolecular context.
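The inverted-index idea can be sketched as follows, assuming each structure is reduced to residue-pair descriptors (the two residue types plus a binned inter-residue distance) that map to the set of structures containing them. A query motif then only needs to be checked against structures that appear in every posting list. The descriptor, the 1 Å bin width, the 20 Å cutoff and the function names are hypothetical simplifications, not the motif.rcsb.org implementation.

```python
from collections import defaultdict
from itertools import combinations
from math import dist

BIN = 1.0      # hypothetical distance bin width (Å)
CUTOFF = 20.0  # hypothetical maximum pair distance that is indexed (Å)

def pair_keys(residues):
    """Yield (typeA, typeB, distance_bin) descriptors for residue pairs within CUTOFF."""
    for (ta, ca), (tb, cb) in combinations(residues, 2):
        d = dist(ca, cb)
        if d <= CUTOFF:
            a, b = sorted((ta, tb))
            yield (a, b, int(d // BIN))

def build_index(structures):
    """structures: {structure_id: [(residue_type, (x, y, z)), ...]} -> descriptor -> set of ids."""
    index = defaultdict(set)
    for sid, residues in structures.items():
        for key in pair_keys(residues):
            index[key].add(sid)
    return index

def candidates(index, motif):
    """Structures containing every pair descriptor of the motif (still to be verified)."""
    result = None
    for key in set(pair_keys(motif)):
        postings = index.get(key, set())
        result = set(postings) if result is None else result & postings
        if not result:
            return set()  # early exit: one empty posting list rules everything out
    return result if result is not None else set()

# Made-up toy structures; the motif is toy_a's triad translated in space.
structures = {
    "toy_a": [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.5, 0.0, 0.0)), ("ASP", (3.5, 3.2, 0.0))],
    "toy_b": [("SER", (0.0, 0.0, 0.0)), ("HIS", (9.0, 0.0, 0.0))],
}
motif = [("SER", (10.0, 10.0, 10.0)), ("HIS", (13.5, 10.0, 10.0)), ("ASP", (13.5, 13.2, 10.0))]
index = build_index(structures)
print(candidates(index, motif))  # {'toy_a'}
```

Hard bin boundaries mean that distances of 5.99 Å and 6.01 Å fall into different bins, so a practical index would need tolerant lookups (for example, also querying neighbouring bins), and every candidate would still have to be confirmed by superposition.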
Affiliation(s)
- Sebastian Bittrich
- RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA
- Stephen K. Burley
- RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California, USA
- Alexander S. Rose
- RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA
6
Kaiser F, Labudde D. Unsupervised Discovery of Geometrically Common Structural Motifs and Long-Range Contacts in Protein 3D Structures. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:671-680. [PMID: 29990265 DOI: 10.1109/tcbb.2017.2786250] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
The essential role of small, evolutionarily conserved structural units in proteins has been extensively researched and validated. A popular example is the serine proteases, where the peptide cleavage reaction is carried out by a configuration of only three residues. Because such structural motifs are brought into spatial proximity during protein folding, they often involve long-range contacts and are usually hard to detect at the sequence level. Given the constantly growing body of protein 3D structure data, the computational identification of structural motifs can contribute significantly to the understanding of protein fold and function. We therefore propose a method to discover structural motifs of high geometrical similarity and desired sequence separation in protein 3D structure data. Because the method uses techniques originating from data mining, no a priori knowledge is required. The applicability of the method is demonstrated by the identification of the catalytic unit of serine proteases and the ion-coordination center of cupredoxins. Furthermore, large-scale analysis of the entire Protein Data Bank points towards the presence of ubiquitous structural motifs, independent of any specific fold or function. We envision that our method is suitable for uncovering functional mechanisms and for deriving fingerprint libraries of structural motifs, which could be used to assess protein family association.
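The kind of signal such a method mines can be illustrated with a deliberately simple frequency count. The sketch assumes each chain is a list of (sequence position, residue type, (x, y, z)) entries: residue pairs that are close in space but far apart in sequence (long-range contacts) are counted across chains, and pairs seen in at least a minimum number of chains are kept. The 6.5 Å cutoff, the 10-residue separation and the support threshold are hypothetical, and this is not the authors' data-mining algorithm, which targets geometrically similar motifs rather than residue-type pairs.

```python
from collections import Counter
from itertools import combinations
from math import dist

def long_range_contacts(chain, max_dist=6.5, min_sep=10):
    """Yield sorted residue-type pairs that are spatially close but distant in sequence."""
    for (pa, ta, ca), (pb, tb, cb) in combinations(chain, 2):
        if abs(pa - pb) >= min_sep and dist(ca, cb) <= max_dist:
            yield tuple(sorted((ta, tb)))

def frequent_contacts(chains, min_support=2):
    """Count in how many chains each contact type occurs; keep those with enough support."""
    support = Counter()
    for chain in chains:
        support.update(set(long_range_contacts(chain)))  # count each pair once per chain
    return {pair: n for pair, n in support.items() if n >= min_support}

# Made-up chains: a His-Asp long-range contact recurs in two of three chains.
chain1 = [(10, "HIS", (0.0, 0.0, 0.0)), (60, "ASP", (3.0, 0.0, 0.0)), (12, "GLY", (20.0, 0.0, 0.0))]
chain2 = [(5, "ASP", (0.0, 0.0, 0.0)), (40, "HIS", (0.0, 4.0, 0.0))]
chain3 = [(1, "HIS", (0.0, 0.0, 0.0)), (3, "ASP", (2.0, 0.0, 0.0))]  # too close in sequence
print(frequent_contacts([chain1, chain2, chain3]))  # {('ASP', 'HIS'): 2}
```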
7
Parasuram R, Mills CL, Wang Z, Somasundaram S, Beuning PJ, Ondrechen MJ. Local structure based method for prediction of the biochemical function of proteins: Applications to glycoside hydrolases. Methods 2016; 93:51-63. [DOI: 10.1016/j.ymeth.2015.11.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/01/2015] [Revised: 11/05/2015] [Accepted: 11/09/2015] [Indexed: 01/07/2023]
8
Kaiser F, Eisold A, Bittrich S, Labudde D. Fit3D: a web application for highly accurate screening of spatial residue patterns in protein structure data. Bioinformatics 2015; 32:792-4. [PMID: 26519504 DOI: 10.1093/bioinformatics/btv637] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 08/03/2015] [Accepted: 10/24/2015] [Indexed: 01/12/2023]
Abstract
The clarification of the link between protein structure and function is still a demanding process and can be supported by comparison of spatial residue patterns, so-called structural motifs. However, versatile, up-to-date resources for searching local structure similarities are rare. We present Fit3D, an easily accessible web application for highly accurate screening of structural motifs in 3D protein data.
AVAILABILITY AND IMPLEMENTATION: The web application is accessible at https://biosciences.hs-mittweida.de/fit3d and program sources of the command line version were released under the terms of GNU GPLv3. Platform-independent binaries and documentation for offline usage are available at https://bitbucket.org/fkaiser/fit3d
CONTACT: florian.kaiser@hs-mittweida.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Florian Kaiser
- Department of Applied Computer and Biosciences, University of Applied Sciences Mittweida, 09648 Mittweida, Germany
- Alexander Eisold
- Department of Applied Computer and Biosciences, University of Applied Sciences Mittweida, 09648 Mittweida, Germany
- Sebastian Bittrich
- Department of Applied Computer and Biosciences, University of Applied Sciences Mittweida, 09648 Mittweida, Germany
- Dirk Labudde
- Department of Applied Computer and Biosciences, University of Applied Sciences Mittweida, 09648 Mittweida, Germany
9
Kaiser F, Eisold A, Labudde D. A Novel Algorithm for Enhanced Structural Motif Matching in Proteins. J Comput Biol 2015; 22:698-713. [PMID: 25695840 DOI: 10.1089/cmb.2014.0263] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
As widely discussed in the literature, spatial patterns of amino acids, so-called structural motifs, play an important role in protein function. The functionally responsible part of a protein often lies in an evolutionarily highly conserved spatial arrangement of only a few amino acids, which are held tightly in place by the rest of the structure. These recurring amino acid arrangements can be seen as patterns in three-dimensional space and are known as structural motifs. In general, these motifs can mediate various functional interactions, such as DNA/RNA targeting and binding, ligand interactions, substrate catalysis, and stabilization of the protein structure. Hence, characterizing and identifying such conserved structural motifs can contribute to the understanding of structure-function relationships. Therefore, and because of the rapidly increasing number of solved protein structures, it is highly desirable to identify, understand and search for structurally scattered amino acid motifs. This work aims at the development and implementation of a novel and robust matching algorithm to detect structural motifs in large sets of target structures. The proposed methods were combined and implemented in a feature-rich and easy-to-use command line software tool written in Java.
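The abstract does not describe the matching algorithm itself, so the sketch below shows only a generic baseline for the same task: a depth-first backtracking search that assigns motif residues to distinct target residues of the same type and abandons a partial assignment as soon as any pairwise distance disagrees by more than a tolerance. The 1 Å tolerance, the (type, coordinates) input format and the function name are assumptions; this is not Fit3D's algorithm.

```python
from math import dist

def match_motif(motif, target, tol=1.0):
    """Return all assignments (tuples of target indices, one per motif residue) whose
    residue types match and whose pairwise distances agree with the motif within tol (Å)."""
    hits = []

    def extend(assigned):
        k = len(assigned)
        if k == len(motif):
            hits.append(tuple(assigned))
            return
        mtype, mcoord = motif[k]
        for t, (ttype, tcoord) in enumerate(target):
            if t in assigned or ttype != mtype:
                continue
            # Prune: the new residue must agree with every residue placed so far.
            if all(
                abs(dist(mcoord, motif[i][1]) - dist(tcoord, target[a][1])) <= tol
                for i, a in enumerate(assigned)
            ):
                assigned.append(t)
                extend(assigned)
                assigned.pop()

    extend([])
    return hits

# Made-up motif and target; target residues 2, 3 and 1 form a translated copy of the motif.
motif = [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.0, 0.0, 0.0)), ("ASP", (3.0, 3.0, 0.0))]
target = [
    ("GLY", (50.0, 50.0, 50.0)),
    ("ASP", (13.0, 13.0, 10.0)),
    ("SER", (10.0, 10.0, 10.0)),
    ("HIS", (13.0, 10.0, 10.0)),
    ("HIS", (30.0, 30.0, 30.0)),
]
print(match_motif(motif, target))  # [(2, 3, 1)]
```

Distance agreement is a necessary but not sufficient condition (mirror images pass it), so hits would normally be confirmed by superposition and an RMSD threshold.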
Affiliation(s)
- Florian Kaiser
- Department of Bioinformatics, University of Applied Sciences Mittweida, Mittweida, Germany
- Alexander Eisold
- Department of Bioinformatics, University of Applied Sciences Mittweida, Mittweida, Germany
- Dirk Labudde
- Department of Bioinformatics, University of Applied Sciences Mittweida, Mittweida, Germany
10
Alderson RG, Barker D, Mitchell JBO. One origin for metallo-β-lactamase activity, or two? An investigation assessing a diverse set of reconstructed ancestral sequences based on a sample of phylogenetic trees. J Mol Evol 2014; 79:117-29. [PMID: 25185655 PMCID: PMC4185109 DOI: 10.1007/s00239-014-9639-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 03/13/2014] [Accepted: 08/11/2014] [Indexed: 01/04/2023]
Abstract
Bacteria use metallo-β-lactamase enzymes to hydrolyse lactam rings found in many antibiotics, rendering them ineffective. Metallo-β-lactamase activity is thought to be polyphyletic, having arisen on more than one occasion within a single functionally diverse homologous superfamily. Since discovery of multiple origins of enzymatic activity conferring antibiotic resistance has broad implications for the continued clinical use of antibiotics, we test the hypothesis of polyphyly further; if lactamase function has arisen twice independently, the most recent common ancestor (MRCA) is not expected to possess lactam-hydrolysing activity. Two major problems present themselves. Firstly, even with a perfectly known phylogeny, ancestral sequence reconstruction is error prone. Secondly, the phylogeny is not known, and in fact reconstructing a single, unambiguous phylogeny for the superfamily has proven impossible. To obtain a more statistical view of the strength of evidence for or against MRCA lactamase function, we reconstructed a sample of 98 MRCAs of the metallo-β-lactamases, each based on a different tree in a bootstrap sample of reconstructed phylogenies. InterPro sequence signatures and homology modelling were then used to assess our sample of MRCAs for lactamase functionality. Only 5 % of these models conform to our criteria for metallo-β-lactamase functionality, suggesting that the ancestor was unlikely to have been a metallo-β-lactamase. On the other hand, given that ancestral proteins may have had metallo-β-lactamase functionality with variation in sequence and structural properties compared with extant enzymes, our criteria are conservative, estimating a lower bound of evidence for metallo-β-lactamase functionality but not an upper bound.
Affiliation(s)
- Rosanna G. Alderson
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, KY16 9ST Scotland, UK
- Daniel Barker
- Sir Harold Mitchell Building, School of Biology, University of St Andrews, St Andrews, KY16 9TH Scotland, UK
- John B. O. Mitchell
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, KY16 9ST Scotland, UK
11
Tóth-Petróczy A, Tawfik DS. The robustness and innovability of protein folds. Curr Opin Struct Biol 2014; 26:131-8. [PMID: 25038399 DOI: 10.1016/j.sbi.2014.06.007] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Received: 11/26/2013] [Revised: 06/26/2014] [Accepted: 06/26/2014] [Indexed: 11/30/2022]
Abstract
Assignment of protein folds to functions indicates that >60% of folds carry out one or two enzymatic functions, while a few folds, for example the TIM-barrel and Rossmann folds, exhibit hundreds. Are there structural features that make a fold amenable to functional innovation (innovability)? Do these features relate to robustness, the ability to readily accumulate sequence changes? We discuss several hypotheses regarding the relationship between the architecture of a protein and its evolutionary potential. We describe how, in a seemingly paradoxical manner, opposite properties, such as high stability and rigidity versus conformational plasticity and structural order versus disorder, promote robustness and/or innovability. We hypothesize that polarity, meaning differentiation and low connectivity between a protein's scaffold and its active site, is a key prerequisite for innovability.
Affiliation(s)
- Agnes Tóth-Petróczy
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Dan S Tawfik
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
12
He L, Vandin F, Pandurangan G, Bailey-Kellogg C. Ballast: a ball-based algorithm for structural motifs. J Comput Biol 2013; 20:137-51. [PMID: 23383999 DOI: 10.1089/cmb.2012.0246] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
Structural motifs encapsulate local sequence-structure-function relationships characteristic of related proteins, enabling the prediction of functional characteristics of new proteins, providing molecular-level insights into how those functions are performed, and supporting the development of variants specifically maintaining or perturbing function in concert with other properties. Numerous computational methods have been developed to search through databases of structures for instances of specified motifs. However, it remains an open problem how best to leverage the local geometric and chemical constraints underlying structural motifs in order to develop motif-finding algorithms that are both theoretically and practically efficient. We present a simple, general, efficient approach, called Ballast (ball-based algorithm for structural motifs), to match given structural motifs to given structures. Ballast combines the best properties of previously developed methods, exploiting the composition and local geometry of a structural motif and its possible instances in order to effectively filter candidate matches. We show that on a wide range of motif-matching problems, Ballast efficiently and effectively finds good matches, and we provide theoretical insights into why it works well. By supporting generic measures of compositional and geometric similarity, Ballast provides a powerful substrate for the development of motif-matching algorithms.
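The ball-based filtering idea can be sketched in a simplified, composition-only form: a target residue is kept as a possible anchor for the motif only if a ball around it, sized to the motif's extent, contains at least the motif's residue-type composition. Survivors would then go on to full geometric matching. Ballast itself also exploits local geometry and supports generic similarity measures; the radius rule, the 1 Å tolerance and the function name here are assumptions.

```python
from collections import Counter
from math import dist

def ball_filter(motif, target, tol=1.0):
    """Return indices of target residues that could host motif residue 0 (the anchor).

    The ball radius is the largest anchor-to-residue distance in the motif plus tol.
    A cheap composition check prunes most candidates before expensive matching."""
    anchor_type, anchor_xyz = motif[0]
    radius = max(dist(anchor_xyz, xyz) for _, xyz in motif) + tol
    need = Counter(rtype for rtype, _ in motif)
    survivors = []
    for i, (ttype, txyz) in enumerate(target):
        if ttype != anchor_type:
            continue
        # Residue-type composition inside the ball (includes the candidate itself).
        have = Counter(rtype for rtype, xyz in target if dist(txyz, xyz) <= radius)
        if all(have[rtype] >= n for rtype, n in need.items()):
            survivors.append(i)
    return survivors

# Made-up data: only the Ser at index 0 has His and Asp within reach.
motif = [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.0, 0.0, 0.0)), ("ASP", (3.0, 3.0, 0.0))]
target = [
    ("SER", (0.0, 0.0, 0.0)),
    ("HIS", (3.0, 0.0, 0.0)),
    ("ASP", (3.0, 3.0, 0.0)),
    ("SER", (40.0, 0.0, 0.0)),
    ("HIS", (42.0, 0.0, 0.0)),
]
print(ball_filter(motif, target))  # [0]
```

The filter is conservative by design: it can let false candidates through (composition matches, geometry does not) but, with an adequate tolerance, should not discard true matches.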
Affiliation(s)
- Lu He
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
13
Kirshner DA, Nilmeier JP, Lightstone FC. Catalytic site identification--a web server to identify catalytic site structural matches throughout PDB. Nucleic Acids Res 2013; 41:W256-65. [PMID: 23680785 PMCID: PMC3692059 DOI: 10.1093/nar/gkt403] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 01/11/2023]
Abstract
The catalytic site identification web server can rapidly (in less than a minute) find structural matches to a user-specified catalytic site among all Protein Data Bank proteins. The server can also examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov.
Affiliation(s)
- Felice C. Lightstone
- To whom correspondence should be addressed. Tel: +1 925 423 8657; Fax: +1 925 423 0785
14
Nilmeier JP, Kirshner DA, Wong SE, Lightstone FC. Rapid catalytic template searching as an enzyme function prediction procedure. PLoS One 2013; 8:e62535. [PMID: 23675414 PMCID: PMC3651201 DOI: 10.1371/journal.pone.0062535] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 01/31/2013] [Accepted: 03/22/2013] [Indexed: 11/18/2022]
Abstract
We present an enzyme function identification algorithm, Catalytic Site Identification (CatSId), based on the identification of catalytic residues. The method is optimized for highly accurate template identification across a diverse template library and is also very efficient with regard to time and scalability of comparisons. The algorithm matches three-dimensional residue arrangements in a query protein to a library of manually annotated catalytic residues, the Catalytic Site Atlas (CSA). Two main processes are involved. The first process is a rapid protein-to-template matching algorithm that scales quadratically with target protein size and linearly with template size. The second process incorporates a number of physical descriptors, including binding site predictions, in a logistic scoring procedure to re-score matches found in the first process. This approach shows very good performance overall, with a receiver operating characteristic (ROC) area under the curve (AUC) of 0.971 for the training set evaluated. The procedure is able to process cofactors, ions, nonstandard residues, and point substitutions for residues and ions in a robust and integrated fashion. Sites with only two critical (catalytic) residues are challenging cases, resulting in AUCs of 0.9411 and 0.5413 for the training and test sets, respectively. The remaining sites, i.e. templates with more than two critical (catalytic) residues, show excellent performance, with AUCs greater than 0.90 for both the training and test data. The procedure has considerable promise for larger-scale searches.
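The two scoring ingredients mentioned above, logistic re-scoring of matches and evaluation by ROC AUC, can be sketched as follows. The AUC is computed as the probability that a randomly chosen true match scores higher than a randomly chosen false match (ties count one half), which equals the area under the ROC curve. The feature vectors, weights and bias below are hypothetical placeholders, not CatSId's descriptors or fitted model.

```python
from math import exp

def logistic_score(features, weights, bias):
    """Re-score a match as a probability from a linear combination of descriptors."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))

def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Hypothetical descriptors per match: [RMSD to template, binding-site overlap].
matches = ([0.4, 0.9], [1.8, 0.1], [0.6, 0.7], [2.5, 0.2])
labels = [1, 0, 1, 0]  # 1 = true catalytic-site match
scores = [logistic_score(f, weights=[-2.0, 3.0], bias=0.5) for f in matches]
print(roc_auc(scores, labels))  # 1.0
```

The pairwise AUC loop is quadratic in the number of matches; sorting the scores gives the same value in O(n log n) for large evaluation sets.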
Affiliation(s)
- Jerome P. Nilmeier
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Daniel A. Kirshner
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Sergio E. Wong
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Felice C. Lightstone
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
15
Wang Z, Yin P, Lee JS, Parasuram R, Somarowthu S, Ondrechen MJ. Protein function annotation with Structurally Aligned Local Sites of Activity (SALSAs). BMC Bioinformatics 2013; 14 Suppl 3:S13. [PMID: 23514271 PMCID: PMC3584854 DOI: 10.1186/1471-2105-14-s3-s13] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Background: The prediction of biochemical function from the 3D structure of a protein has proved to be much more difficult than was originally foreseen. A reliable method to test the likelihood of putative annotations and to predict function from structure would add tremendous value to structural genomics data. We report on a new method, Structurally Aligned Local Sites of Activity (SALSA), for the prediction of biochemical function based on a local structural match at the predicted catalytic or binding site.
Results: Implementation of the SALSA method is described. For the structural genomics protein PY01515 (PDB ID 2aqw) from Plasmodium yoelii, it is shown that the putative annotation, Orotidine 5'-monophosphate decarboxylase (OMPDC), is most likely correct. SALSA analysis of YP_001304206.1 (PDB ID 3h3l), a putative sugar hydrolase from Parabacteroides distasonis, shows that its active site does not bear close resemblance to any previously characterized member of its superfamily, the Concanavalin A-like lectins/glucanases. It is noted that three residues in the active site of the thermophilic beta-1,4-xylanase from Nonomuraea flexuosa (PDB ID 1m4w), Y78, E87, and E176, overlap with POOL-predicted residues of similar type, Y168, D153, and E232, in YP_001304206.1. The substrate recognition regions of the two proteins are rather different, suggesting that YP_001304206.1 is a new functional type within the superfamily. A structural genomics protein from Mycobacterium avium (PDB ID 3q1t) has been reported to be an enoyl-CoA hydratase (ECH), but SALSA analysis shows a poor match between the predicted residues for the SG protein and those of known ECHs. A better local structural match is obtained with Anabaena beta-diketone hydrolase (ABDH), a known β-diketone hydrolase from Cyanobacterium anabaena (PDB ID 2j5s). This suggests that the reported ECH function of the SG protein is incorrect and that it is more likely a β-diketone hydrolase.
Conclusions: A local site match provides a more compelling function prediction than that obtainable from a simple 3D structure match. The present method can confirm putative annotations, identify misannotation, and in some cases suggest a more probable annotation.
Affiliation(s)
- Zhouxi Wang
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA
16
Wu CY, Hwa YH, Chen YC, Lim C. Hidden relationship between conserved residues and locally conserved phosphate-binding structures in NAD(P)-binding proteins. J Phys Chem B 2012; 116:5644-52. [PMID: 22530587 DOI: 10.1021/jp3014332] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 02/06/2023]
Abstract
A one-dimensional (1D) motif usually comprises conserved essential residues involved in catalysis, ligand binding, or maintaining a specific structure. However, it cannot be easily detected in proteins with low sequence identity because it is difficult to (1) identify protein sequences suspected to contain the motif, and (2) align sequences with little sequence identity to spot the conserved residues. Here, we present a strategy for discovering phosphate-binding 1D motifs in NAD(P)-binding proteins sharing low sequence identity that overcomes these two hurdles by determining all distinct locally conserved pyrophosphate-binding structures and aligning the same-length sequences comprising each of these structures to identify the conserved residues. We show that the sequence motifs derived from the distinct pyrophosphate-binding structures yield different numbers and spacings of conserved Gly residues. We also show that these motifs depend on the side-chain orientations and the cofactor type (NAD or NADP). Thus, sequence motifs derived from local backbone similarity without considering the cofactor type and/or side-chain orientations would be less reliable for annotating protein function from sequence alone. The three-dimensional (3D) and 1D motifs comprising conserved residues in nonredundant proteins reveal hidden relationships between protein structure/function and sequence, as well as protein-cofactor interactions.
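Because the sequence segments compared in this strategy have the same length, deriving a 1D motif needs no gap handling: conserved positions can be read off column by column. The sketch below assumes a list of equal-length fragments that span one locally conserved structure and writes the result in a PROSITE-like notation ('x' for any residue). The 0.8 conservation threshold and the function name are assumptions, and the example fragments are made up.

```python
from collections import Counter

def conserved_pattern(fragments, threshold=0.8):
    """Return a pattern with a residue letter where one residue occupies at least
    `threshold` of the column, and 'x' elsewhere."""
    length = len(fragments[0])
    if any(len(f) != length for f in fragments):
        raise ValueError("fragments must have equal length")
    pattern = []
    for column in zip(*fragments):
        residue, count = Counter(column).most_common(1)[0]
        pattern.append(residue if count / len(column) >= threshold else "x")
    return "-".join(pattern)

# Made-up Gly-rich fragments from a hypothetical pyrophosphate-binding loop.
fragments = ["GAGKAG", "GSGTVG", "GAGRAG", "GVGSIG", "GAGNAA"]
print(conserved_pattern(fragments))  # G-x-G-x-x-G
```

Lowering the threshold admits more positions into the motif at the cost of specificity, which is one way the number and spacing of conserved Gly residues can shift between structural groups.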
Affiliation(s)
- Chih Yuan Wu
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
17
Classification of protein functional surfaces using structural characteristics. Proc Natl Acad Sci U S A 2012; 109:1170-5. [PMID: 22238424 DOI: 10.1073/pnas.1119684109] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]
Abstract
Protein structure and function are closely related, especially in functional surfaces, which are local spatial regions that perform the biological functions. Also, protein structures tend to evolve more slowly than amino acid sequences. We have therefore developed a method to classify proteins using the structures of functional surfaces; we call it protein surface classification (PSC). PSC may reflect functional relationships among proteins and may detect evolutionary relationships among highly divergent sequences. We focused on the surfaces of ligand-bound regions because they represent well-defined structures. Specifically, we used structural attributes to measure similarities between binding surfaces and constructed a PSC library of ~2,000 binding surface types from the bound forms. Using flavin mononucleotide-binding proteins and glycosidases as examples, we show how the evolutionary position of an uncharacterized protein can be defined and its function inferred from the characterized members of the same surface subtype. We found that proteins with the same enzyme nomenclature may be divided into subtypes and that two proteins in the same CATH (Class, Architecture, Topology, Homologous superfamily) fold may belong to two different surface types. In conclusion, our approach complements the sequence-based and fold-domain classifications and has the advantage of associating the shape of a protein with its biological function. As an expandable library, PSC provides a resource of spatial patterns for studying the evolution of protein structure and function.
18
Tseng YY, Li WH. Evolutionary approach to predicting the binding site residues of a protein from its primary sequence. Proc Natl Acad Sci U S A 2011; 108:5313-8. [PMID: 21402946 PMCID: PMC3069214 DOI: 10.1073/pnas.1102210108] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
Protein binding site residues, especially catalytic residues, play a central role in protein function. Because more than 99% of the ∼ 12 million protein sequences in the nonredundant protein database have no structural information, it is desirable to develop methods to predict the binding site residues of a protein from its primary sequence. This task is highly challenging, because the binding site residues constitute only a small portion of a protein. However, the binding site residues of a protein are clustered in its functional pocket(s), and their spatial patterns tend to be conserved in evolution. To take advantage of these evolutionary and structural principles, we constructed a database of ∼ 50,000 templates (called the pocket-containing segment database), each of which includes not only a sequence segment that contains a functional pocket but also the structural attributes of the pocket. To use this database, we designed a template-matching technique, termed residue-matching profiling, and established a criterion for selecting templates for a query sequence. Finally, we developed a probabilistic model for assigning spatial scores to matched residues between the template and query sequence in local alignments using a set of selected scoring matrices and for computing the binding likelihood of each matched residue in the query sequence. From the likelihoods, one can predict the binding site residues in the query sequence. An automated computational pipeline was developed for our method. A performance evaluation shows that our method achieves a 70% precision in predicting binding site residues at 60% sensitivity.
Affiliation(s)
- Yan Yuan Tseng
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
- Wen-Hsiung Li
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
- Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
19
Dundas J, Adamian L, Liang J. Structural signatures of enzyme binding pockets from order-independent surface alignment: a study of metalloendopeptidase and NAD binding proteins. J Mol Biol 2011; 406:713-29. [PMID: 21145898 PMCID: PMC3061237 DOI: 10.1016/j.jmb.2010.12.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 07/02/2010] [Revised: 10/14/2010] [Accepted: 12/03/2010] [Indexed: 10/18/2022]
Abstract
Detecting similarities between local binding surfaces can facilitate identification of enzyme binding sites and prediction of enzyme functions, and aid in our understanding of enzyme mechanisms. Constructing a template of local surface characteristics for a specific enzyme function or binding activity is a challenging task, as the size and shape of the binding surfaces of a biochemical function often vary. Here we introduce the concept of signature binding pockets, which captures information on preserved and varied atomic positions at multiresolution levels. For proteins with complex enzyme binding and activity, multiple signatures arise naturally in our model, forming a signature basis set that characterizes this class of proteins. Both signatures and signature basis sets can be automatically constructed by a method called SOLAR (Signature Of Local Active Regions). This method is based on a sequence-order-independent alignment of computed binding surface pockets. SOLAR also provides a structure-based multiple sequence fragment alignment to facilitate the interpretation of computed signatures. By studying a family of evolutionarily related proteins, we show that for metzincin metalloendopeptidase, which has a broad spectrum of substrate binding, signature and basis set pockets can be used to discriminate metzincins from other enzymes, to predict the subclass of metzincins functions, and to identify specific binding surfaces. Studying unrelated proteins that have evolved to bind to the same NAD cofactor, we constructed signatures of NAD binding pockets and used them to predict NAD binding proteins and to locate NAD binding pockets. By measuring preservation ratio and location variation, our method can identify residues and atoms that are important for binding affinity and specificity. In both cases, we show that signatures and signature basis set reveal significant biological insight.
Affiliation(s)
- Joe Dundas
- Department of Bioengineering, University of Illinois at Chicago, 835 S. Wolcott, Chicago, Illinois, 60612
- Larisa Adamian
- Department of Bioengineering, University of Illinois at Chicago, 835 S. Wolcott, Chicago, Illinois, 60612
- Jie Liang
- Department of Bioengineering, University of Illinois at Chicago, 835 S. Wolcott, Chicago, Illinois, 60612
20
Moll M, Bryant DH, Kavraki LE. The LabelHash algorithm for substructure matching. BMC Bioinformatics 2010; 11:555. [PMID: 21070651 PMCID: PMC2996407 DOI: 10.1186/1471-2105-11-555] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 07/08/2010] [Accepted: 11/11/2010] [Indexed: 01/01/2025]
Abstract
Background: There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. However, experimental protein function determination is expensive and very time-consuming. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity.
Results: We present LabelHash, a novel algorithm for matching substructural motifs to large collections of protein structures. The algorithm consists of two phases. In the first phase the proteins are preprocessed in a fashion that allows for instant lookup of partial matches to any motif. In the second phase, partial matches for a given motif are expanded to complete matches. The general applicability of the algorithm is demonstrated with three different case studies. First, we show that we can accurately identify members of the enolase superfamily with a single motif. Next, we demonstrate how LabelHash can complement SOIPPA, an algorithm for motif identification and pairwise substructure alignment. Finally, a large collection of Catalytic Site Atlas motifs is used to benchmark the performance of the algorithm. LabelHash runs very efficiently in parallel; matching a motif against all proteins in the 95% sequence identity filtered non-redundant Protein Data Bank typically takes no more than a few minutes. The LabelHash algorithm is available through a web server and as a suite of standalone programs at http://labelhash.kavrakilab.org. The output of the LabelHash algorithm can be further analyzed with Chimera through a plugin that we developed for this purpose.
Conclusions: LabelHash is an efficient, versatile algorithm for large-scale substructure matching. When LabelHash is running in parallel, motifs can typically be matched against the entire PDB on the order of minutes. The algorithm is able to identify functional homologs beyond the twilight zone of sequence identity and even beyond fold similarity. The three case studies presented in this paper illustrate the versatility of the algorithm.
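The two-phase scheme described above (preprocess structures so that partial motif matches can be looked up directly, then expand partial matches into complete ones) can be illustrated with a toy example. This is a simplified sketch, not the LabelHash implementation: hashing compact residue triples by their sorted labels, the 12 Å triple cutoff, the 1 Å distance tolerance, seeding on the first three motif residues, and all function names are assumptions made for illustration.

```python
import itertools
import math
from collections import defaultdict


def build_index(residues, cutoff=12.0):
    """Phase 1 (preprocessing): hash every compact residue triple by its
    sorted residue labels so partial matches can be looked up directly."""
    index = defaultdict(list)
    for triple in itertools.combinations(range(len(residues)), 3):
        coords = [residues[i][1] for i in triple]
        if all(math.dist(p, q) <= cutoff for p, q in itertools.combinations(coords, 2)):
            key = tuple(sorted(residues[i][0] for i in triple))
            index[key].append(triple)
    return index


def _consistent(motif, residues, assignment, tol):
    """True if every matched pair reproduces the motif distance within `tol`."""
    for (m1, r1), (m2, r2) in itertools.combinations(assignment.items(), 2):
        d_motif = math.dist(motif[m1][1], motif[m2][1])
        d_struct = math.dist(residues[r1][1], residues[r2][1])
        if abs(d_motif - d_struct) > tol:
            return False
    return True


def _expand(motif, residues, assignment, m, tol, matches):
    """Phase 2: extend a partial match one motif residue at a time (depth-first)."""
    if m == len(motif):
        matches.append(dict(assignment))
        return
    used = set(assignment.values())
    for r, (label, _) in enumerate(residues):
        if r in used or label != motif[m][0]:
            continue
        assignment[m] = r
        if _consistent(motif, residues, assignment, tol):
            _expand(motif, residues, assignment, m + 1, tol, matches)
        del assignment[m]


def match_motif(motif, residues, index, tol=1.0):
    """Look up partial matches for the first three motif residues, then expand them."""
    seed = (0, 1, 2)
    key = tuple(sorted(motif[m][0] for m in seed))
    matches = []
    for triple in index.get(key, []):
        for perm in itertools.permutations(triple):
            assignment = dict(zip(seed, perm))
            labels_ok = all(motif[m][0] == residues[r][0] for m, r in assignment.items())
            if labels_ok and _consistent(motif, residues, assignment, tol):
                _expand(motif, residues, assignment, 3, tol, matches)
    return matches


# Toy structure: (residue label, (x, y, z)); the coordinates are made up.
structure = [("S", (0, 0, 0)), ("H", (3, 0, 0)), ("D", (3, 4, 0)),
             ("G", (20, 20, 20)), ("E", (0, 0, 5))]
motif = [("S", (10, 10, 10)), ("H", (13, 10, 10)), ("D", (13, 14, 10)), ("E", (10, 10, 15))]
print(match_motif(motif, structure, build_index(structure)))
```

Preprocessing runs once per structure, so each new motif only pays for the lookup and expansion steps; separating those costs is the motivation for having two phases.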
Affiliation(s)
- Mark Moll
- Department of Computer Science, Rice University, Houston, TX 77005, USA.
21
Bryant DH, Moll M, Chen BY, Fofanov VY, Kavraki LE. Analysis of substructural variation in families of enzymatic proteins with applications to protein function prediction. BMC Bioinformatics 2010; 11:242. [PMID: 20459833 PMCID: PMC2885373 DOI: 10.1186/1471-2105-11-242] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 09/13/2009] [Accepted: 05/11/2010] [Indexed: 12/02/2022]
Abstract
Background: Structural variations caused by a wide range of physico-chemical and biological sources directly influence the function of a protein. For enzymatic proteins, the structure and chemistry of the catalytic binding site residues can be loosely defined as a substructure of the protein. Comparative analysis of drug-receptor substructures across and within species has been used for lead evaluation. Substructure-level similarity between the binding sites of functionally similar proteins has also been used to identify instances of convergent evolution among proteins. In functionally homologous protein families, shared chemistry and geometry at catalytic sites provide a common, local point of comparison among proteins that may differ significantly at the sequence, fold, or domain topology levels.
Results: This paper describes two key results that can be used separately or in combination for protein function analysis. The Family-wise Analysis of SubStructural Templates (FASST) method uses all-against-all substructure comparison to determine Substructural Clusters (SCs). SCs characterize the binding site substructural variation within a protein family. In this paper we focus on examples of automatically determined SCs that can be linked to phylogenetic distance between family members, segregation by conformation, and organization by homology among convergent protein lineages. The Motif Ensemble Statistical Hypothesis (MESH) framework constructs a representative motif for each protein cluster among the SCs determined by FASST to build motif ensembles that are shown through a series of function prediction experiments to improve the function prediction power of existing motifs.
Conclusions: FASST contributes a critical feedback and assessment step to existing binding site substructure identification methods and can be used for the thorough investigation of structure-function relationships. The application of MESH allows for an automated, statistically rigorous procedure for incorporating structural variation data into protein function prediction pipelines. Our work provides an unbiased, automated assessment of the structural variability of identified binding site substructures among protein structure families and a technique for exploring the relation of substructural variation to protein function. As available proteomic data continues to expand, the techniques proposed will be indispensable for the large-scale analysis and interpretation of structural data.
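The all-against-all comparison that FASST uses to form Substructural Clusters can be pictured with a small sketch. Here we assume single-linkage clustering with a fixed distance threshold over a precomputed matrix of pairwise substructure distances (for example, RMSD after superposition). The abstract does not state FASST's actual clustering criterion, and the function name `cluster_substructures` and the distances are hypothetical.

```python
def cluster_substructures(dist, threshold):
    """Single-linkage clusters: substructures i and j share a cluster if they are
    connected by a chain of pairs whose distance is <= threshold."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All-against-all pass: merge every pair that is close enough.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical all-against-all distances (Å) between four binding-site substructures.
dist = [
    [0.0, 0.3, 2.0, 2.1],
    [0.3, 0.0, 1.9, 2.2],
    [2.0, 1.9, 0.0, 0.4],
    [2.1, 2.2, 0.4, 0.0],
]
print(cluster_substructures(dist, threshold=0.5))
```

Varying the threshold shows how the same family can split into several substructural clusters or collapse into one, which is the kind of variation the abstract says SCs are meant to expose.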
Affiliation(s)
- Drew H Bryant
- Department of Computer Science, Rice University, Houston, TX, USA
22
Bandyopadhyay D, Huan J, Prins J, Snoeyink J, Wang W, Tropsha A. Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: II. Case studies and applications. J Comput Aided Mol Des 2009; 23:785-97. [PMID: 19548090 DOI: 10.1007/s10822-009-9277-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 07/21/2008] [Accepted: 04/22/2009] [Indexed: 11/25/2022]
Abstract
This paper describes several case studies of inferring protein function from structure using the novel approach described in the accompanying paper. This approach employs family-specific motifs, i.e. three-dimensional amino acid packing patterns that are statistically prevalent within a protein family. For our case studies we have selected families from the SCOP and EC classifications and analyzed the discriminating power of the motifs in depth. We have devised several benchmarks to compare motifs mined from unweighted topological graph representations of protein structures with those mined from distance-labeled (weighted) representations, demonstrating the superiority of the latter for function inference in most families. We have tested the robustness of our motif library by inferring the function of new members added to SCOP families, and by discriminating between several families that are structurally similar but functionally divergent. Furthermore, we have applied our method to predict function for several proteins characterized in structural genomics projects, including orphan structures, and we discuss several selected predictions in depth. Some of our predictions have been corroborated by other computational methods, and some have been confirmed by independent experimental studies, supporting our approach to protein function inference from structure.
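To make the contrast between unweighted and distance-labeled representations concrete, the sketch below builds both kinds of residue graph from toy coordinates. It is an illustration under stated assumptions: a simple distance-cutoff contact graph stands in for whatever graph construction the accompanying paper uses, and the 8.5 Å cutoff, the 1 Å bins, and the `residue_graph` name are hypothetical.

```python
import itertools
import math


def residue_graph(residues, cutoff=8.5, bin_width=1.0, weighted=True):
    """Map each residue pair (i, j) within `cutoff` to an edge label.

    Labels are (sorted residue-type pair, distance bin) when weighted,
    or (sorted residue-type pair, None) for the unweighted topology."""
    edges = {}
    for (i, (ai, pi)), (j, (aj, pj)) in itertools.combinations(enumerate(residues), 2):
        d = math.dist(pi, pj)
        if d <= cutoff:
            distance_bin = int(d // bin_width) if weighted else None
            edges[(i, j)] = (tuple(sorted((ai, aj))), distance_bin)
    return edges


residues = [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.5, 0.0, 0.0)), ("ASP", (20.0, 0.0, 0.0))]
print(residue_graph(residues))                  # distance-labeled edges
print(residue_graph(residues, weighted=False))  # unweighted topology
```

The two graphs share the same edges; the distance bins are the extra information that the weighted representation carries into motif mining.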
23
Xie L, Xie L, Bourne PE. A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics 2009; 25:i305-12. [PMID: 19478004 PMCID: PMC2687974 DOI: 10.1093/bioinformatics/btp220] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.4] [Indexed: 11/13/2022]
Abstract
Functional relationships between proteins that do not share global structure similarity can be established by detecting their ligand-binding-site similarity. For a large-scale comparison, it is critical to accurately and efficiently assess the statistical significance of this similarity. Here, we report an efficient statistical model that supports local sequence order independent ligand-binding-site similarity searching. Most existing statistical models only take into account the matching vertices between two sites that are defined by a fixed number of points. In reality, the boundary of the binding site is not known or depends on the bound ligand, which limits these approaches. To address these shortcomings and to perform binding-site mapping on a genome-wide scale, we developed a sequence-order independent profile-profile alignment (SOIPPA) algorithm that is able to detect local similarity between unknown binding sites a priori. The SOIPPA scoring integrates geometric, evolutionary and physical information into a unified framework. However, this poses a significant challenge in assessing the statistical significance of the similarity, because the conventional probability model based on fixed-point matching cannot be applied. Here we find that scores for binding-site matching by SOIPPA follow an extreme value distribution (EVD). Benchmark studies show that the EVD model is at least two orders of magnitude faster and more accurate than the non-parametric statistical method in the previous SOIPPA version. Efficient statistical analysis makes it possible to apply SOIPPA to genome-based drug discovery. Accordingly, we have applied the approach to the structural genome of Mycobacterium tuberculosis to construct a protein-ligand interaction network. The network reveals highly connected proteins, which represent suitable targets for promiscuous drugs.
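The key statistical idea above is that SOIPPA match scores follow an extreme value distribution, so a score's significance can be read from a fitted Gumbel tail instead of from an expensive non-parametric procedure. A minimal sketch follows. The method-of-moments fit, the function names, and the background scores are assumptions; the abstract does not state how the published model estimates its parameters.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(scores):
    """Method-of-moments estimate of the Gumbel location (mu) and scale (beta)."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.fmean(scores) - EULER_GAMMA * beta
    return mu, beta


def evd_pvalue(score, mu, beta):
    """Upper-tail probability P(S >= score) under the fitted Gumbel distribution.

    Uses -expm1(-t) == 1 - exp(-t) for accuracy when the p-value is tiny."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


# Hypothetical background scores from comparisons of unrelated binding sites.
background = [11.2, 12.5, 10.8, 13.1, 12.0, 11.7, 14.3, 12.9]
mu, beta = fit_gumbel(background)
print(evd_pvalue(25.0, mu, beta))
```

Once `mu` and `beta` are fitted, scoring a new match is a single closed-form evaluation, which is where the speed advantage over resampling-based significance estimates comes from.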
Affiliation(s)
- Lei Xie
- San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA.
24
Tseng YY, Dundas J, Liang J. Predicting protein function and binding profile via matching of local evolutionary and geometric surface patterns. J Mol Biol 2009; 387:451-64. [PMID: 19154742 PMCID: PMC2670802 DOI: 10.1016/j.jmb.2008.12.072] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 09/16/2008] [Revised: 12/19/2008] [Accepted: 12/23/2008] [Indexed: 11/25/2022]
Abstract
Inferring protein functions from structures is a challenging task, as a large number of orphan protein structures from structural genomics projects are now solved without their biochemical functions characterized. For proteins binding similar substrates or ligands and carrying out similar functions, the binding surfaces are under similar physicochemical constraints, and hence the sets of allowed and forbidden residue substitutions are similar. However, it is difficult to isolate the selection pressure due to protein function from that due to protein folding, and the evolutionary relationship reflected by global sequence and structure similarities between proteins is often unreliable for inferring protein function. We have developed a method, called pevoSOAR (pocket-based evolutionary search of amino acid residues), for predicting protein functions by uncovering the amino acid substitution pattern due to protein function and separating it from the substitution pattern due to protein folding. We incorporate evolutionary information specific to an individual binding region and match local surfaces on a large scale with millions of precomputed protein surfaces to identify those with similar functions. Our pevoSOAR method also generates a probabilistic model, called the computed binding profile, that characterizes protein-binding activities that may involve multiple substrates or ligands. We show that our method can be used to predict enzyme functions accurately. Our method can also assess enzyme binding specificity and promiscuity. In an objective large-scale test of 100 enzyme families with thousands of structures, our predictions are found to be sensitive and specific: at the stringent specificity level of 99.98%, we can correctly predict enzyme functions for 80.55% of the proteins. The overall area under the receiver operating characteristic curve measuring the performance of our prediction is 0.955, close to the perfect value of 1.00. The best Matthews correlation coefficient is 86.6%. Our method also works well in predicting the biochemical functions of orphan proteins from structural genomics projects.
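The performance figures above (specificity, sensitivity and the Matthews correlation coefficient) all derive from a confusion matrix of true and false positives and negatives. The helpers below are a generic sketch of those standard formulas, not code from pevoSOAR, and the counts in the usage line are made up.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true positives recovered (recall)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives correctly rejected."""
    return tn / (tn + fp)


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient in [-1, 1]; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical counts for one prediction threshold.
tp, fp, tn, fn = 80, 2, 9900, 20
print(sensitivity(tp, fn), specificity(tn, fp), matthews_cc(tp, fp, tn, fn))
```

Because negatives vastly outnumber positives in this setting, specificity alone can look near-perfect; the MCC balances all four counts, which is why it is often reported alongside it.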
Affiliation(s)
- Yan Yuan Tseng
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, IL 60607-7052, USA
25
Zamocky M, Jakopitsch C, Furtmüller PG, Dunand C, Obinger C. The peroxidase-cyclooxygenase superfamily: Reconstructed evolution of critical enzymes of the innate immune system. Proteins 2008; 72:589-605. [PMID: 18247411 DOI: 10.1002/prot.21950] [Citation(s) in RCA: 132] [Impact Index Per Article: 7.8] [Indexed: 05/30/2025]
Abstract
The authors have reconstructed the phylogenetic relationships of the main evolutionary lines of mammalian heme-containing peroxidases. The sequences of the intensively investigated human myeloperoxidase, eosinophil peroxidase, and lactoperoxidase, which participate in host defence against infections, were aligned together with newly found open reading frames coding for highly similar putative peroxidase domains in all kingdoms of life. The evolutionary relationships were reconstructed using neighbor-joining, maximum parsimony, and maximum likelihood methods. It is demonstrated that this enzyme superfamily obeys the rules of the birth-and-death model of multigene family evolution and contains proteins with a variety of functions that could be grouped into seven subfamilies. On the basis of their occurrence and the fact that two main enzymatic activities are associated with these metalloproteins, the authors propose the name peroxidase-cyclooxygenase superfamily for this widespread group of heme-containing oxidoreductases. Well-known structure-function relationships in mammalian peroxidases formed the basis for the critical inspection of all subfamilies. The presented data unequivocally suggest that predecessor genes of mammalian heme peroxidases segregated very early in evolution. Before organisms developed acquired immunity, their antimicrobial defence depended on enzymes that were recruited upon pathogen invasion and could produce antimicrobial reaction products. Thus, these peroxidatic heme proteins evolved into important components of the innate immune defence system. This work shows that genes encoding putative antimicrobial enzymes are found even in certain prokaryotic organisms, providing a group of bacteria with an evolutionary advantage over others.
Affiliation(s)
- Marcel Zamocky
- Department of Chemistry, Division of Biochemistry, BOKU-University of Natural Resources and Applied Life Sciences, A-1190 Vienna, Austria.
26
Chien TY, Chang DTH, Chen CY, Weng YZ, Hsu CM. E1DS: catalytic site prediction based on 1D signatures of concurrent conservation. Nucleic Acids Res 2008; 36:W291-6. [PMID: 18524800 PMCID: PMC2447799 DOI: 10.1093/nar/gkn324] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/03/2008] [Revised: 04/25/2008] [Accepted: 05/07/2008] [Indexed: 11/21/2022]
Abstract
Large-scale automatic annotation of protein sequences remains challenging in the postgenomic era. E1DS is designed for annotating enzyme sequences based on a repository of 1D signatures. The signatures are derived using a novel pattern-mining approach that discovers long motifs consisting of several sequential blocks (conserved segments). Each sequential block is well conserved among the protein members of an EC group. Moreover, a signature includes at least three sequential blocks that are concurrently conserved, i.e. frequently observed together in sequences. In other words, a sequence signature consists of residues from multiple regions of the protein sequence, which echoes the observation that an enzyme catalytic site is usually made up of residues that are widely separated in the sequence. E1DS currently contains 5421 sequence signatures that in total cover 932 four-digit EC numbers. E1DS is evaluated on a collection of enzymes with catalytic sites annotated in the Catalytic Site Atlas. Compared with the widely used pattern database PROSITE, predictions based on E1DS signatures are found to be more sensitive in identifying catalytic sites and the residues involved. E1DS is available at http://e1ds.ee.ncku.edu.tw/ and a mirror site can be found at http://e1ds.csbb.ntu.edu.tw/.
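A signature built from several concurrently conserved blocks that lie far apart in sequence can be illustrated by matching such a signature against a query. This sketch covers only the matching side, not E1DS's pattern-mining algorithm. Representing blocks as regular-expression fragments that must appear in order with arbitrary gaps is an assumption, as are the function name `match_signature` and the example blocks.

```python
import re


def match_signature(sequence, blocks, min_blocks=3):
    """Return the start index of each block if all blocks occur in order, else None.

    `blocks` are regular-expression fragments without capture groups."""
    if len(blocks) < min_blocks:
        raise ValueError(f"a signature needs at least {min_blocks} blocks")
    # Lazy gaps between blocks allow arbitrary spacing while preserving order.
    pattern = ".*?".join(f"({block})" for block in blocks)
    match = re.search(pattern, sequence)
    if match is None:
        return None
    return [match.start(k + 1) for k in range(len(blocks))]


# Hypothetical three-block signature and query sequence.
signature = ["GDSG", "H.D", "E.R"]
print(match_signature("MKTAYGDSGGPLHSDAWEKRLQ", signature))
```

Requiring every block to match is what makes the signature "concurrent": a sequence that contains only one or two of the blocks is not annotated.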
Affiliation(s)
- Ting-Ying Chien
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
- Darby Tien-Hao Chang
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
- Chien-Yu Chen
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
- Yi-Zhong Weng
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
- Chen-Ming Hsu
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
27
Xie L, Bourne PE. Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. Proc Natl Acad Sci U S A 2008; 105:5441-6. [PMID: 18385384 PMCID: PMC2291117 DOI: 10.1073/pnas.0704422105] [Citation(s) in RCA: 181] [Impact Index Per Article: 10.6] [Received: 05/11/2007] [Indexed: 11/18/2022]
Abstract
Here, a scalable, accurate, reliable, and robust protein functional site comparison algorithm is presented. The key components of the algorithm are a reduced representation of the protein structure and a sequence order-independent profile-profile alignment (SOIPPA). We show that SOIPPA is able to detect distant evolutionary relationships in cases where both global sequence and structural relationships remain obscure. The results suggest evolutionary relationships across several protein structure superfamilies previously thought to be evolutionarily distinct. SOIPPA, along with the increased coverage of protein fold space afforded by the structural genomics initiative, can be used to further test the notion that fold space is continuous rather than discrete.
Affiliation(s)
- Lei Xie
- San Diego Supercomputer Center
- Philip E. Bourne
- San Diego Supercomputer Center
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093
28
Tong W, Williams RJ, Wei Y, Murga LF, Ko J, Ondrechen MJ. Enhanced performance in prediction of protein active sites with THEMATICS and support vector machines. Protein Sci 2007; 17:333-41. [PMID: 18096640 DOI: 10.1110/ps.073213608] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 10/22/2022]
Abstract
Theoretical microscopic titration curves (THEMATICS) is a computational method for identifying active sites in proteins through deviations in the computed titration behavior of ionizable residues. While its sensitivity to catalytic sites is high, the previously reported sensitivity to catalytic residues was lower, about 50%. Here THEMATICS is combined with support vector machines (SVM) to improve sensitivity for catalytic residue prediction from protein 3D structure alone. For a test set of 64 proteins taken from the Catalytic Site Atlas (CSA), the average recall rate for annotated catalytic residues is 61%; good precision is maintained while selecting only 4% of all residues. The average false positive rate, using the CSA annotations, is only 3.2%, far lower than that of other 3D-structure-based methods. THEMATICS-SVM returns higher precision, a lower false positive rate, and better overall performance than other 3D-structure-based methods. Comparison is also made with the latest machine learning methods that are based on both sequence alignments and 3D structures. For annotated sets of well-characterized enzymes, THEMATICS-SVM performance compares very favorably with methods that use sequence homology. However, since THEMATICS depends only on the 3D structure of the query protein, no decline in performance is expected when it is applied to novel folds, proteins with few sequence homologues, or even orphan sequences. An extension of the method to predict non-ionizable catalytic residues is also presented. THEMATICS-SVM predicts a local network of ionizable residues with strong interactions between protonation events; this appears to be a special feature of enzyme active sites.
Affiliation(s)
- Wenxu Tong
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts 02115, USA
29
Schmidberger JW, Wilce JA, Tsang JSH, Wilce MCJ. Crystal structures of the substrate free-enzyme, and reaction intermediate of the HAD superfamily member, haloacid dehalogenase DehIVa from Burkholderia cepacia MBA4. J Mol Biol 2007; 368:706-17. [PMID: 17368477 DOI: 10.1016/j.jmb.2007.02.015] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Received: 10/21/2006] [Revised: 02/02/2007] [Accepted: 02/07/2007] [Indexed: 11/17/2022]
Abstract
DehIVa is a haloacid dehalogenase (EC 3.8.1.2) from the soil- and water-borne bacterium Burkholderia cepacia MBA4, which belongs to the functionally variable haloacid dehalogenase (HAD) superfamily of enzymes. The haloacid dehalogenases catalyse the removal of halides from haloacids, resulting in a hydroxylated product. These enzymes are of interest for their potential to degrade recalcitrant halogenated environmental pollutants and for their use in the synthesis of industrial chemicals. The haloacid dehalogenases use nucleophilic attack on the substrate by an aspartic acid residue to form an enzyme-substrate ester bond with concomitant cleavage of the carbon-halide bond, and release a hydroxylated product following ester hydrolysis. We present the crystal structures of both substrate-free DehIVa, refined to 1.93 Å resolution, and DehIVa covalently bound to L-2-monochloropropanoate trapped as a reaction intermediate, refined to 2.7 Å resolution. Electron density is evident that is consistent with a previously unidentified yet anticipated water molecule in the active site, poised to donate its hydroxyl group to the product and its proton to the catalytic Asp11. It has been unclear how substrate enters the active site of this and related enzymes. The results of normal mode analysis (NMA) are presented and suggest a means whereby the predicted global dynamics of the enzyme allow entry of the substrate into the active site. In the context of these results, the possible roles of Arg42 and Asn178 in a "lock down" mechanism affecting active site access are discussed. In silico docking of enantiomeric substrates has been examined in order to evaluate the enzyme's enantioselectivity.
Affiliation(s)
- Jason W Schmidberger
- School of Medicine and Pharmacology, The University of Western Australia, Perth, Australia
30
Tseng YY, Liang J. Predicting enzyme functional surfaces and locating key residues automatically from structures. Ann Biomed Eng 2007; 35:1037-42. [PMID: 17294116 DOI: 10.1007/s10439-006-9241-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 09/20/2006] [Accepted: 11/27/2006] [Indexed: 10/23/2022]
Abstract
Locating functionally important protein surfaces and identifying catalytic site residues are critical for studying enzyme function. Here, we present a fold-independent method for predicting and characterizing the catalytic sites of enzymes. By extracting atomic patterns of catalytic residues in geometrically computed surface pockets, we develop a library of atomic patterns on the functional surfaces of ca. 700 protein structures. Combining this library with the propensities of secondary structures and residue occurrence in active sites, we develop a method to identify functionally important surfaces on protein structures and to locate key residues. We discuss the application of our methods to amylase, dioxygenase, deaminase, dehalogenase, and hydratase. A large-scale cross-validated prediction study shows that our method is sensitive and specific. Our method can be used to study enzyme function, to support drug design, and to engineer novel biochemical functions.
Affiliation(s)
- Yan Yuan Tseng
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607-7052, USA
31
Wei Y, Ringe D, Wilson MA, Ondrechen MJ. Identification of functional subclasses in the DJ-1 superfamily proteins. PLoS Comput Biol 2007; 3:e10. [PMID: 17257049 PMCID: PMC1782040 DOI: 10.1371/journal.pcbi.0030010] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.2] [Received: 09/11/2006] [Accepted: 12/07/2006] [Indexed: 12/02/2022]
Abstract
Genomics has posed the challenge of determining protein function from sequence and/or 3-D structure. Functional assignment from sequence relationships can be misleading, and structural similarity does not necessarily imply functional similarity. Proteins in the DJ-1 family, many of which are of unknown function, are examples of proteins with both sequence and fold similarity that span multiple functional classes. THEMATICS (theoretical microscopic titration curves), an electrostatics-based computational approach to functional site prediction, is used to sort proteins in the DJ-1 family into different functional classes. Active site residues are predicted for the eight distinct DJ-1 proteins with available 3-D structures. Placement of the predicted residues onto a structural alignment for six of these proteins reveals three distinct types of active sites. Each type overlaps only partially with the others, with only one residue in common across all six sets of predicted residues. Human DJ-1 and YajL from Escherichia coli have very similar predicted active sites and belong to the same probable functional group. Protease I, a known cysteine protease from Pyrococcus horikoshii, and PfpI/YhbO from E. coli, a hypothetical protein of unknown function, belong to a separate class. THEMATICS predicts a set of residues that is typical of a cysteine protease for Protease I; the prediction for PfpI/YhbO bears some similarity. YDR533Cp from Saccharomyces cerevisiae, of unknown function, and the known chaperone Hsp31 from E. coli constitute a third group with nearly identical predicted active sites. While the first four proteins have predicted active sites at dimer interfaces, YDR533Cp and Hsp31 both have predicted sites contained within each subunit. Although YDR533Cp and Hsp31 form different dimers with different orientations between the subunits, the predicted active sites are superimposable within the monomer structures.
Thus, the three predicted functional classes form four different types of quaternary structures. The computational prediction of the functional sites for protein structures of unknown function provides valuable clues for functional classification.
Affiliation(s)
- Ying Wei
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Dagmar Ringe
- Department of Biochemistry, Brandeis University, Waltham, Massachusetts, United States of America
- Department of Chemistry, Brandeis University, Waltham, Massachusetts, United States of America
- Rosenstiel Basic Medical Sciences Research Center, Brandeis University, Waltham, Massachusetts, United States of America
- Mark A Wilson
- Department of Biochemistry, Brandeis University, Waltham, Massachusetts, United States of America
- Department of Chemistry, Brandeis University, Waltham, Massachusetts, United States of America
- Rosenstiel Basic Medical Sciences Research Center, Brandeis University, Waltham, Massachusetts, United States of America
- Mary Jo Ondrechen
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
32
Lisewski AM, Lichtarge O. Rapid detection of similarity in protein structure and function through contact metric distances. Nucleic Acids Res 2006; 34:e152. [PMID: 17130161 PMCID: PMC1702494 DOI: 10.1093/nar/gkl788] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
The characterization of biological function among newly determined protein structures is a central challenge in structural genomics. One class of computational solutions to this problem is based on the similarity of protein structure. Here, we implement a simple yet efficient measure of protein structure similarity, the contact metric. Even though its computation avoids structural alignments and is therefore nearly instantaneous, we find that small values correlate with geometrical root mean square deviations obtained from structural alignments. To test whether the contact metric detects functional similarity, as defined by Gene Ontology (GO) terms, it was compared in large-scale computational experiments with four other measures of structural similarity, including both alignment algorithms and alignment-independent approaches. The contact metric was the fastest method, and its sensitivity at any given specificity level was a close second only to the Fast Alignment and Search Tool, a structural alignment method that is three orders of magnitude slower. Critically, nearly 40% of the correct functional inferences made by the contact metric were not identified by any other approach, which shows that the contact metric is complementary to existing methods and computationally efficient in detecting functional relationships between proteins. A public ‘Contact Metric Internet Server’ is provided.
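Why an alignment-free comparison can be nearly instantaneous can be illustrated with a deliberately simple stand-in. The sketch below is not the contact metric defined by Lisewski and Lichtarge, whose formula the abstract does not give. It summarises each C-alpha contact map as a histogram of contact sequence separations and compares two proteins by the total variation distance between those histograms. The 8 Å cutoff, the pooling of separations beyond 20 into one bin, the toy coordinates and all function names are assumptions.

```python
import numpy as np


def contact_map(coords, cutoff=8.0):
    """Boolean C-alpha contact map: True where two residues lie within cutoff (A)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return dist < cutoff


def separation_profile(cmap, max_sep=20):
    """Fraction of contacts at each sequence separation 1..max_sep.

    Separations longer than max_sep are pooled into the last bin, so proteins
    of different lengths give vectors of the same size (no alignment needed).
    """
    i, j = np.nonzero(np.triu(cmap, k=1))  # upper triangle only: i < j
    sep = np.minimum(j - i, max_sep)
    hist = np.bincount(sep, minlength=max_sep + 1)[1:].astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


def profile_distance(coords_a, coords_b, cutoff=8.0, max_sep=20):
    """Total variation distance between two separation profiles, in [0, 1]."""
    pa = separation_profile(contact_map(coords_a, cutoff), max_sep)
    pb = separation_profile(contact_map(coords_b, cutoff), max_sep)
    return float(np.abs(pa - pb).sum() / 2.0)


if __name__ == "__main__":
    # Hypothetical inputs: a straight chain and an idealised alpha-helix C-alpha trace
    n = 30
    line = np.array([[3.8 * k, 0.0, 0.0] for k in range(n)])
    t = np.deg2rad(100.0 * np.arange(n))
    helix = np.column_stack([2.3 * np.cos(t), 2.3 * np.sin(t), 1.5 * np.arange(n)])
    print(profile_distance(line, helix))
```

Because the descriptor depends only on inter-residue distances, it is unchanged by rotating or translating either structure, and the cost is one distance matrix per protein rather than a search over superpositions. A crude summary like this discards most structural detail, so it should be read as an illustration of the alignment-free idea, not as a substitute for the published metric.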
Affiliation(s)
- Olivier Lichtarge
- To whom correspondence should be addressed. Tel: +1 713 798 5646; Fax: +1 713 798 7773;
33
Lu CH, Lin YS, Chen YC, Yu CS, Chang SY, Hwang JK. The fragment transformation method to detect the protein structural motifs. Proteins 2006; 63:636-43. [PMID: 16470805 DOI: 10.1002/prot.20904] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Abstract
Identifying functional structural motifs in protein structures of unknown function has become increasingly important in recent years owing to the progress of structural genomics initiatives. Although certain structural patterns, such as the Asp-His-Ser catalytic triad, are easy to detect because of their conserved residues and stringently constrained geometry, it is usually more challenging to detect general structural motifs such as the ββα-metal binding motif, which has a much more variable conformation and sequence. At present, the identification of these motifs usually relies on manual procedures based on various structure and sequence analysis tools. In this study, we develop a structural alignment algorithm that combines structural and sequence information to identify local structural motifs. We applied our method to two examples: the ββα-metal binding motif and the treble clef motif. The ββα-metal binding motif plays an important role in nonspecific DNA interactions and cleavage in host defense and apoptosis. The treble clef motif is a zinc-binding motif adaptable to diverse functions, such as the binding of nucleic acids and the hydrolysis of phosphodiester bonds. Our results are encouraging, indicating that we can effectively identify these structural motifs in an automatic fashion. Our method may provide a useful means for automatic functional annotation by detecting structural motifs associated with particular functions.
Affiliation(s)
- Chih-Hao Lu
- Institute of Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan, Republic of China
34
Burroughs AM, Allen KN, Dunaway-Mariano D, Aravind L. Evolutionary genomics of the HAD superfamily: understanding the structural adaptations and catalytic diversity in a superfamily of phosphoesterases and allied enzymes. J Mol Biol 2006; 361:1003-34. [PMID: 16889794 DOI: 10.1016/j.jmb.2006.06.049] [Citation(s) in RCA: 343] [Impact Index Per Article: 18.1] [Received: 12/01/2005] [Revised: 06/16/2006] [Accepted: 06/20/2006] [Indexed: 11/21/2022]
Abstract
The HAD (haloacid dehalogenase) superfamily includes phosphoesterases, ATPases, phosphonatases, dehalogenases, and sugar phosphomutases acting on a remarkably diverse set of substrates. The availability of numerous crystal structures of representatives belonging to diverse branches of the HAD superfamily provides us with a unique opportunity to reconstruct their evolutionary history and uncover the principal determinants that led to the diversification of their structure and function. To this end we present a comprehensive analysis of the HAD superfamily that identifies its unique structural features and provides a detailed classification of the entire superfamily. We show that at the highest level the HAD superfamily is unified with several other superfamilies, namely the DHH, receiver (CheY-like), von Willebrand A, TOPRIM, classical histone deacetylase and PIN/FLAP nuclease domains, all of which contain a specific form of the Rossmannoid fold. These Rossmannoid folds are distinguished from others by the presence of equivalently placed acidic catalytic residues, including one at the end of the first core beta-strand of the central sheet. The HAD domain is distinguished from these related Rossmannoid folds by two key structural signatures, a "squiggle" (a single helical turn) and a "flap" (a beta hairpin motif), located immediately downstream of the first beta-strand of its core Rossmannoid fold. The squiggle and the flap motifs are predicted to provide the mobility these enzymes need to alternate between the "open" and "closed" conformations. In addition, most members of the HAD superfamily contain inserts, termed caps, occurring at either of two positions in the core Rossmannoid fold. We show that the cap modules have been independently inserted into these two stereotypic positions on multiple occasions in evolution and display extensive evolutionary diversification independent of the core catalytic domain.
The first group of caps, the C1 caps, is directly inserted into the flap motif and regulates access of reactants to the active site. The second group, the C2 caps, forms a roof over the active site, and access to their internal cavities might be in part regulated by the movement of the flap. The diversification of the cap module was a major factor in the exploration of a vast substrate space in the course of the evolution of this superfamily. We show that the HAD superfamily contains 33 major families distributed across the three superkingdoms of life. Analysis of the phyletic patterns suggests that at least five distinct HAD proteins are traceable to the last universal common ancestor (LUCA) of all extant organisms. While these prototypes diverged prior to the emergence of the LUCA, the major diversification in terms of both substrate specificity and reaction types occurred after the radiation of the three superkingdoms of life, primarily in bacteria. Most major diversification events appear to correlate with the acquisition of new metabolic capabilities, especially related to the elaboration of carbohydrate metabolism in the bacteria. The newly identified relationships and functional predictions provided here are likely to aid the future exploration of the numerous poorly understood members of this large superfamily of enzymes.
Affiliation(s)
- A Maxwell Burroughs
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
35
Abstract
MOTIVATION Function inference from structure is facilitated by the use of patterns of residues (3D motifs), normally identified by expert knowledge, that correlate with function. As an alternative to often limited expert knowledge, we use machine-learning techniques to identify patterns of 3-10 residues that maximize function prediction. This approach allows us to test the assumption that residues that provide function are the most informative for predicting function. RESULTS We apply our method, GASPS, to the haloacid dehalogenase, enolase, amidohydrolase and crotonase superfamilies and to the serine proteases. The motifs found by GASPS are as good at function prediction as 3D motifs based on expert knowledge. The GASPS motifs with the greatest ability to predict protein function consist mainly of known functional residues. However, several residues with no known functional role are equally predictive. For four groups, we show that the predictive power of our 3D motifs is comparable with or better than approaches that use the entire fold (Combinatorial-Extension) or sequence profiles (PSI-BLAST). AVAILABILITY Source code is freely available for academic use by contacting the authors. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Benjamin J Polacco
- Department of Biopharmaceutical Sciences, University of California, San Francisco, 94143-2250, USA
36
Tseng YY, Liang J. Automated method for predicting enzyme functional surfaces and locating key residues with accuracy and specificity. Conf Proc IEEE Eng Med Biol Soc 2006; 2006:4552-4555. [PMID: 17947099 DOI: 10.1109/iembs.2006.259540] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/25/2023]
Abstract
Locating functionally important protein surfaces and identifying catalytic site residues are critical for studying enzyme function. Here, we present fold-independent methods for predicting and characterizing the catalytic sites of enzymes at the atomic level. By extracting atomic patterns of catalytic residues in geometrically computed surface pockets, we develop a library of atomic patterns on the functional surfaces of ca. 700 protein structures. Combining this library with the propensities of secondary structures and residue occurrence in active sites, we develop methods to identify functionally important surfaces on protein structures and to locate key residues. We discuss the application of our methods to amylase, dioxygenase, deaminase, dehalogenase, and hydratase. A large-scale cross-validated prediction study shows that our method is sensitive and specific.
Affiliation(s)
- Yan Yuan Tseng
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607-7052, USA
37
Torrance JW, Bartlett GJ, Porter CT, Thornton JM. Using a library of structural templates to recognise catalytic sites and explore their evolution in homologous families. J Mol Biol 2005; 347:565-81. [PMID: 15755451 DOI: 10.1016/j.jmb.2005.01.044] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.6] [Received: 12/14/2004] [Revised: 01/13/2005] [Accepted: 01/19/2005] [Indexed: 11/20/2022]
Abstract
Catalytic site structure is normally highly conserved between distantly related enzymes. As a consequence, templates representing catalytic sites have the potential to succeed at function prediction in cases where methods based on sequence or overall structure fail. There are many methods for searching protein structures for matches to structural templates, but few validated template libraries to use with these methods. We present a library of structural templates representing catalytic sites, based on information from the scientific literature. Furthermore, we analyse homologous template families to discover the diversity within families and the utility of templates for active site recognition. Templates representing the catalytic sites of homologous proteins mostly differ by less than 1 Å root mean square deviation (RMSD), even when the sequence similarity between the two proteins is low. Within these sets of homologues there is usually no discernible relationship between catalytic site structural similarity and sequence similarity. Because of this structural conservation of catalytic sites, the templates can discriminate between matches to related proteins and random matches with over 85% sensitivity and predictive accuracy. Templates based on protein backbone positions are more discriminating than those based on side-chain atoms. These analyses show encouraging prospects for the prediction of functional sites in structural genomics structures of unknown function, and will be of use in analyses of convergent evolution and in exploring relationships between active site geometry and chemistry. The template library can be queried via a web server at and is available for download.
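The sub-1 Å figure is a root mean square deviation computed after optimally superposing corresponding template atoms. A minimal way to obtain such a value is the Kabsch algorithm, sketched below. The function names, the one-coordinate-per-atom input format, the toy 5-atom template and the use of 1 Å as a match threshold (borrowed from the observation in the abstract, not from the library's actual scoring scheme) are all assumptions.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two equally ordered (N, 3) atom sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("templates must contain the same atoms in the same order")
    # Remove translation by centring both sets on their centroids
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def templates_match(p, q, threshold=1.0):
    """Hypothetical helper: treat two templates as equivalent if RMSD < threshold (A)."""
    return kabsch_rmsd(p, q) < threshold


if __name__ == "__main__":
    # Hypothetical 5-atom template and a rotated, translated, slightly perturbed copy
    template = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 2.0, 0.0],
                         [0.0, 0.0, 2.5], [1.0, 1.0, 1.0]])
    theta = 0.7
    rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    other = template @ rz.T + np.array([5.0, -3.0, 2.0])
    other[4] += 0.3  # nudge one atom by 0.3 A along each axis
    print(round(kabsch_rmsd(template, other), 3), templates_match(template, other))
```

Superposition makes the score independent of where each structure sits in its coordinate frame, so only differences in internal geometry contribute. A full template search would also have to enumerate candidate residue assignments in the target structure before computing this RMSD, which the sketch leaves out.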
Affiliation(s)
- James W Torrance
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
38
Van Lanen SG, Reader JS, Swairjo MA, de Crécy-Lagard V, Lee B, Iwata-Reuyl D. From cyclohydrolase to oxidoreductase: discovery of nitrile reductase activity in a common fold. Proc Natl Acad Sci U S A 2005; 102:4264-9. [PMID: 15767583 PMCID: PMC555470 DOI: 10.1073/pnas.0408056102] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.6] [Received: 10/29/2004] [Indexed: 11/18/2022]
Abstract
The enzyme YkvM from Bacillus subtilis was identified previously, along with three other enzymes (YkvJKL), in a bioinformatics search for enzymes involved in the biosynthesis of queuosine, a 7-deazaguanine-modified nucleoside found in tRNA(GUN) of Bacteria and Eukarya. Genetic analysis of ykvJKLM mutants in Acinetobacter confirmed that each was essential for queuosine biosynthesis, and the genes were renamed queCDEF. QueF exhibits significant homology to the type I GTP cyclohydrolases typified by FolE. Given that GTP is the precursor to queuosine and that a cyclohydrolase-like reaction was postulated as the initial step in queuosine biosynthesis, QueF was proposed to be the putative cyclohydrolase-like enzyme responsible for this reaction. We have cloned the queF genes from B. subtilis and Escherichia coli and characterized the recombinant enzymes. Contrary to the predictions based on sequence analysis, we discovered that the enzymes in fact catalyze a mechanistically unrelated reaction, the NADPH-dependent reduction of 7-cyano-7-deazaguanine to 7-aminomethyl-7-deazaguanine, a late step in the biosynthesis of queuosine. We report here in vitro and in vivo studies that demonstrate this catalytic activity, as well as preliminary biochemical and bioinformatics analyses that provide insight into the structure of this family of enzymes.
Affiliation(s)
- Steven G Van Lanen
- Department of Chemistry, Portland State University, P.O. Box 751, Portland, OR 97207, USA