1
Prabakaran R, Bromberg Y. Functional profiling of the sequence stockpile: a protein pair-based assessment of in silico prediction tools. Bioinformatics 2025; 41:btaf035. [PMID: 39854283 PMCID: PMC11821270 DOI: 10.1093/bioinformatics/btaf035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 11/04/2024] [Accepted: 01/22/2025] [Indexed: 01/26/2025] Open
Abstract
MOTIVATION In silico functional annotation of proteins is crucial to narrowing the sequencing-accelerated gap in our understanding of protein activities. Numerous function annotation methods exist, and their ranks have been growing, particularly so with the recent deep learning-based developments. However, it is unclear if these tools are truly predictive. As we are not aware of any methods that can identify new terms in functional ontologies, we ask if they can, at least, identify molecular functions of proteins that are non-homologous to or far-removed from known protein families. RESULTS Here, we explore the potential and limitations of the existing methods in predicting the molecular functions of thousands of such proteins. Lacking the "ground truth" functional annotations, we transformed the assessment of function prediction into evaluation of functional similarity of protein pairs that likely share function but are unlike any of the currently functionally annotated sequences. Notably, our approach transcends the limitations of functional annotation vocabularies, providing a means to assess different-ontology annotation methods. We find that most existing methods are limited to identifying functional similarity of homologous sequences and fail to predict the function of proteins lacking reference. Curiously, despite their seemingly unlimited by-homology scope, deep learning methods also have trouble capturing the functional signal encoded in protein sequence. We believe that our work will inspire the development of a new generation of methods that push boundaries and promote exploration and discovery in the molecular function domain. AVAILABILITY AND IMPLEMENTATION The data underlying this article are available at https://doi.org/10.6084/m9.figshare.c.6737127.v3. The code used to compute siblings is available openly at https://bitbucket.org/bromberglab/siblings-detector/.
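The pair-based idea in this abstract can be illustrated with a minimal sketch. The abstract does not state how functional similarity between two predicted annotations is scored, so the Jaccard index, the 0.5 threshold, the function names, and the toy protein and term identifiers below are all assumptions made for illustration, not the authors' method.

```python
def jaccard(terms_a, terms_b):
    """Jaccard similarity of two sets of predicted function terms."""
    if not terms_a and not terms_b:
        return 0.0  # two empty predictions carry no evidence of shared function
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def fraction_of_pairs_recovered(pairs, predictions, threshold=0.5):
    """Fraction of presumed same-function pairs whose predictions agree.

    pairs: list of (protein_id, protein_id) tuples assumed to share function.
    predictions: dict mapping protein_id -> set of predicted terms.
    threshold: hypothetical similarity cutoff for calling a pair "recovered".
    """
    if not pairs:
        return 0.0
    hits = sum(
        1
        for p, q in pairs
        if jaccard(predictions.get(p, set()), predictions.get(q, set())) >= threshold
    )
    return hits / len(pairs)


# Toy, made-up predictions from a hypothetical annotation tool.
toy_predictions = {
    "P1": {"term_1", "term_2"},
    "P2": {"term_1", "term_2"},
    "P3": {"term_3"},
    "P4": set(),  # the tool returned nothing for this protein
}
toy_pairs = [("P1", "P2"), ("P3", "P4")]
recovered = fraction_of_pairs_recovered(toy_pairs, toy_predictions)
```

Because the score compares two predictions from the same tool with each other, it needs no reference annotation and no shared vocabulary across tools, which is the property the abstract highlights.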
Affiliation(s)
- R Prabakaran
  - Department of Biology, Emory University, Atlanta, GA 30322, United States
  - Department of Computer Science, Emory University, Atlanta, GA 30322, United States
- Yana Bromberg
  - Department of Biology, Emory University, Atlanta, GA 30322, United States
  - Department of Computer Science, Emory University, Atlanta, GA 30322, United States
2
Role and Application of Biocatalysts in Cancer Drug Discovery. Catalysts 2023. [DOI: 10.3390/catal13020250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023] Open
Abstract
A biocatalyst is an enzyme that speeds up or slows down the rate at which a chemical reaction occurs and speeds up certain processes by 10^8 times. It is used as an anticancer agent because it targets drug activation inside the tumor microenvironment while limiting damage to healthy cells. Biocatalysts have been used for the synthesis of different heterocyclic compounds and are also used in nano drug delivery systems. The use of nano-biocatalysts for tumor-targeted delivery not only aids in tumor invasion, angiogenesis, and mutagenesis, but also provides information on the expression and activity of many markers related to the microenvironment. Iosmapinol, moclobemide, cinepazide, lysine dioxygenase, epothilone, 1-homophenylalanine, and many more are only some of the anticancer medicines that have been synthesised using biocatalysts. In this review, we have highlighted the application of biocatalysts in cancer therapies as well as the use of biocatalysts in the synthesis of drugs and drug-delivery systems in the tumor microenvironment.
3
MohammadiPeyhani H, Chiappino-Pepe A, Haddadi K, Hafner J, Hadadi N, Hatzimanikatis V. NICEdrug.ch, a workflow for rational drug design and systems-level analysis of drug metabolism. eLife 2021; 10:e65543. [PMID: 34340747 PMCID: PMC8331181 DOI: 10.7554/elife.65543] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/07/2020] [Accepted: 07/07/2021] [Indexed: 12/30/2022] Open
Abstract
The discovery of a drug requires over a decade of intensive research and financial investment, and still has a high risk of failure. To reduce this burden, we developed the NICEdrug.ch resource, which incorporates 250,000 bioactive molecules, and studied their enzymatic metabolic targets, fate, and toxicity. NICEdrug.ch includes a unique fingerprint that identifies reactive similarities between drug-drug and drug-metabolite pairs. We validated the application, scope, and performance of NICEdrug.ch over similar methods in the field on gold-standard datasets describing drugs and metabolites sharing reactivity, drug toxicities, and drug targets. We use NICEdrug.ch to evaluate inhibition and toxicity by the anticancer drug 5-fluorouracil, and suggest avenues to alleviate its side effects. We propose shikimate 3-phosphate for targeting liver-stage malaria with minimal impact on the human host cell. Finally, NICEdrug.ch suggests over 1300 candidate drugs and food molecules to target COVID-19 and explains their inhibitory mechanism for further experimental screening. The NICEdrug.ch database is accessible online to systematically identify the reactivity of small molecules and druggable enzymes with practical applications in lead discovery and drug repurposing.
Affiliation(s)
- Homa MohammadiPeyhani
  - Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFL, Lausanne, Switzerland
- Anush Chiappino-Pepe
  - Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFL, Lausanne, Switzerland
- Kiandokht Haddadi
  - Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFL, Lausanne, Switzerland
- Jasmin Hafner
  - Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFL, Lausanne, Switzerland
- Noushin Hadadi
  - Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFL, Lausanne, Switzerland
- Vassily Hatzimanikatis
  - Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFL, Lausanne, Switzerland
4
Mitchell JB. Enzyme function and its evolution. Curr Opin Struct Biol 2017; 47:151-156. [PMID: 29107208 DOI: 10.1016/j.sbi.2017.10.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 01/13/2017] [Revised: 08/29/2017] [Accepted: 10/02/2017] [Indexed: 01/10/2023]
Abstract
With rapid increases over recent years in the determination of protein sequence and structure, alongside knowledge of thousands of enzyme functions and hundreds of chemical mechanisms, it is now possible to combine breadth and depth in our understanding of enzyme evolution. Phylogenetics continues to move forward, though determining correct evolutionary family trees is not trivial. Protein function prediction has spawned a variety of promising methods that offer the prospect of identifying enzymes across the whole range of chemical functions and over numerous species. This knowledge is essential to understand antibiotic resistance, as well as in protein re-engineering and de novo enzyme design.
Affiliation(s)
- John Bo Mitchell
  - EaStCHEM School of Chemistry and Biomedical Sciences Research Complex, University of St Andrews, North Haugh, St Andrews, Scotland KY16 9ST, United Kingdom
5
Piergiorge RM, de Miranda AB, Guimarães AC, Catanho M. Functional Analogy in Human Metabolism: Enzymes with Different Biological Roles or Functional Redundancy? Genome Biol Evol 2017; 9:1624-1636. [PMID: 28854631 PMCID: PMC5737724 DOI: 10.1093/gbe/evx119] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Accepted: 07/04/2017] [Indexed: 12/12/2022] Open
Abstract
Since enzymes catalyze almost all chemical reactions that occur in living organisms, it is crucial that genes encoding such activities are correctly identified and functionally characterized. Several studies suggest that the fraction of enzymatic activities in which multiple events of independent origin have taken place during evolution is substantial. However, this topic is still poorly explored, and a comprehensive investigation of the occurrence, distribution, and implications of these events has not been done so far. Fundamental questions, such as how analogous enzymes originate, why so many events of independent origin have apparently occurred during evolution, and what are the reasons for the coexistence in the same organism of distinct enzymatic forms catalyzing the same reaction, remain unanswered. Also, several isofunctional enzymes are still not recognized as nonhomologous, even with substantial evidence indicating different evolutionary histories. In this work, we begin to investigate the biological significance of the cooccurrence of nonhomologous isofunctional enzymes in human metabolism, characterizing functional analogous enzymes identified in metabolic pathways annotated in the human genome. Our hypothesis is that the coexistence of multiple enzymatic forms might not be interpreted as functional redundancy. Instead, these enzymatic forms may be implicated in distinct (and probably relevant) biological roles.
Affiliation(s)
- Rafael Mina Piergiorge
  - Laboratório de Genômica Funcional e Bioinformática, Fiocruz, Instituto Oswaldo Cruz, Manguinhos, Rio de Janeiro, Brazil
- Antonio Basílio de Miranda
  - Laboratório de Biologia Computacional e Sistemas, Fiocruz, Instituto Oswaldo Cruz, Manguinhos, Rio de Janeiro, Brazil
- Ana Carolina Guimarães
  - Laboratório de Genômica Funcional e Bioinformática, Fiocruz, Instituto Oswaldo Cruz, Manguinhos, Rio de Janeiro, Brazil
- Marcos Catanho
  - Laboratório de Genômica Funcional e Bioinformática, Fiocruz, Instituto Oswaldo Cruz, Manguinhos, Rio de Janeiro, Brazil
6
Mudgal R, Srinivasan N, Chandra N. Resolving protein structure-function-binding site relationships from a binding site similarity network perspective. Proteins 2017; 85:1319-1335. [PMID: 28342236 DOI: 10.1002/prot.25293] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/24/2016] [Revised: 03/18/2017] [Accepted: 03/20/2017] [Indexed: 11/05/2022]
Abstract
Functional annotation is seldom straightforward with complexities arising due to functional divergence in protein families or functional convergence between non-homologous protein families, leading to mis-annotations. An enzyme may contain multiple domains and not all domains may be involved in a given function, adding to the complexity in function annotation. To address this, we use binding site information from bound cognate ligands and catalytic residues, since it can help in resolving fold-function relationships at a finer level and with higher confidence. A comprehensive database of 2,020 fold-function-binding site relationships has been systematically generated. A network-based approach is employed to capture the complexity in these relationships, from which different types of associations are deciphered, that identify versatile protein folds performing diverse functions, same function associated with multiple folds and one-to-one relationships. Binding site similarity networks integrated with fold, function, and ligand similarity information are generated to understand the depth of these relationships. Apart from the observed continuity in the functional site space, network properties of these revealed versatile families with topologically different or dissimilar binding sites and structural families that perform very similar functions. As a case study, subtle changes in the active site of a set of evolutionarily related superfamilies are studied using these networks. Tracing of such similarities in evolutionarily related proteins provides clues into the transition and evolution of protein functions. Insights from this study will be helpful in accurate and reliable functional annotations of uncharacterized proteins, poly-pharmacology, and designing enzymes with new functional capabilities.
Affiliation(s)
- Richa Mudgal
  - IISc Mathematics Initiative, Indian Institute of Science, Bangalore, Karnataka, 560 012, India
- Nagasuma Chandra
  - Department of Biochemistry, Indian Institute of Science, Bangalore, Karnataka, 560 012, India
7
Beattie KE, De Ferrari L, Mitchell JBO. Why do Sequence Signatures Predict Enzyme Mechanism? Homology versus Chemistry. Evol Bioinform Online 2015; 11:267-74. [PMID: 26740739 PMCID: PMC4696837 DOI: 10.4137/ebo.s31482] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/09/2015] [Revised: 11/04/2015] [Accepted: 11/08/2015] [Indexed: 01/25/2023] Open
Abstract
First, we identify InterPro sequence signatures representing evolutionary relatedness and, second, signatures identifying specific chemical machinery. Thus, we predict the chemical mechanisms of enzyme-catalyzed reactions from catalytic and non-catalytic subsets of InterPro signatures. We first scanned our 249 sequences using InterProScan and then used the MACiE database to identify those amino acid residues that are important for catalysis. The sequences were mutated in silico to replace these catalytic residues with glycine and then again scanned using InterProScan. Those signature matches from the original scan that disappeared on mutation were called catalytic. Mechanism was predicted using all signatures, only the 78 “catalytic” signatures, or only the 519 “non-catalytic” signatures. The non-catalytic signatures gave indistinguishable results from those for the whole feature set, with precision of 0.991 and sensitivity of 0.970. The catalytic signatures alone gave less impressive predictivity, with precision and sensitivity of 0.791 and 0.735, respectively. These results show that our successful prediction of enzyme mechanism is mostly by homology rather than by identifying catalytic machinery.
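The procedure in this abstract (mutate catalytic residues to glycine, rescan, and call the signatures that disappear "catalytic") can be sketched as below. This is a toy illustration only: it does not call InterProScan or MACiE, the signature IDs and sequence are made up, the function names are hypothetical, and residue positions are assumed to be 1-based. The precision and sensitivity formulas are the standard definitions, which the abstract does not spell out.

```python
def mutate_to_glycine(sequence, catalytic_positions):
    """Return the sequence with each listed 1-based position replaced by glycine (G)."""
    residues = list(sequence)
    for pos in catalytic_positions:
        residues[pos - 1] = "G"
    return "".join(residues)


def split_signatures(original_hits, mutated_hits):
    """Partition signature IDs by whether they survive the in silico mutation.

    original_hits: set of signature IDs matched by the wild-type sequence.
    mutated_hits: set of signature IDs matched after mutation.
    """
    catalytic = original_hits - mutated_hits      # matches lost on mutation
    non_catalytic = original_hits & mutated_hits  # matches that persist
    return catalytic, non_catalytic


def precision_and_sensitivity(tp, fp, fn):
    """Standard definitions: precision = TP/(TP+FP), sensitivity = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)


# Toy example with a made-up sequence and made-up signature IDs.
mutant = mutate_to_glycine("MKHDE", [3, 4])
catalytic, non_catalytic = split_signatures(
    {"SIG_A", "SIG_B", "SIG_C"},  # hypothetical scan of the original sequence
    {"SIG_A", "SIG_C"},           # hypothetical scan of the mutated sequence
)
```

In a real workflow the two sets of hits would come from scanning the original and mutated sequences with a signature scanner; the set difference is the only logic the partition itself needs.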
Affiliation(s)
- Kirsten E Beattie
  - Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland, UK
- Luna De Ferrari
  - Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland, UK
- John B O Mitchell
  - Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland, UK
8
Maghawry HA, Mostafa MGM, Gharib TF. A new protein structure representation for efficient protein function prediction. J Comput Biol 2015; 21:936-46. [PMID: 25343279 DOI: 10.1089/cmb.2014.0137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/12/2023] Open
Abstract
One of the challenging problems in bioinformatics is the prediction of protein function. Protein function is the main key that can be used to classify different proteins. Protein function can be inferred experimentally with very small throughput or computationally with very high throughput. Computational methods are sequence based or structure based. Structure-based methods produce more accurate protein function prediction. In this article, we propose a new protein structure representation for efficient protein function prediction. The representation is based on three-dimensional patterns of protein residues. In the analysis, we used protein function based on enzyme activity through six mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, haloacid dehalogenase, isoprenoid synthase type I, and vicinal oxygen chelate. We applied three different classification methods, naïve Bayes, k-nearest neighbors, and random forest, to predict the enzyme superfamily of a given protein. The prediction accuracy using the proposed representation outperforms a recently introduced representation method that is based only on the distance patterns. The results show that the proposed representation achieved prediction accuracy up to 98%, with improvement of about 10% on average.
Affiliation(s)
- Huda A Maghawry
  - Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
9
Martínez Cuesta S, Rahman SA, Furnham N, Thornton JM. The Classification and Evolution of Enzyme Function. Biophys J 2015; 109:1082-6. [PMID: 25986631 DOI: 10.1016/j.bpj.2015.04.020] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 03/09/2015] [Revised: 04/16/2015] [Accepted: 04/17/2015] [Indexed: 11/30/2022] Open
Abstract
Enzymes are the proteins responsible for the catalysis of life. Enzymes sharing a common ancestor as defined by sequence and structure similarity are grouped into families and superfamilies. The molecular function of enzymes is defined as their ability to catalyze biochemical reactions; it is manually classified by the Enzyme Commission and robust approaches to quantitatively compare catalytic reactions are just beginning to appear. Here, we present an overview of studies at the interface of the evolution and function of enzymes.
Affiliation(s)
- Sergio Martínez Cuesta
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Syed Asad Rahman
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Nicholas Furnham
  - Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
10
Martinez Cuesta S, Furnham N, Rahman SA, Sillitoe I, Thornton JM. The evolution of enzyme function in the isomerases. Curr Opin Struct Biol 2014; 26:121-30. [PMID: 25000289 PMCID: PMC4139412 DOI: 10.1016/j.sbi.2014.06.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 02/18/2014] [Revised: 06/02/2014] [Accepted: 06/10/2014] [Indexed: 01/14/2023]
Abstract
The advent of computational approaches to measure functional similarity between enzymes adds a new dimension to existing evolutionary studies based on sequence and structure. This paper reviews research efforts aiming to understand the evolution of enzyme function in superfamilies, presenting a novel strategy to provide an overview of the evolution of enzymes belonging to an individual EC class, using the isomerases as an exemplar.
Affiliation(s)
- Sergio Martinez Cuesta
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Nicholas Furnham
  - Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, United Kingdom
- Syed Asad Rahman
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
11
Sorokina M, Stam M, Médigue C, Lespinet O, Vallenet D. Profiling the orphan enzymes. Biol Direct 2014; 9:10. [PMID: 24906382 PMCID: PMC4084501 DOI: 10.1186/1745-6150-9-10] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 03/27/2014] [Accepted: 05/29/2014] [Indexed: 11/10/2022] Open
Abstract
The emergence of Next Generation Sequencing generates an incredible amount of sequence and great potential for new enzyme discovery. Despite this huge amount of data and the profusion of bioinformatic methods for function prediction, a large part of known enzyme activities is still lacking an associated protein sequence. These particular activities are called "orphan enzymes". The present review proposes an update of previous surveys on orphan enzymes by mining the current content of public databases. While the percentage of orphan enzyme activities has decreased from 38% to 22% in ten years, there are still more than 1,000 orphans among the 5,000 entries of the Enzyme Commission (EC) classification. Taking into account all the reactions present in metabolic databases, this proportion dramatically increases to reach nearly 50% of orphans and many of them are not associated to a known pathway. We extended our survey to "local orphan enzymes" that are activities which have no representative sequence in a given clade, but have at least one in organisms belonging to other clades. We observe an important bias in Archaea and find that in general more than 30% of the EC activities have incomplete sequence information in at least one superkingdom. To estimate if candidate proteins for local orphans could be retrieved by homology search, we applied a simple strategy based on the PRIAM software and noticed that candidates may be proposed for an important fraction of local orphan enzymes. Finally, by studying relation between protein domains and catalyzed activities, it appears that newly discovered enzymes are mostly associated with already known enzyme domains. Thus, the exploration of the promiscuity and the multifunctional aspect of known enzyme families may solve part of the orphan enzyme issue. We conclude this review with a presentation of recent initiatives in finding proteins for orphan enzymes and in extending the enzyme world by the discovery of new activities.
Affiliation(s)
- Maria Sorokina
  - Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, 91057 Evry, France
12
Alderson RG, De Ferrari L, Mavridis L, McDonagh JL, Mitchell JBO, Nath N. Enzyme informatics. Curr Top Med Chem 2014; 12:1911-23. [PMID: 23116471 DOI: 10.2174/156802612804547353] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 08/15/2012] [Revised: 09/12/2012] [Accepted: 09/15/2012] [Indexed: 12/18/2022]
Abstract
Over the last 50 years, sequencing, structural biology and bioinformatics have completely revolutionised biomolecular science, with millions of sequences and tens of thousands of three dimensional structures becoming available. The bioinformatics of enzymes is well served by, mostly free, online databases. BRENDA describes the chemistry, substrate specificity, kinetics, preparation and biological sources of enzymes, while KEGG is valuable for understanding enzymes and metabolic pathways. EzCatDB, SFLD and MACiE are key repositories for data on the chemical mechanisms by which enzymes operate. At the current rate of genome sequencing and manual annotation, human curation will never finish the functional annotation of the ever-expanding list of known enzymes. Hence there is an increasing need for automated annotation, though it is not yet widespread for enzyme data. In contrast, functional ontologies such as the Gene Ontology already profit from automation. Despite our growing understanding of enzyme structure and dynamics, we are only beginning to be able to design novel enzymes. One can now begin to trace the functional evolution of enzymes using phylogenetics. The ability of enzymes to perform secondary functions, albeit relatively inefficiently, gives clues as to how enzyme function evolves. Substrate promiscuity in enzymes is one example of imperfect specificity in protein-ligand interactions. Similarly, most drugs bind to more than one protein target. This may sometimes result in helpful polypharmacology as a drug modulates plural targets, but also often leads to adverse side-effects. Many chemoinformatics approaches can be used to model the interactions between druglike molecules and proteins in silico. We can even use quantum chemical techniques like DFT and QM/MM to compute the structural and energetic course of enzyme catalysed chemical reaction mechanisms, including a full description of bond making and breaking.
Affiliation(s)
- Rosanna G Alderson
  - Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland, UK
13
Abstract
The number of known protein structures is continuously growing, as shown by the more than 95,000 3D structures freely available via the PDB. Over the last decade, pharmaceutical research has sparked interest in computationally extracting information from this large data pool, resulting in a homology-driven knowledge transfer from annotated to new structures. Studying protein structures with respect to understanding and modulating their functional behavior means analyzing their centers of action. Therefore, the detection and description of potential binding sites on the protein surface is a major step towards protein classification and assessment. Subsequently, these representations can be incorporated to compare proteins, and to predict their druggability or function. Especially in the context of target identification and polypharmacology, automated tools for large-scale target comparisons are highly needed. In this article, developments for automated structure-based target assessment are reviewed and remaining challenges as well as future perspectives are discussed.
14
Wink PL, Sanchez Quitian ZA, Rosado LA, Rodrigues VDS, Petersen GO, Lorenzini DM, Lipinski-Paes T, Saraiva Macedo Timmers LF, de Souza ON, Basso LA, Santos DS. Biochemical characterization of recombinant nucleoside hydrolase from Mycobacterium tuberculosis H37Rv. Arch Biochem Biophys 2013; 538:80-94. [DOI: 10.1016/j.abb.2013.08.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/13/2013] [Revised: 08/13/2013] [Accepted: 08/17/2013] [Indexed: 11/25/2022]
15
Rosado LA, Vasconcelos IB, Palma MS, Frappier V, Najmanovich RJ, Santos DS, Basso LA. The mode of action of recombinant Mycobacterium tuberculosis shikimate kinase: kinetics and thermodynamics analyses. PLoS One 2013; 8:e61918. [PMID: 23671579 PMCID: PMC3646032 DOI: 10.1371/journal.pone.0061918] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 10/25/2012] [Accepted: 03/14/2013] [Indexed: 12/03/2022] Open
Abstract
Tuberculosis remains one of the main causes of mortality worldwide due to a single infectious agent, Mycobacterium tuberculosis. The aroK-encoded M. tuberculosis Shikimate Kinase (MtSK), shown to be essential for survival of bacilli, catalyzes the phosphoryl transfer from ATP to the carbon-3 hydroxyl group of shikimate (SKH), yielding shikimate-3-phosphate and ADP. Here we present purification to homogeneity, and oligomeric state determination of recombinant MtSK. Biochemical and biophysical data suggest that the chemical reaction catalyzed by monomeric MtSK follows a rapid-equilibrium random order of substrate binding, and ordered product release. Isothermal titration calorimetry (ITC) for binding of ligands to MtSK provided thermodynamic signatures of non-covalent interactions to each process. A comparison of steady-state kinetics parameters and equilibrium dissociation constant value determined by ITC showed that ATP binding does not increase the affinity of MtSK for SKH. We suggest that MtSK would more appropriately be described as an aroL-encoded type II shikimate kinase. Our manuscript also gives a thermodynamic description of SKH binding to MtSK and data for the number of protons exchanged during this bimolecular interaction. The negative value for the change in constant pressure heat capacity (ΔCp) and molecular homology model building suggest a pronounced contribution of desolvation of non-polar groups upon binary complex formation. Thermodynamic parameters were deconvoluted into hydrophobic and vibrational contributions upon MtSK:SKH binary complex formation. Data for the number of protons exchanged during this bimolecular interaction are interpreted in light of a structural model to try to propose the likely amino acid side chains that are the proton donors to bulk solvent following MtSK:SKH complex formation.
Collapse
Affiliation(s)
- Leonardo Astolfi Rosado
- Centro de Pesquisas em Biologia Molecular e Funcional (CPBMF), Instituto Nacional de Ciência e Tecnologia em Tuberculose (INCT-TB), Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, RS, Brazil
- Programa de Pós-Graduação em Biologia Celular e Molecular, PUCRS, Porto Alegre, RS, Brazil
| | - Igor Bordin Vasconcelos
- Centro de Pesquisas em Biologia Molecular e Funcional (CPBMF), Instituto Nacional de Ciência e Tecnologia em Tuberculose (INCT-TB), Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, RS, Brazil
- Programa de Pós-Graduação em Biologia Celular e Molecular, PUCRS, Porto Alegre, RS, Brazil
| | - Mário Sérgio Palma
- Laboratório de Biologia Estrutural e Zooquímica, Centro de Estudos de Insetos Sociais, Departamento de Biologia, Instituto de Biociências de Rio Claro, Universidade Estadual Paulista (UNESP), Rio Claro, SP, Brazil
| | - Vincent Frappier
- Department of Biochemistry, Faculty of Medicine, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Rafael Josef Najmanovich
- Department of Biochemistry, Faculty of Medicine, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Diógenes Santiago Santos
- Centro de Pesquisas em Biologia Molecular e Funcional (CPBMF), Instituto Nacional de Ciência e Tecnologia em Tuberculose (INCT-TB), Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, RS, Brazil
- Programa de Pós-Graduação em Medicina e Ciências da Saúde, PUCRS, Porto Alegre, RS, Brazil
- Programa de Pós-Graduação em Biologia Celular e Molecular, PUCRS, Porto Alegre, RS, Brazil
- Luiz Augusto Basso
- Centro de Pesquisas em Biologia Molecular e Funcional (CPBMF), Instituto Nacional de Ciência e Tecnologia em Tuberculose (INCT-TB), Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, RS, Brazil
- Programa de Pós-Graduação em Medicina e Ciências da Saúde, PUCRS, Porto Alegre, RS, Brazil
- Programa de Pós-Graduação em Biologia Celular e Molecular, PUCRS, Porto Alegre, RS, Brazil
|
16
|
Volkamer A, Kuhn D, Rippmann F, Rarey M. Predicting enzymatic function from global binding site descriptors. Proteins 2012; 81:479-89. [DOI: 10.1002/prot.24205] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 07/25/2012] [Revised: 09/21/2012] [Accepted: 10/11/2012] [Indexed: 11/09/2022]
|
17
|
Nath N, Mitchell JBO. Is EC class predictable from reaction mechanism? BMC Bioinformatics 2012; 13:60. [PMID: 22530800 PMCID: PMC3368749 DOI: 10.1186/1471-2105-13-60] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 02/22/2012] [Accepted: 04/24/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism in descriptors, and also three approaches that encode only the overall chemical reaction. Both cross-validation and an external test set are used. RESULTS The three descriptor sets encoding overall chemical transformation perform better than the two descriptions of mechanism. SVM and RF models perform comparably well; kNN is less successful. Oxidoreductases and hydrolases are relatively well predicted by all types of descriptor; isomerases are well predicted by overall reaction descriptors but not by mechanistic ones. CONCLUSIONS Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms. Oxidoreductases, hydrolases, and to some extent isomerases and ligases, have clear chemical signatures, making them easier to predict than transferases and lyases. We find evidence that isomerases as a class are notably mechanistically diverse and that their one shared property, of substrate and product being isomers, can arise in various unrelated ways. The performance of the different machine learning algorithms is in line with many cheminformatics applications, with SVM and RF being roughly equally effective. kNN is less successful, given the role that non-local information plays in successful classification. We note also that, despite a lack of clarity in the literature, EC number prediction is not a single problem; the challenge of predicting protein function from available sequence data is quite different from assigning an EC classification from a cheminformatics representation of a reaction.
Affiliation(s)
- Neetika Nath
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland KY16 9ST, UK
|
18
|
Furnham N, Sillitoe I, Holliday GL, Cuff AL, Laskowski RA, Orengo CA, Thornton JM. Exploring the evolution of novel enzyme functions within structurally defined protein superfamilies. PLoS Comput Biol 2012; 8:e1002403. [PMID: 22396634 PMCID: PMC3291543 DOI: 10.1371/journal.pcbi.1002403] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.3] [Received: 09/30/2011] [Accepted: 01/09/2012] [Indexed: 11/18/2022] Open
Abstract
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life. Enzymes, as biological catalysts, are crucial to life. Understanding how enzymes have evolved to perform the wide variety of reactions found across all kingdoms of life is fundamental to a broad range of biological studies, especially those leading to new therapeutics. 
To unravel the evolution of novel enzyme function requires combining information on protein structure, sequence, phylogeny and chemistry (in terms of interacting small molecules and reaction mechanisms). We have developed a protocol for integrating this wide range of data, which we have applied to a relatively large number of families comprising some very diverse relatives. This has permitted us to present an initial overview of the evolution of novel enzyme functions, in which we observe that some changes in function between relatives are more common than others, with most of the functionality observed in nature confined to relatively few families. Moreover, we are able to identify the evolutionary route taken within a superfamily to change the enzyme function from one reaction to another. This information may help in predicting the function of an enzyme that has yet to be experimentally characterised as well as in designing new enzymes for industrial and medical purposes.
Affiliation(s)
- Nicholas Furnham
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
|
19
|
Holliday GL, Andreini C, Fischer JD, Rahman SA, Almonacid DE, Williams ST, Pearson WR. MACiE: exploring the diversity of biochemical reactions. Nucleic Acids Res 2011; 40:D783-9. [PMID: 22058127 PMCID: PMC3244993 DOI: 10.1093/nar/gkr799] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Indexed: 11/26/2022] Open
Abstract
MACiE (which stands for Mechanism, Annotation and Classification in Enzymes) is a database of enzyme reaction mechanisms, and can be accessed from http://www.ebi.ac.uk/thornton-srv/databases/MACiE/. This article presents the release of Version 3 of MACiE, which not only extends the dataset to 335 entries, covering 182 of the EC sub-subclasses with a crystal structure available (∼90%), but also incorporates greater chemical and structural detail. This version of MACiE represents a shift in emphasis for new entries, from non-homologous representatives covering EC reaction space to enzymes with mechanisms of interest to our users and collaborators with a view to exploring the chemical diversity of life. We present new tools for exploring the data in MACiE and comparing entries as well as new analyses of the data and new searches, many of which can now be accessed via dedicated Perl scripts.
Affiliation(s)
- Gemma L Holliday
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
20
|
Holliday GL, Fischer JD, Mitchell JBO, Thornton JM. Characterizing the complexity of enzymes on the basis of their mechanisms and structures with a bio-computational analysis. FEBS J 2011; 278:3835-45. [PMID: 21605342 PMCID: PMC3258480 DOI: 10.1111/j.1742-4658.2011.08190.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
Enzymes are basically composed of 20 naturally occurring amino acids, yet they catalyse a dizzying array of chemical reactions, with regiospecificity and stereospecificity and under physiological conditions. In this review, we attempt to gain some understanding of these complex proteins, from the chemical versatility of the catalytic toolkit, including the use of cofactors (both metal ions and organic molecules), to the complex mapping of reactions to proteins (which is rarely one-to-one), and finally the structural complexity of enzymes and their active sites, often involving multidomain or multisubunit assemblies. This work highlights how the enzymes that we see today reflect millions of years of evolution, involving de novo design followed by exquisite regulation and modulation to create optimal fitness for life.
|
21
|
Almonacid DE, Babbitt PC. Toward mechanistic classification of enzyme functions. Curr Opin Chem Biol 2011; 15:435-42. [PMID: 21489855 PMCID: PMC3551611 DOI: 10.1016/j.cbpa.2011.03.008] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 03/17/2011] [Accepted: 03/17/2011] [Indexed: 11/15/2022]
Abstract
Classification of enzyme function should be quantitative, computationally accessible, and informed by sequences and structures to enable use of genomic information for functional inference and other applications. Large-scale studies have established that divergently evolved enzymes share conserved elements of structure and common mechanistic steps and that convergently evolved enzymes often converge to similar mechanisms too, suggesting that reaction mechanisms could be used to develop finer-grained functional descriptions than provided by the Enzyme Commission (EC) system currently in use. Here we describe how evolution informs these structure-function mappings and review the databases that store mechanisms of enzyme reactions along with recent developments to measure ligand and mechanistic similarities. Together, these provide a foundation for new classifications of enzyme function.
Affiliation(s)
- Daniel E. Almonacid
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, 1700 4th Street, MC 2550, San Francisco, CA 94158, USA;
- Department of Pharmaceutical Chemistry, University of California San Francisco, 600 16 Street, MC 2240, San Francisco, CA 94158, USA; Telephone: +1 (415) 476-3784; Fax: +1 (415) 514-9656;
- California Institute for Quantitative Biosciences, University of California San Francisco
- Patricia C. Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, 1700 4th Street, MC 2550, San Francisco, CA 94158, USA;
- Department of Pharmaceutical Chemistry, University of California San Francisco, 600 16 Street, MC 2240, San Francisco, CA 94158, USA; Telephone: +1 (415) 476-3784; Fax: +1 (415) 514-9656;
- California Institute for Quantitative Biosciences, University of California San Francisco
|
22
|
Erdin S, Lisewski AM, Lichtarge O. Protein function prediction: towards integration of similarity metrics. Curr Opin Struct Biol 2011; 21:180-8. [PMID: 21353529 PMCID: PMC3120633 DOI: 10.1016/j.sbi.2011.02.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 02/03/2011] [Accepted: 02/03/2011] [Indexed: 11/16/2022]
Abstract
Genomic centers discover increasingly many protein sequences and structures, but not necessarily their full biological functions. Thus, currently, less than one percent of proteins have experimentally verified biochemical activities. To fill this gap, function prediction algorithms apply metrics of similarity between proteins on the premise that those sufficiently alike in sequence, or structure, will perform identical functions. Although high sensitivity is elusive, network analyses that integrate these metrics together hold the promise of rapid gains in function prediction specificity.
Affiliation(s)
- Serkan Erdin
- Department of Molecular and Human Genetics, 1 Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, 1 Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, 1 Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
|
23
|
Ullrich A, Rohrschneider M, Scheuermann G, Stadler PF, Flamm C. In silico evolution of early metabolism. Artif Life 2011; 17:87-108. [PMID: 21370961 DOI: 10.1162/artl_a_00021] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/30/2023]
Abstract
We developed a simulation tool for investigating the evolution of early metabolism, allowing us to speculate on the formation of metabolic pathways from catalyzed chemical reactions and on the development of their characteristic properties. Our model consists of a protocellular entity with a simple RNA-based genetic system and an evolving metabolism of catalytically active ribozymes that manipulate a rich underlying chemistry. Ensuring an almost open-ended and fairly realistic simulation is crucial for understanding the first steps in metabolic evolution. We show here how our simulation tool can be helpful in arguing for or against hypotheses on the evolution of metabolic pathways. We demonstrate that seemingly mutually exclusive hypotheses may well be compatible when we take into account that different processes dominate different phases in the evolution of a metabolic system. Our results suggest that forward evolution shapes metabolic network in the very early steps of evolution. In later and more complex stages, enzyme recruitment supersedes forward evolution, keeping a core set of pathways from the early phase.
Affiliation(s)
- Alexander Ullrich
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Germany.
|
24
|
Martinelli LKB, Ducati RG, Rosado LA, Breda A, Selbach BP, Santos DS, Basso LA. Recombinant Escherichia coli GMP reductase: kinetic, catalytic and chemical mechanisms, and thermodynamics of enzyme-ligand binary complex formation. Mol Biosyst 2011; 7:1289-305. [PMID: 21298178 DOI: 10.1039/c0mb00245c] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022]
Abstract
Guanosine monophosphate (GMP) reductase catalyzes the reductive deamination of GMP to inosine monophosphate (IMP). GMP reductase plays an important role in the conversion of nucleoside and nucleotide derivatives of guanine to adenine nucleotides. In addition, as a member of the purine salvage pathway, it also participates in the reutilization of free intracellular bases. Here we present cloning, expression and purification of Escherichia coli guaC-encoded GMP reductase to determine its kinetic mechanism, as well as chemical and thermodynamic features of this reaction. Initial velocity studies and isothermal titration calorimetry demonstrated that GMP reductase follows an ordered bi-bi kinetic mechanism, in which GMP binds first to the enzyme followed by NADPH binding, and NADP(+) dissociates first followed by IMP release. The isothermal titration calorimetry also showed that GMP and IMP binding are thermodynamically favorable processes. The pH-rate profiles showed groups with apparent pK values of 6.6 and 9.6 involved in catalysis, and pK values of 7.1 and 8.6 important to GMP binding, and a pK value of 6.2 important for NADPH binding. Primary deuterium kinetic isotope effects demonstrated that hydride transfer contributes to the rate-limiting step, whereas solvent kinetic isotope effects arise from a single protonic site that plays a modest role in catalysis. Multiple isotope effects suggest that protonation and hydride transfer steps take place in the same transition state, lending support to a concerted mechanism. Pre-steady-state kinetic data suggest that product release does not contribute to the rate-limiting step of the reaction catalyzed by E. coli GMP reductase.
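The ordered bi-bi mechanism described in this abstract corresponds to a standard textbook initial-velocity equation, which can be sketched in Python as follows. This is only an illustration: the function name `ordered_bi_bi_rate` and all numeric constants are hypothetical and are not taken from the paper. Here `a` stands for the first-binding substrate (GMP in the abstract) and `b` for the second (NADPH), with products assumed absent.

```python
def ordered_bi_bi_rate(a, b, vmax, k_a, k_b, k_ia):
    """Initial velocity for a sequential ordered bi-bi mechanism (no products).

    a, b   : concentrations of the first- and second-binding substrates
    vmax   : maximal velocity
    k_a    : Michaelis constant for substrate A
    k_b    : Michaelis constant for substrate B
    k_ia   : dissociation constant of the enzyme-A binary complex
    """
    denominator = k_ia * k_b + k_b * a + k_a * b + a * b
    return vmax * a * b / denominator


# Hypothetical values, chosen only to show the call (arbitrary units).
rate = ordered_bi_bi_rate(a=1.0, b=1.0, vmax=4.0, k_a=1.0, k_b=1.0, k_ia=1.0)
print(rate)
```

With both substrates saturating, the expression approaches `vmax`; with either substrate absent, the rate is zero.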
Affiliation(s)
- Leonardo Krás Borges Martinelli
- Centro de Pesquisas em Biologia Molecular e Funcional, Instituto Nacional de Ciência e Tecnologia em Tuberculose, Pontifícia Universidade Católica do Rio Grande do Sul, 6681/92-A Av Ipiranga, 90619-900 Porto Alegre, RS, Brazil
|
25
|
Mohammed A, Guda C. Computational Approaches for Automated Classification of Enzyme Sequences. J Proteomics Bioinform 2011; 4:147-152. [PMID: 22114367 DOI: 10.4172/jpb.1000183] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
Determining the functional role(s) of enzymes is very important to build the metabolic blueprint of an organism and to identify the potential roles enzymes may play in metabolic and disease pathways. With exponential growth in gene and protein sequence data, it is not feasible to experimentally characterize the function(s) of all enzymes. Alternatively, computational methods can be used to annotate the enormous amount of unannotated enzyme sequences. For function prediction and classification of enzymes, features based on amino acid composition, sequence and structural properties, domain composition and specific peptide information have been widely used by different computational approaches. Each feature space has its own merits and limitations on the overall prediction accuracy. Prediction accuracy improves when machine-learning methods are used to classify enzymes. Given the incomplete and unbalanced nature of annotations in biological databases, ensemble methods or methods that bank on a combination of orthogonal features are more desirable for achieving higher accuracy and coverage in enzyme classification. In this review article, we systematically describe all the features and methods used thus far for enzyme class prediction. To the authors' knowledge, this review represents the most exhaustive description of methods used for computational prediction of enzyme classes.
Affiliation(s)
- Akram Mohammed
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, NE, USA
|
26
|
Venner E, Lisewski AM, Erdin S, Ward RM, Amin SR, Lichtarge O. Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities. PLoS One 2010; 5:e14286. [PMID: 21179190 PMCID: PMC3001439 DOI: 10.1371/journal.pone.0014286] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 06/29/2010] [Accepted: 11/10/2010] [Indexed: 12/24/2022] Open
Abstract
High-throughput Structural Genomics yields many new protein structures without known molecular function. This study aims to uncover these missing annotations by globally comparing select functional residues across the structural proteome. First, Evolutionary Trace Annotation, or ETA, identifies which proteins have local evolutionary and structural features in common; next, these proteins are linked together into a proteomic network of ETA similarities; then, starting from proteins with known functions, competing functional labels diffuse link-by-link over the entire network. Every node is thus assigned a likelihood z-score for every function, and the most significant one at each node wins and defines its annotation. In high-throughput controls, this competitive diffusion process recovered enzyme activity annotations with 99% and 97% accuracy at half-coverage for the third and fourth Enzyme Commission (EC) levels, respectively. This corresponds to false positive rates 4-fold lower than nearest-neighbor and 5-fold lower than sequence-based annotations. In practice, experimental validation of the predicted carboxylesterase activity in a protein from Staphylococcus aureus illustrated the effectiveness of this approach in the context of an increasingly drug-resistant microbe. This study further links molecular function to a small number of evolutionarily important residues recognizable by Evolutionary Tracing and it points to the specificity and sensitivity of functional annotation by competitive global network diffusion. A web server is at http://mammoth.bcm.tmc.edu/networks.
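The competitive diffusion idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' ETA pipeline: the toy graph, the label names, the damping factor `alpha`, and the function name `competitive_diffusion` are all assumptions made for illustration, and plain averaged scores stand in for the likelihood z-scores the abstract mentions.

```python
from collections import defaultdict


def competitive_diffusion(edges, seeds, n_iter=50, alpha=0.8):
    """Spread each seed label over an undirected graph; the top score wins.

    edges : iterable of (node, node) pairs (hypothetical similarity links)
    seeds : dict mapping annotated nodes to their known label
    """
    # Build an undirected adjacency list from the edge pairs.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    nodes = sorted(set(neighbors) | set(seeds))
    labels = sorted(set(seeds.values()))

    # One score per (node, label); seeds start at 1.0 for their own label.
    scores = {n: {lab: 0.0 for lab in labels} for n in nodes}
    for node, lab in seeds.items():
        scores[node][lab] = 1.0

    for _ in range(n_iter):
        updated = {}
        for n in nodes:
            if n in seeds:
                updated[n] = scores[n]  # annotated nodes stay clamped
                continue
            nbrs = neighbors[n]
            # Each label's score is the damped mean of the neighbors' scores.
            updated[n] = {
                lab: alpha * sum(scores[m][lab] for m in nbrs) / len(nbrs)
                for lab in labels
            }
        scores = updated

    # The highest-scoring label at each node defines its annotation.
    winners = {}
    for n in nodes:
        best = max(labels, key=lambda lab: scores[n][lab])
        if scores[n][best] > 0.0:
            winners[n] = best
    return winners


# Toy chain A-B-C-D with one annotated protein at each end.
print(competitive_diffusion(
    [("A", "B"), ("B", "C"), ("C", "D")],
    {"A": "hydrolase", "D": "kinase"},
))
```

In this toy chain, each unannotated node takes the label of the nearer seed, which is the "most significant label wins" behaviour in its simplest form.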
Affiliation(s)
- Eric Venner
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- W. M. Keck Center for Interdisciplinary Bioscience Training, Houston, Texas, United States of America
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- W. M. Keck Center for Interdisciplinary Bioscience Training, Houston, Texas, United States of America
- R. Matthew Ward
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- W. M. Keck Center for Interdisciplinary Bioscience Training, Houston, Texas, United States of America
- Shivas R. Amin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- W. M. Keck Center for Interdisciplinary Bioscience Training, Houston, Texas, United States of America
|
27
|
Evolution of bacterial phosphoglycerate mutases: non-homologous isofunctional enzymes undergoing gene losses, gains and lateral transfers. PLoS One 2010; 5:e13576. [PMID: 21187861 PMCID: PMC2964296 DOI: 10.1371/journal.pone.0013576] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 05/28/2010] [Accepted: 09/27/2010] [Indexed: 11/28/2022] Open
Abstract
Background The glycolytic phosphoglycerate mutases exist as non-homologous isofunctional enzymes (NISE) having independent evolutionary origins and no similarity in primary sequence, 3D structure, or catalytic mechanism. Cofactor-dependent PGM (dPGM) requires 2,3-bisphosphoglycerate for activity; cofactor-independent PGM (iPGM) does not. The PGM profile of any given bacterium is unpredictable and some organisms such as Escherichia coli encode both forms. Methods/Principal Findings To examine the distribution of PGM NISE throughout the Bacteria, and gain insight into the evolutionary processes that shape their phyletic profiles, we searched bacterial genome sequences for the presence of dPGM and iPGM. Both forms exhibited patchy distributions throughout the bacterial domain. Species within the same genus, or even strains of the same species, frequently differ in their PGM repertoire. The distribution is further complicated by the common occurrence of dPGM paralogs, while iPGM paralogs are rare. Larger genomes are more likely to accommodate PGM paralogs or both NISE forms. Lateral gene transfers have shaped the PGM profiles with intradomain and interdomain transfers apparent. Archaeal-type iPGM was identified in many bacteria, often as the sole PGM. To address the function of PGM NISE in an organism encoding both forms, we analyzed recombinant enzymes from E. coli. Both NISE were active mutases, but the specific activity of dPGM greatly exceeded that of iPGM, which showed highest activity in the presence of manganese. We created PGM null mutants in E. coli and discovered the ΔdPGM mutant grew slowly due to a delay in exiting stationary phase. Overexpression of dPGM or iPGM overcame this defect. Conclusions/Significance Our biochemical and genetic analyses in E. coli firmly establish dPGM and iPGM as NISE. Metabolic redundancy is indicated since only larger genomes encode both forms. 
Non-orthologous gene displacement can fully account for the non-uniform PGM distribution we report across the bacterial domain.
|
28
|
Chen L, Feng KY, Cai YD, Chou KC, Li HP. Predicting the network of substrate-enzyme-product triads by combining compound similarity and functional domain composition. BMC Bioinformatics 2010; 11:293. [PMID: 20513238 PMCID: PMC3098070 DOI: 10.1186/1471-2105-11-293] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Received: 11/20/2009] [Accepted: 05/31/2010] [Indexed: 12/02/2022] Open
Abstract
Background A metabolic pathway is a highly regulated network consisting of many metabolic reactions involving substrates, enzymes, and products, where substrates can be transformed into products with particular catalytic enzymes. Since experimental determination of the network of substrate-enzyme-product triads (whether the substrate can be transformed into the product with a given enzyme) is both time-consuming and expensive, it would be very useful to develop a computational approach for predicting the network of substrate-enzyme-product triads. Results A mathematical model for predicting the network of substrate-enzyme-product triads was developed. Meanwhile, a benchmark dataset was constructed that contains 744,192 substrate-enzyme-product triads, of which 14,592 are networking triads, and 729,600 are non-networking triads; i.e., the number of the negative triads was about 50 times the number of the positive triads. The molecular graph was introduced to calculate the similarity between the substrate compounds and between the product compounds, while the functional domain composition was introduced to calculate the similarity between enzyme molecules. The nearest neighbour algorithm was utilized as a prediction engine, in which a novel metric was introduced to measure the "nearness" between triads. To train and test the prediction engine, one tenth of the positive triads and one tenth of the negative triads were randomly picked from the benchmark dataset as the testing samples, while the remaining triads were used to train the prediction model. It was observed that the overall success rate in predicting the network for the testing samples was 98.71%, with 95.41% success rate for the 1,460 testing networking triads and 98.77% for the 72,960 testing non-networking triads.
Conclusions It is quite promising and encouraging to use the molecular graph to calculate the similarity between compounds and to use the functional domain composition to calculate the similarity between enzymes for studying the substrate-enzyme-product network system. The software is available upon request.
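The figures quoted in this abstract can be cross-checked with a few lines of Python. The sketch below only re-derives the arithmetic from the numbers stated in the abstract; the helper name `weighted_success_rate` is hypothetical, and the per-class rates are the rounded percentages as reported, so the result is approximate.

```python
def weighted_success_rate(groups):
    """Combine per-class success rates, weighting by class size.

    groups : list of (n_samples, success_rate) pairs
    """
    total = sum(n for n, _ in groups)
    return sum(n * rate for n, rate in groups) / total


# Benchmark composition as stated in the abstract.
positives, negatives = 14_592, 729_600
print(positives + negatives)   # total number of triads
print(negatives / positives)   # negative-to-positive ratio

# Test split: 1,460 networking and 72,960 non-networking triads.
overall = weighted_success_rate([(1_460, 0.9541), (72_960, 0.9877)])
print(f"{overall:.2%}")
```

The weighted rate comes out close to the reported 98.71%, with the small gap attributable to rounding of the per-class percentages. Because negatives outnumber positives by about 50 to 1, the overall figure is dominated by the non-networking class.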
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, PR China
|