1
Iwaniak A, Minkiewicz P, Darewicz M. Bioinformatics and bioactive peptides from foods: Do they work together? Advances in Food and Nutrition Research 2024; 108:35-111. [PMID: 38461003 DOI: 10.1016/bs.afnr.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2024]
Abstract
We live in the Big Data Era, which affects many aspects of science, including research on bioactive peptides derived from foods, which have been a focus of scientific interest during the last few decades. These two issues, i.e., the development of computer technologies and progress in the discovery of novel peptides with health-beneficial properties, are closely interrelated. This chapter presents example applications of bioinformatics for studying biopeptides, focusing on the main aspects of peptide analysis as the starting point, including: (i) the role of peptide databases; (ii) aspects of bioactivity prediction; (iii) simulation of peptide release from proteins. Bioinformatics can also be used to predict other features of peptides, including ADMET, QSAR, structure, and taste. In answer to the question posed in the title, "bioinformatics and bioactive peptides from foods: do they work together?", it is currently almost impossible to find examples of peptide research that involve no bioinformatics. However, theoretical predictions are not equivalent to experimental work and always require critical scrutiny. The compatibility of in silico and in vitro results is also summarized herein.
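To make "simulation of peptide release from proteins" concrete, here is a minimal sketch assuming a single enzyme, trypsin, described by the simplified rule "cleave after K or R unless the next residue is P". The function names, the toy sequence and the peptide set are hypothetical and are not taken from the chapter or from any database; dedicated in silico proteolysis tools model many enzymes and more detailed specificity rules.

```python
import re


def trypsin_digest(sequence: str) -> list[str]:
    """Simulate tryptic hydrolysis of a protein given in one-letter codes.

    Simplified rule (an assumption): cut after K or R unless the next
    residue is P. Missed cleavages are ignored.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    fragments = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    # A cut after a C-terminal K/R leaves an empty string; drop it.
    return [f for f in fragments if f]


def released_bioactive(sequence: str, known_peptides: set[str]) -> list[str]:
    """Return the digest fragments that exactly match a known peptide."""
    return [f for f in trypsin_digest(sequence) if f in known_peptides]


if __name__ == "__main__":
    toy_protein = "AGKVLRPEKDR"  # hypothetical sequence
    print(trypsin_digest(toy_protein))  # ['AGK', 'VLRPEK', 'DR']
    print(released_bioactive(toy_protein, {"DR"}))  # ['DR']
```

Matching released fragments against a curated peptide list is the simplest form of this analysis; it only flags fragments that are already known, it does not predict new bioactivities.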
Affiliation(s)
- Anna Iwaniak
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland.
- Piotr Minkiewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
- Małgorzata Darewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
2
Yan B, Ran X, Gollu A, Cheng Z, Zhou X, Chen Y, Yang ZJ. IntEnzyDB: an Integrated Structure-Kinetics Enzymology Database. J Chem Inf Model 2022; 62:5841-5848. [PMID: 36286319 DOI: 10.1021/acs.jcim.2c01139] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022]
Abstract
Data-driven modeling has emerged as a new paradigm for biocatalyst design and discovery. Biocatalytic databases that integrate enzyme structure and function data are urgently needed. Here we describe IntEnzyDB, an integrated structure-kinetics database for facile statistical modeling and machine learning. IntEnzyDB employs a relational database architecture with a flattened data structure, which allows rapid data operations. This architecture also makes it easy for IntEnzyDB to incorporate more types of enzyme function data. IntEnzyDB contains enzyme kinetics and structure data from six Enzyme Commission (EC) classes. Using 1050 enzyme structure-kinetics pairs, we investigated the efficiency-perturbing propensities of mutations that are close to or distal from the active site. The statistical results show that efficiency-enhancing mutations are globally encoded and that deleterious mutations are much more likely to occur among close mutations than among distal ones. Finally, we describe a web interface that allows public users to access the enzymology data stored in IntEnzyDB. IntEnzyDB will provide a computational facility for data-driven modeling in biocatalysis and molecular evolution.
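To illustrate why a flattened relational layout suits statistical queries, here is a minimal Python/SQLite sketch in which each structure-kinetics pair is a single row, so grouping mutations by their location needs no joins. The table name, the columns, the 8 Å distance cutoff and the rows are all hypothetical; this is not IntEnzyDB's actual schema, and averaging kcat/Km is not the authors' analysis.

```python
import sqlite3

# Hypothetical flat table: one row per enzyme structure-kinetics pair.
SCHEMA = """
CREATE TABLE pairs (
    pdb_id TEXT,
    ec_class INTEGER,
    mutation TEXT,
    kcat REAL,          -- turnover number, 1/s
    km REAL,            -- Michaelis constant, mM
    dist_to_site REAL   -- distance of the mutation to the active site, angstrom
)"""


def efficiency_by_location(rows, cutoff=8.0):
    """Count mutations and average kcat/Km for close vs distal sites.

    `cutoff` (angstrom) is an illustrative threshold, not a published one.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO pairs VALUES (?, ?, ?, ?, ?, ?)", rows)
        # Flat rows: a single GROUP BY answers the question, no JOINs needed.
        query = """
            SELECT CASE WHEN dist_to_site <= ? THEN 'close' ELSE 'distal' END AS loc,
                   COUNT(*),
                   AVG(kcat / km)
            FROM pairs
            GROUP BY loc
            ORDER BY loc"""
        return conn.execute(query, (cutoff,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    toy_rows = [  # invented values for illustration only
        ("1ABC", 3, "A10G", 10.0, 1.0, 4.0),
        ("1ABC", 3, "L50V", 20.0, 2.0, 15.0),
        ("2XYZ", 1, "K5R", 6.0, 2.0, 5.0),
    ]
    print(efficiency_by_location(toy_rows))
    # [('close', 2, 6.5), ('distal', 1, 10.0)]
```

The trade-off of a flat table is redundancy (structure metadata is repeated on every row) in exchange for simple, fast aggregate queries.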
Affiliation(s)
- Bailu Yan
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States; Department of Biostatistics, Vanderbilt University, Nashville, Tennessee 37205, United States
- Xinchun Ran
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Anvita Gollu
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Zihao Cheng
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Xiang Zhou
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Yiwen Chen
- Data Science Institute, Vanderbilt University, Nashville, Tennessee 37235, United States
- Zhongyue J Yang
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States; Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37235, United States; Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, Tennessee 37235, United States; Data Science Institute, Vanderbilt University, Nashville, Tennessee 37235, United States; Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37205, United States
3
Vicedomini R, Bouly JP, Laine E, Falciatore A, Carbone A. Multiple profile models extract features from protein sequence data and resolve functional diversity of very different protein families. Mol Biol Evol 2022; 39:6556147. [PMID: 35353898 PMCID: PMC9016551 DOI: 10.1093/molbev/msac070] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/14/2022]
Abstract
Functional classification of proteins from sequences alone has become a critical bottleneck in understanding the myriad protein sequences that accumulate in our databases. The great diversity of homologous sequences hides, in many cases, a variety of functional activities that cannot be anticipated. Their identification appears critical for a fundamental understanding of the evolution of living organisms and for biotechnological applications. ProfileView is a sequence-based computational method designed to functionally classify sets of homologous sequences. It relies on two main ideas: the use of multiple profile models whose construction explores evolutionary information in available databases, and a novel definition of a representation space in which to analyse sequences with multiple profile models combined together. ProfileView classifies protein families by enriching known functional groups with new sequences and discovering new groups and subgroups. We validate ProfileView on seven classes of widespread proteins involved in the interaction with nucleic acids, amino acids and small molecules, and in a large variety of functions and enzymatic reactions. ProfileView agrees with the large set of functional data collected for these proteins from the literature regarding their organisation into functional subgroups and the residues that characterise the functions. In addition, ProfileView resolves undefined functional classifications and extracts the molecular determinants underlying protein functional diversity, showing its potential to select sequences towards accurate experimental design and discovery of novel biological functions. On protein families with complex domain architecture, ProfileView functional classification reconciles domain combinations, unlike phylogenetic reconstruction.
ProfileView outperforms the functional classification approach PANTHER, the two k-mer-based methods CUPP and eCAMI, and a neural network approach based on Restricted Boltzmann Machines. It also overcomes the time-complexity limitations of the latter.
Affiliation(s)
- R Vicedomini
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 4 place Jussieu, 75005 Paris, France; Sorbonne Université, Institut des Sciences du Calcul et des Données
- J P Bouly
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 4 place Jussieu, 75005 Paris, France; CNRS, Sorbonne Université Institut de Biologie Physico-Chimique, Laboratory of Chloroplast Biology and Light Sensing in Microalgae - UMR7141, Paris, France
- E Laine
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 4 place Jussieu, 75005 Paris, France
- A Falciatore
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 4 place Jussieu, 75005 Paris, France; CNRS, Sorbonne Université Institut de Biologie Physico-Chimique, Laboratory of Chloroplast Biology and Light Sensing in Microalgae - UMR7141, Paris, France
- A Carbone
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 4 place Jussieu, 75005 Paris, France; Institut Universitaire de France, Paris 75005, France
4
Waman VP, Orengo C, Kleywegt GJ, Lesk AM. Three-dimensional Structure Databases of Biological Macromolecules. Methods Mol Biol 2022; 2449:43-91. [PMID: 35507259 DOI: 10.1007/978-1-0716-2095-3_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/14/2023]
Abstract
Databases of three-dimensional structures of proteins (and their associated molecules) provide: (a) Curated repositories of coordinates of experimentally determined structures, including extensive metadata; for instance, information about provenance, details about data collection and interpretation, and validation of results. (b) Information-retrieval tools to allow searching to identify entries of interest and provide access to them. (c) Links among databases, especially to databases of amino-acid and genetic sequences, and of protein function; and links to software for analysis of amino-acid sequence and protein structure, and for structure prediction. (d) Collections of predicted three-dimensional structures of proteins. These will become increasingly important following the breakthrough in structure prediction achieved by AlphaFold2. The single global archive of experimentally determined biomacromolecular structures is the Protein Data Bank (PDB). It is managed by the wwPDB, a consortium of five partner institutions: the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank Japan (PDBj), the BioMagResBank (BMRB), and the Electron Microscopy Data Bank (EMDB). In addition to jointly managing the PDB repository, the individual wwPDB partners offer many tools for analysis of protein and nucleic acid structures and their complexes, including computer-graphic representations. Their collective and individual websites serve as hubs of the community of structural biologists, offering newsletters, reports from Task Forces, training courses, and "helpdesks," as well as links to external software.
Many specialized projects are based on the information contained in the PDB. Especially important are SCOP, CATH, and ECOD, which present classifications of protein domains.
Affiliation(s)
- Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, UK
- Gerard J Kleywegt
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Arthur M Lesk
- Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA, USA.
5
Bordin N, Sillitoe I, Lees JG, Orengo C. Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds. Front Mol Biosci 2021; 8:668184. [PMID: 34041266 PMCID: PMC8141709 DOI: 10.3389/fmolb.2021.668184] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/15/2021] [Accepted: 04/27/2021] [Indexed: 11/13/2022]
Abstract
This article is dedicated to the memory of Cyrus Chothia, who was a leading light in the world of protein structure evolution. His elegant analyses of protein families and their mechanisms of structural and functional evolution provided important evolutionary and biological insights and firmly established the value of structural perspectives. He was a mentor and supervisor to many other leading scientists who continued his quest to characterise structure and function space. He was also a generous and supportive colleague to those applying different approaches. In this article we review some of his accomplishments and the history of protein structure classifications, particularly SCOP and CATH. We also highlight some of the evolutionary insights these two classifications have brought. Finally, we discuss how the expansion and integration of protein sequence data into these structural families helps reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa. Since we cover 25 years of structural classification, it has not been feasible to review all structure based evolutionary studies and hence we focus mainly on those undertaken by the SCOP and CATH groups and their collaborators.
Affiliation(s)
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Jonathan G Lees
- Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, United Kingdom
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
6
Borba JVVB, Silva AC, Lima MNN, Mendonca SS, Furnham N, Costa FTM, Andrade CH. Chemogenomics and bioinformatics approaches for prioritizing kinases as drug targets for neglected tropical diseases. Advances in Protein Chemistry and Structural Biology 2020; 124:187-223. [PMID: 33632465 DOI: 10.1016/bs.apcsb.2020.10.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
Abstract
Neglected tropical diseases (NTDs) are a group of twenty-one diseases, classified as such by the World Health Organization, that prevail in regions with tropical and subtropical climates and affect more than one billion people. There is an urgent need to develop new and safer drugs for these diseases. Protein kinases are a potential class of targets for developing new drugs against NTDs, since they play crucial roles in many biological processes, such as signaling pathways and the regulation of cellular communication, division, metabolism and death. Bioinformatics is a field that aims to organize large amounts of biological data and to develop and use tools for understanding and analyzing them in order to extract biologically meaningful information. In combination with chemogenomics, which analyzes chemical-biological interactions to screen ligands against selected target families, these approaches can be used to establish a rational strategy for prioritizing new drug targets for NTDs. Here, we describe how bioinformatics and chemogenomics tools can help to identify protein kinases and their potential inhibitors for the development of new drugs for NTDs. We present a review of bioinformatics tools and techniques that can be used to define an organism's kinome for drug prioritization, drug and target repurposing, multi-kinase inhibition approaches and selectivity profiling. We also present some successful examples of the application of such approaches in recent case studies.
Affiliation(s)
- Joyce Villa Verde Bastos Borba
- LabMol-Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, GO, Brazil; Laboratory of Tropical Diseases-Prof. Luiz Jacintho da Silva, Department of Genetics, Evolution and Bioagents, University of Campinas, Campinas, SP, Brazil
- Arthur Carvalho Silva
- LabMol-Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, GO, Brazil
- Marilia Nunes Nascimento Lima
- LabMol-Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, GO, Brazil
- Sabrina Silva Mendonca
- LabMol-Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, GO, Brazil
- Nicholas Furnham
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Fabio Trindade Maranhão Costa
- Laboratory of Tropical Diseases-Prof. Luiz Jacintho da Silva, Department of Genetics, Evolution and Bioagents, University of Campinas, Campinas, SP, Brazil
- Carolina Horta Andrade
- LabMol-Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, GO, Brazil; Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, United Kingdom.
7
Yoo WG, Kang JM, Lê HG, Pak JH, Hong SJ, Sohn WM, Na BK. Bile Ductal Transcriptome Identifies Key Pathways and Hub Genes in Clonorchis sinensis-Infected Sprague-Dawley Rats. The Korean Journal of Parasitology 2020; 58:513-525. [PMID: 33202503 PMCID: PMC7672232 DOI: 10.3347/kjp.2020.58.5.513] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/08/2020] [Accepted: 09/27/2020] [Indexed: 12/18/2022]
Abstract
Clonorchis sinensis is a food-borne trematode that infects more than 15 million people. The liver fluke causes clonorchiasis and chronic cholangitis, and promotes cholangiocarcinoma. The molecular pathogenesis underlying the infection in the bile duct is poorly understood. In this study, transcriptome profiles of C. sinensis-infected bile ducts were analyzed using microarrays. A total of 1,563 and 1,457 differentially expressed genes (DEGs) were identified at 2 and 4 weeks after infection, respectively. The majority of the DEGs were temporarily dysregulated at 2 weeks, but 519 DEGs showed monotonically changing expression patterns that formed seven distinct expression profiles. Protein-protein interaction (PPI) analysis of the DEG products revealed 5 sub-networks and 10 key hub proteins, while weighted gene co-expression network analysis (WGCNA) revealed 16 co-expression modules and 13 key hub genes. The DEGs were significantly enriched in 16 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, which were related to organismal systems, cellular processes, environmental information processing, and human diseases. This study provides a global picture of gene expression profiles in bile ducts infected with C. sinensis, and a set of potential predictive biomarkers for the early diagnosis of clonorchiasis.
Affiliation(s)
- Won Gi Yoo
- Department of Medical Environmental Biology, Chung-Ang University College of Medicine, Seoul 06974, Korea
- Jung-Mi Kang
- Department of Parasitology and Tropical Medicine, and Institute of Health Sciences, Gyeongsang National University College of Medicine, Jinju 52727, Korea; Department of Convergence Medical Science, Gyeongsang National University, Jinju 52727, Korea
- Huong Giang Lê
- Department of Parasitology and Tropical Medicine, and Institute of Health Sciences, Gyeongsang National University College of Medicine, Jinju 52727, Korea; Department of Convergence Medical Science, Gyeongsang National University, Jinju 52727, Korea
- Jhang Ho Pak
- Department of Convergence Medicine, University of Ulsan College of Medicine and Asan Institute for Life Sciences, Asan Medical Center, Seoul 05505, Korea
- Sung-Jong Hong
- Department of Medical Environmental Biology, Chung-Ang University College of Medicine, Seoul 06974, Korea
- Woon-Mok Sohn
- Department of Parasitology and Tropical Medicine, and Institute of Health Sciences, Gyeongsang National University College of Medicine, Jinju 52727, Korea
- Byoung-Kuk Na
- Department of Parasitology and Tropical Medicine, and Institute of Health Sciences, Gyeongsang National University College of Medicine, Jinju 52727, Korea; Department of Convergence Medical Science, Gyeongsang National University, Jinju 52727, Korea
8
Holliday GL, Brown SD, Mischel D, Polacco BJ, Babbitt PC. A strategy for large-scale comparison of evolutionary- and reaction-based classifications of enzyme function. Database (Oxford) 2020; 2020:baaa034. [PMID: 32449511 PMCID: PMC7246345 DOI: 10.1093/database/baaa034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/05/2019] [Revised: 03/18/2020] [Accepted: 04/27/2020] [Indexed: 12/12/2022]
Abstract
Determining the molecular function of enzymes discovered by genome sequencing represents a primary foundation for understanding many aspects of biology. Historically, classification of enzyme reactions has used the enzyme nomenclature system developed to describe the overall reactions performed by biochemically characterized enzymes, irrespective of their associated sequences. In contrast, functional classification and assignment for the millions of protein sequences of unknown function now available is largely done in two computational steps: first, similarity-based assignment of newly obtained sequences to homologous groups, followed by transfer to them of the known functions of similar, biochemically characterized homologs. Due to the fundamental differences in their origins and practice, how these chemistry- and evolution-centric functional classification systems relate to each other has been difficult to explore on a large scale. To investigate this issue in a new way, we integrated two published ontologies that had previously described each of these classification systems independently. The resulting infrastructure was then used to compare the functional assignments obtained from each classification system for the well-studied and functionally diverse enolase superfamily. Mapping these function assignments to protein structure and reaction similarity networks shows a profound and complex disconnect between the homology- and chemistry-based classification systems. This conclusion mirrors previous observations suggesting that, except for closely related sequences, facile annotation transfer from small numbers of characterized enzymes to the huge number of uncharacterized homologs to which they are related is problematic.
Our extension of these comparisons to large enzyme superfamilies in a computationally intelligent manner provides a foundation for new directions in protein function prediction for the huge proportion of sequences of unknown function represented in major databases. Interactive sequence, reaction, substrate and product similarity networks computed for this work for the enolase and two other superfamilies are freely available for download from the Structure Function Linkage Database Archive (http://sfld.rbvi.ucsf.edu).
Affiliation(s)
- Gemma L Holliday
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Present Address: Medicines Discovery Catapult, Mereside, Alderley Park, Alderley Edge SK10 4TG, UK
- Shoshana D Brown
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- David Mischel
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Benjamin J Polacco
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Patricia C Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
9
Catazaro J, Caprez A, Swanson D, Powers R. Functional Evolution of Proteins. Proteins 2019; 87:492-501. [PMID: 30714210 DOI: 10.1002/prot.25670] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/16/2018] [Revised: 11/02/2018] [Accepted: 01/31/2019] [Indexed: 11/12/2022]
Abstract
The functional evolution of proteins advances through gene duplication followed by functional drift, whereas molecular evolution occurs through random mutational events. Over time, protein active-site structures or functional epitopes remain highly conserved, which enables relationships to be inferred between distant orthologs or paralogs. In this study, we present the first functional clustering and evolutionary analysis of the RCSB Protein Data Bank (RCSB PDB) based on similarities between active-site structures. All of the ligand-bound proteins within the RCSB PDB were scored using our Comparison of Protein Active-site Structures (CPASS) software and database (http://cpass.unl.edu/). Principal component analysis was then used to identify 4431 representative structures, from which a phylogenetic tree was constructed based on the CPASS comparative scores (http://itol.embl.de/shared/jcatazaro). The resulting phylogenetic tree identified a sequential, step-wise evolution of protein active sites and provides novel insights into the emergence of protein function or changes in substrate specificity based on subtle changes in geometry and amino acid composition.
Affiliation(s)
- Jonathan Catazaro
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska
- Adam Caprez
- Holland Computing Center, Office of Research, University of Nebraska-Lincoln, Lincoln, Nebraska
- David Swanson
- Holland Computing Center, Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska
- Robert Powers
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska; Department of Chemistry, Nebraska Center for Integrated Biomolecular Communication, Lincoln, Nebraska
10
Newton MS, Arcus VL, Gerth ML, Patrick WM. Enzyme evolution: innovation is easy, optimization is complicated. Curr Opin Struct Biol 2018; 48:110-116. [DOI: 10.1016/j.sbi.2017.11.007] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 09/21/2017] [Accepted: 11/21/2017] [Indexed: 10/18/2022]
11
Valasatava Y, Rosato A, Furnham N, Thornton JM, Andreini C. To what extent do structural changes in catalytic metal sites affect enzyme function? J Inorg Biochem 2018; 179:40-53. [PMID: 29161638 PMCID: PMC5760197 DOI: 10.1016/j.jinorgbio.2017.11.002] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Received: 05/26/2017] [Revised: 11/02/2017] [Accepted: 11/04/2017] [Indexed: 01/09/2023]
Abstract
About half of known enzymatic reactions involve metals. Enzymes belonging to the same superfamily often evolve to catalyze different reactions on the same structural scaffold. The work presented here investigates how functional differentiation, within superfamilies that contain metalloenzymes, relates to structural changes at the catalytic metal site. In general, when the catalytic metal site is unchanged across the enzymes of a superfamily, the functional differentiation within the superfamily tends to be low and the mechanism conserved. Conversely, all types of structural changes in the metal binding site are observed for superfamilies with high functional differentiation. Overall, the catalytic role of the metal ions appears to be one of the most conserved features of the enzyme mechanism within metalloenzyme superfamilies. In particular, when the catalytic role of the metal ion does not involve a redox reaction (i.e. there is no exchange of electrons with the substrate), this role is almost always maintained even when the site undergoes significant structural changes. In these enzymes, functional diversification is most often associated with modifications in the surrounding protein matrix, which has changed so much that the enzyme chemistry is significantly altered. On the other hand, in more than 50% of the examples where the metal has a redox role in catalysis, changes at the metal site modify its catalytic role. Further, we find that there are no examples in our dataset where metal sites with a redox role are lost during evolution.
Synopsis: In this paper we investigate how functional diversity within superfamilies of metalloenzymes relates to structural changes at the catalytic metal site. Evolution tends to strictly conserve the metal site. When changes occur, they do not modify the catalytic role of non-redox metals, whereas they affect the role of redox-active metals.
Affiliation(s)
- Yana Valasatava
- Magnetic Resonance Center, University of Florence, 50019 Sesto Fiorentino, Italy; Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy
- Antonio Rosato
- Magnetic Resonance Center, University of Florence, 50019 Sesto Fiorentino, Italy; Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy
- Nicholas Furnham
- Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom
- Janet M Thornton
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Claudia Andreini
- Magnetic Resonance Center, University of Florence, 50019 Sesto Fiorentino, Italy; Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy.
12
Abstract
The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function and the extent to which the functional repertoire can vary across the three kingdoms of life. This has led to the creation of a wide range of protein family classifications that aim to group proteins based upon their evolutionary relationships.
In this chapter we discuss the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered, and we show how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.
13
Nardo AE, Añón MC, Parisi G. Large-scale mapping of bioactive peptides in structural and sequence space. PLoS One 2018; 13:e0191063. [PMID: 29351315 PMCID: PMC5774755 DOI: 10.1371/journal.pone.0191063] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/26/2017] [Accepted: 12/27/2017] [Indexed: 12/11/2022]
Abstract
The health-enhancing potential of bioactive peptides (BPs) has driven interest in food proteins as well as in the development of predictive methods. Research in this area has been especially active with a view to using BPs as components of functional foods. Apparently, BPs do not have a given biological function in their parent proteins, and they do not evolve under independent evolutionary constraints. In this work we performed a large-scale mapping of BPs in sequence and structural space. Using well-curated BPs deposited in the BIOPEP database, we searched for exact matches in non-redundant sequence databases. Proteins containing BPs were used in fold-recognition methods to predict the corresponding folds, and BP occurrences were mapped. We found that the fold distribution of BP occurrences possibly reflects the relative abundance of sequences in the databases. However, we also found that proteins with five or more BPs in their sequences correspond to well-populated protein folds, called superfolds. We also found that in well-populated superfamilies, BPs tend to adopt similar locations in the protein fold, suggesting the existence of hotspots. We think that our results could contribute to the development of new bioinformatics pipelines to improve BP detection.
Affiliation(s)
- Agustina E. Nardo
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Argentina
- Centro de Investigación y Desarrollo en Criotecnología de Alimentos, Facultad de Ciencias Exactas, Universidad Nacional de La Plata - Comisión de Investigaciones Científicas - CONICET, La Plata, Argentina
- M. Cristina Añón
- Centro de Investigación y Desarrollo en Criotecnología de Alimentos, Facultad de Ciencias Exactas, Universidad Nacional de La Plata - Comisión de Investigaciones Científicas - CONICET, La Plata, Argentina
- Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Argentina
14
Schomburg I, Jeske L, Ulbrich M, Placzek S, Chang A, Schomburg D. The BRENDA enzyme information system–From a database to an expert system. J Biotechnol 2017; 261:194-206. [DOI: 10.1016/j.jbiotec.2017.04.020] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 02/16/2017] [Revised: 04/11/2017] [Accepted: 04/18/2017] [Indexed: 02/06/2023]
15
Mudgal R, Srinivasan N, Chandra N. Resolving protein structure-function-binding site relationships from a binding site similarity network perspective. Proteins 2017; 85:1319-1335. [PMID: 28342236 DOI: 10.1002/prot.25293] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/24/2016] [Revised: 03/18/2017] [Accepted: 03/20/2017] [Indexed: 11/05/2022]
Abstract
Functional annotation is seldom straightforward, with complexities arising from functional divergence in protein families or functional convergence between non-homologous protein families, leading to mis-annotations. An enzyme may contain multiple domains, and not all domains may be involved in a given function, adding to the complexity of function annotation. To address this, we use binding site information from bound cognate ligands and catalytic residues, since it can help in resolving fold-function relationships at a finer level and with higher confidence. A comprehensive database of 2,020 fold-function-binding site relationships has been systematically generated. A network-based approach is employed to capture the complexity in these relationships, from which different types of associations are deciphered that identify versatile protein folds performing diverse functions, the same function associated with multiple folds, and one-to-one relationships. Binding site similarity networks integrated with fold, function, and ligand similarity information are generated to understand the depth of these relationships. Apart from the observed continuity in the functional site space, network properties of these revealed versatile families with topologically different or dissimilar binding sites and structural families that perform very similar functions. As a case study, subtle changes in the active site of a set of evolutionarily related superfamilies are studied using these networks. Tracing such similarities in evolutionarily related proteins provides clues to the transition and evolution of protein functions. Insights from this study will be helpful in accurate and reliable functional annotations of uncharacterized proteins, poly-pharmacology, and designing enzymes with new functional capabilities.
Affiliation(s)
- Richa Mudgal
- IISc Mathematics Initiative, Indian Institute of Science, Bangalore, Karnataka, 560 012, India
- Nagasuma Chandra
- Department of Biochemistry, Indian Institute of Science, Bangalore, Karnataka, 560 012, India
16
CATH-Gene3D: Generation of the Resource and Its Use in Obtaining Structural and Functional Annotations for Protein Sequences. Methods Mol Biol 2017; 1558:79-110. [PMID: 28150234 DOI: 10.1007/978-1-4939-6783-4_4] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 12/05/2022]
Abstract
This chapter describes the generation of the data in the CATH-Gene3D online resource and how it can be used to study protein domains and their evolutionary relationships. Methods will be presented for: comparing protein structures, recognizing homologs, predicting domain structures within protein sequences, and subclassifying superfamilies into functionally pure families, together with a guide on using the webpages.
17
Zallot R, Harrison KJ, Kolaczkowski B, de Crécy-Lagard V. Functional Annotations of Paralogs: A Blessing and a Curse. Life (Basel) 2016; 6:life6030039. [PMID: 27618105 PMCID: PMC5041015 DOI: 10.3390/life6030039] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 07/01/2016] [Revised: 08/29/2016] [Accepted: 09/02/2016] [Indexed: 12/15/2022]
Abstract
Gene duplication followed by mutation is a classic mechanism of neofunctionalization, producing gene families with functional diversity. In some cases, a single point mutation is sufficient to change the substrate specificity and/or the chemistry performed by an enzyme, making it difficult to accurately separate enzymes with identical functions from homologs with different functions. Because sequence similarity is often used as a basis for assigning functional annotations to genes, non-isofunctional gene families pose a great challenge for genome annotation pipelines. Here we describe how integrating evolutionary and functional information such as genome context, phylogeny, metabolic reconstruction and signature motifs may be required to correctly annotate multifunctional families. These integrative analyses can also lead to the discovery of novel gene functions, as hints from specific subgroups can guide the functional characterization of other members of the family. We demonstrate how careful manual curation processes using comparative genomics can disambiguate subgroups within large multifunctional families and discover their functions. We present the COG0720 protein family as a case study. We also discuss strategies to automate this process to improve the accuracy of genome functional annotation pipelines.
Affiliation(s)
- Rémi Zallot
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
- Katherine J Harrison
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
- Bryan Kolaczkowski
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
- Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
18
Boari de Lima E, Meira W, de Melo-Minardi RC. Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering. PLoS Comput Biol 2016; 12:e1005001. [PMID: 27348631 PMCID: PMC4922564 DOI: 10.1371/journal.pcbi.1005001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/30/2015] [Accepted: 05/22/2016] [Indexed: 01/14/2023]
Abstract
As more and more genomes are sequenced, the vast majority of proteins may only be annotated computationally, given that experimental investigation is extremely costly. This highlights the need for computational methods to determine protein functions quickly and reliably. We believe that dividing a protein family into subtypes which share specific functions uncommon to the whole family reduces the complexity of the function annotation problem. Hence, this work's purpose is to detect isofunctional subfamilies inside a family of unknown function, while identifying differentiating residues. Similarity between protein pairs according to various properties is interpreted as evidence of functional similarity. Data are integrated using genetic programming and provided to a spectral clustering algorithm, which creates clusters of similar proteins. The proposed framework was applied to well-known protein families and to a family of unknown function, then compared to ASMC. Results showed that our fully automated technique obtained better clusters than ASMC for two families and equivalent results for the other two, including one whose clusters were manually defined. Clusters produced by our framework showed strong correspondence with the known subfamilies and were more contrasting than those produced by ASMC. Additionally, for the families whose specificity-determining positions are known, such residues were among those our technique considered most important to differentiate a given group. When run with the crotonase and enolase SFLD superfamilies, the results showed strong agreement with this gold standard. The best results consistently involved multiple data types, thus confirming our hypothesis that similarities according to different knowledge domains may be used as evidence of functional similarity.
Our main contributions are the proposed strategy for selecting and integrating data types, along with the ability to work with noisy and incomplete data; the use of domain knowledge for detecting subfamilies in a family with different specificities, thus reducing the complexity of the experimental function characterization problem; and the identification of residues responsible for specificity.
Affiliation(s)
- Elisa Boari de Lima
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Wagner Meira
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
19
Sillitoe I, Furnham N. FunTree: advances in a resource for exploring and contextualising protein function evolution. Nucleic Acids Res 2015; 44:D317-23. [PMID: 26590404 PMCID: PMC4702901 DOI: 10.1093/nar/gkv1274] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/11/2015] [Accepted: 11/03/2015] [Indexed: 11/13/2022]
Abstract
FunTree is a resource that brings together protein sequence, structure and functional information, including overall chemical reaction and mechanistic data, for structurally defined domain superfamilies. Developed in tandem with the CATH database, the original FunTree contained just 276 superfamilies focused on enzymes. Here, we present an update of FunTree that has expanded to include 2340 superfamilies, covering both enzymes and proteins with non-enzymatic functions annotated by Gene Ontology (GO) terms. This allows the investigation of how novel functions have evolved within a structurally defined superfamily and provides a means to analyse trends across many superfamilies. This is done not only within the context of a protein's sequence and structure but also in the context of the relationships between their functions. New measures of functional similarity have been integrated: for enzymes, comparisons of overall reactions based on overall bond changes, reaction centres (the local environment atoms involved in the reaction) and the sub-structure similarities of the metabolites involved in the reaction; for non-enzymes, semantic similarities based on the GO. To identify and highlight changes in function through evolution, ancestral character estimations are made and presented. All this is accessible through a new, redesigned web interface that can be found at http://www.funtree.info.
Affiliation(s)
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
- Nicholas Furnham
- Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
20
Large-Scale Analysis Exploring Evolution of Catalytic Machineries and Mechanisms in Enzyme Superfamilies. J Mol Biol 2015; 428:253-267. [PMID: 26585402 PMCID: PMC4751976 DOI: 10.1016/j.jmb.2015.11.010] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 06/30/2015] [Revised: 10/05/2015] [Accepted: 11/10/2015] [Indexed: 01/28/2023]
Abstract
Enzymes, as biological catalysts, form the basis of all forms of life. How these proteins have evolved their functions remains a fundamental question in biology. Over 100 years of detailed biochemical studies, combined with the large volumes of sequence and protein structural data now available, mean that we are able to perform large-scale analyses to address this question. Using a range of computational tools and resources, we have compiled information on all experimentally annotated changes in enzyme function within 379 structurally defined protein domain superfamilies, linking the changes observed in functions during evolution to changes in reaction chemistry. Many superfamilies show changes in function at some level, although a single function often dominates a superfamily. We use quantitative measures of changes in reaction chemistry to reveal the various types of chemical changes occurring during evolution and exemplify these with detailed examples. Additionally, we use structural information on the enzymes' active sites to examine how different superfamilies have changed their catalytic machinery during evolution. Some superfamilies have changed the reactions they perform without changing catalytic machinery. In others, large changes of enzyme function, in terms of both overall chemistry and substrate specificity, have been brought about by significant changes in catalytic machinery. Interestingly, in some superfamilies, relatives perform similar functions but with different catalytic machineries. This analysis highlights characteristics of functional evolution across a wide range of superfamilies, providing insights that will be useful in predicting the function of uncharacterised sequences and in designing new synthetic enzymes.
Highlights
- Examining how enzyme function evolves using sequence, structure, and reaction mechanism data.
- Quantifying changes in reaction mechanisms reveals how function has diverged in many superfamilies.
- Homologous domains frequently use different catalytic residues, which sometimes perform the same enzyme chemistry.
- This large-scale analysis has significance in protein function prediction and enzyme design.
21
The history of the CATH structural classification of protein domains. Biochimie 2015; 119:209-17. [PMID: 26253692 PMCID: PMC4678953 DOI: 10.1016/j.biochi.2015.08.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 07/01/2015] [Accepted: 08/01/2015] [Indexed: 11/21/2022]
Abstract
This article presents a historical review of the protein structure classification database CATH. Together with the SCOP database, CATH remains comprehensive and reasonably up-to-date with the now more than 100,000 protein structures in the PDB. We review the expansion of the CATH and SCOP resources to capture predicted domain structures in the genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains. The establishment of comprehensive function annotation resources has also meant that domain families can be functionally annotated allowing insights into functional divergence and evolution within protein families.
Highlights
- We present a historical review of the protein structure database CATH.
- We review the expansion of the CATH and SCOP resources with sequence data and functional annotations.
- How functional annotation resources allow insights into functional divergence and evolution within protein families.
22
Sillitoe I, Lewis T, Orengo C. Using CATH-Gene3D to Analyze the Sequence, Structure, and Function of Proteins. Curr Protoc Bioinformatics 2015; 50:1.28.1-1.28.21. [PMID: 26087950 DOI: 10.1002/0471250953.bi0128s50] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Abstract
The CATH database is a classification of protein structures found in the Protein Data Bank (PDB). Protein structures are chopped into individual units of structural domains, and these domains are grouped together into superfamilies if there is sufficient evidence that they have diverged from a common ancestor during the process of evolution. A sister resource, Gene3D, extends this information by scanning sequence profiles of these CATH domain superfamilies against many millions of known proteins to identify related sequences. Thus the combined CATH-Gene3D resource provides confident predictions of the likely structural fold, domain organisation, and evolutionary relatives of these proteins. In addition, this resource incorporates annotations from a large number of external databases such as known enzyme active sites, GO molecular functions, physical interactions, and mutations. This unit details how to access and understand the information contained within the CATH-Gene3D Web pages, the downloadable data files, and the remotely accessible Web services.
Affiliation(s)
- Ian Sillitoe
- University College London, London, United Kingdom
- Tony Lewis
- University College London, London, United Kingdom
23
Martínez Cuesta S, Rahman SA, Furnham N, Thornton JM. The Classification and Evolution of Enzyme Function. Biophys J 2015; 109:1082-6. [PMID: 25986631 DOI: 10.1016/j.bpj.2015.04.020] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 03/09/2015] [Revised: 04/16/2015] [Accepted: 04/17/2015] [Indexed: 11/30/2022]
Abstract
Enzymes are the proteins responsible for the catalysis of life. Enzymes sharing a common ancestor as defined by sequence and structure similarity are grouped into families and superfamilies. The molecular function of enzymes is defined as their ability to catalyze biochemical reactions; it is manually classified by the Enzyme Commission and robust approaches to quantitatively compare catalytic reactions are just beginning to appear. Here, we present an overview of studies at the interface of the evolution and function of enzymes.
Affiliation(s)
- Sergio Martínez Cuesta
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Syed Asad Rahman
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Nicholas Furnham
- Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
24
Sillitoe I, Lewis TE, Cuff A, Das S, Ashford P, Dawson NL, Furnham N, Laskowski RA, Lee D, Lees JG, Lehtinen S, Studer RA, Thornton J, Orengo CA. CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res 2015; 43:D376-81. [PMID: 25348408 PMCID: PMC4384018 DOI: 10.1093/nar/gku947] [Citation(s) in RCA: 309] [Impact Index Per Article: 30.9] [Received: 01/09/2014] [Accepted: 09/29/2014] [Indexed: 11/19/2022]
Abstract
The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235,000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our 'current' putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.
Affiliation(s)
- Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Tony E Lewis
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Alison Cuff
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Sayoni Das
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Paul Ashford
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Natalie L Dawson
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Nicholas Furnham
- London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Roman A Laskowski
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- David Lee
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Jonathan G Lees
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Sonja Lehtinen
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Romain A Studer
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Janet Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Christine A Orengo
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
25
Alderson RG, Barker D, Mitchell JBO. One origin for metallo-β-lactamase activity, or two? An investigation assessing a diverse set of reconstructed ancestral sequences based on a sample of phylogenetic trees. J Mol Evol 2014; 79:117-29. [PMID: 25185655 PMCID: PMC4185109 DOI: 10.1007/s00239-014-9639-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 03/13/2014] [Accepted: 08/11/2014] [Indexed: 01/04/2023]
Abstract
Bacteria use metallo-β-lactamase enzymes to hydrolyse lactam rings found in many antibiotics, rendering them ineffective. Metallo-β-lactamase activity is thought to be polyphyletic, having arisen on more than one occasion within a single functionally diverse homologous superfamily. Since the discovery of multiple origins of enzymatic activity conferring antibiotic resistance has broad implications for the continued clinical use of antibiotics, we test the hypothesis of polyphyly further: if lactamase function has arisen twice independently, the most recent common ancestor (MRCA) is not expected to possess lactam-hydrolysing activity. Two major problems present themselves. First, even with a perfectly known phylogeny, ancestral sequence reconstruction is error-prone. Second, the phylogeny is not known; in fact, reconstructing a single, unambiguous phylogeny for the superfamily has proven impossible. To obtain a more statistical view of the strength of evidence for or against MRCA lactamase function, we reconstructed a sample of 98 MRCAs of the metallo-β-lactamases, each based on a different tree in a bootstrap sample of reconstructed phylogenies. InterPro sequence signatures and homology modelling were then used to assess our sample of MRCAs for lactamase functionality. Only 5% of these models conform to our criteria for metallo-β-lactamase functionality, suggesting that the ancestor was unlikely to have been a metallo-β-lactamase. On the other hand, given that ancestral proteins may have had metallo-β-lactamase functionality with variation in sequence and structural properties compared with extant enzymes, our criteria are conservative, estimating a lower bound of evidence for metallo-β-lactamase functionality but not an upper bound.
Affiliation(s)
- Rosanna G. Alderson
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, KY16 9ST Scotland, UK
- Daniel Barker
- Sir Harold Mitchell Building, School of Biology, University of St Andrews, St Andrews, KY16 9TH Scotland, UK
- John B. O. Mitchell
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, KY16 9ST Scotland, UK
26
Martinez Cuesta S, Furnham N, Rahman SA, Sillitoe I, Thornton JM. The evolution of enzyme function in the isomerases. Curr Opin Struct Biol 2014; 26:121-30. [PMID: 25000289 PMCID: PMC4139412 DOI: 10.1016/j.sbi.2014.06.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 02/18/2014] [Revised: 06/02/2014] [Accepted: 06/10/2014] [Indexed: 01/14/2023]
Abstract
The advent of computational approaches to measure functional similarity between enzymes adds a new dimension to existing evolutionary studies based on sequence and structure. This paper reviews research efforts aiming to understand the evolution of enzyme function in superfamilies, presenting a novel strategy to provide an overview of the evolution of enzymes belonging to an individual EC class, using the isomerases as an exemplar.
Affiliation(s)
- Sergio Martinez Cuesta
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.
- Nicholas Furnham
- Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, United Kingdom
- Syed Asad Rahman
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Ian Sillitoe
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.
27
Nath N, Mitchell JBO, Caetano-Anollés G. The natural history of biocatalytic mechanisms. PLoS Comput Biol 2014; 10:e1003642. [PMID: 24874434 PMCID: PMC4038463 DOI: 10.1371/journal.pcbi.1003642] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 10/18/2013] [Accepted: 04/09/2014] [Indexed: 11/29/2022]
Abstract
Phylogenomic analysis of the occurrence and abundance of protein domains in proteomes has recently shown that the α/β architecture is probably the oldest fold design. This holds important implications for the origins of biochemistry. Here we explore structure-function relationships addressing the use of chemical mechanisms by ancestral enzymes. We test the hypothesis that the oldest folds used the most mechanisms. We start by tracing biocatalytic mechanisms operating in metabolic enzymes along a phylogenetic timeline of the first appearance of homologous superfamilies of protein domain structures from CATH. A total of 335 enzyme reactions were retrieved from MACiE and were mapped over fold age. We define a mechanistic step type as one of the 51 mechanistic annotations given in MACiE, and each step of each of the 335 mechanisms was described using one or more of these annotations. We find that the first two folds, the P-loop containing nucleotide triphosphate hydrolase and the NAD(P)-binding Rossmann-like homologous superfamilies, were α/β architectures responsible for introducing 35% (18/51) of the known mechanistic step types. We find that these two oldest structures in the phylogenomic analysis of protein domains introduced many mechanistic step types that were later combinatorially spread in catalytic history. The most common mechanistic step types included fundamental building blocks of enzyme chemistry: "Proton transfer," "Bimolecular nucleophilic addition," "Bimolecular nucleophilic substitution," and "Unimolecular elimination by the conjugate base." They were associated with the most ancestral fold structure typical of P-loop containing nucleotide triphosphate hydrolases. Over half of the mechanistic step types were introduced in the evolutionary timeline before the appearance of structures specific to diversified organisms, during a period of architectural diversification.
The other half unfolded gradually after organismal diversification and during a period that spanned ∼2 billion years of evolutionary history.
Affiliation(s)
- Neetika Nath
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, University of St. Andrews, North Haugh, St. Andrews, Scotland, United Kingdom
| | - John B. O. Mitchell
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, University of St. Andrews, North Haugh, St. Andrews, Scotland, United Kingdom
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
28
Alderson RG, De Ferrari L, Mavridis L, McDonagh JL, Mitchell JBO, Nath N. Enzyme informatics. Curr Top Med Chem 2012; 12:1911-23. [PMID: 23116471 DOI: 10.2174/156802612804547353] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 08/15/2012] [Revised: 09/12/2012] [Accepted: 09/15/2012] [Indexed: 12/18/2022]
Abstract
Over the last 50 years, sequencing, structural biology and bioinformatics have completely revolutionised biomolecular science, with millions of sequences and tens of thousands of three-dimensional structures becoming available. The bioinformatics of enzymes is well served by online databases, most of them free. BRENDA describes the chemistry, substrate specificity, kinetics, preparation and biological sources of enzymes, while KEGG is valuable for understanding enzymes and metabolic pathways. EzCatDB, SFLD and MACiE are key repositories for data on the chemical mechanisms by which enzymes operate. At the current rate of genome sequencing and manual annotation, human curation will never finish the functional annotation of the ever-expanding list of known enzymes. Hence there is an increasing need for automated annotation, though it is not yet widespread for enzyme data. In contrast, functional ontologies such as the Gene Ontology already profit from automation. Despite our growing understanding of enzyme structure and dynamics, we are only beginning to be able to design novel enzymes. One can now begin to trace the functional evolution of enzymes using phylogenetics. The ability of enzymes to perform secondary functions, albeit relatively inefficiently, gives clues as to how enzyme function evolves. Substrate promiscuity in enzymes is one example of imperfect specificity in protein-ligand interactions. Similarly, most drugs bind to more than one protein target. This may sometimes result in helpful polypharmacology as a drug modulates multiple targets, but also often leads to adverse side-effects. Many chemoinformatics approaches can be used to model the interactions between druglike molecules and proteins in silico. We can even use quantum chemical techniques like DFT and QM/MM to compute the structural and energetic course of enzyme-catalysed chemical reaction mechanisms, including a full description of bond making and breaking.
Affiliation(s)
- Rosanna G Alderson
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland, UK
29
Silveira SDA, de Melo-Minardi RC, da Silveira CH, Santoro MM, Meira Jr W. ENZYMAP: exploiting protein annotation for modeling and predicting EC number changes in UniProt/Swiss-Prot. PLoS One 2014; 9:e89162. [PMID: 24586563 PMCID: PMC3929618 DOI: 10.1371/journal.pone.0089162] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/12/2013] [Accepted: 01/19/2014] [Indexed: 11/18/2022]
Abstract
The volume and diversity of biological data are increasing at very high rates. Vast amounts of protein sequences and structures, protein and genetic interactions and phenotype studies have been produced. The majority of data generated by high-throughput devices is annotated automatically, because manual annotation at this scale is not feasible. Thus, efficient and precise automatic annotation methods are required to ensure the quality and reliability of both the biological data and associated annotations. We propose ENZYMatic Annotation Predictor (ENZYMAP), a technique to characterize and predict EC number changes based on annotations from UniProt/Swiss-Prot using a supervised learning approach. We evaluated ENZYMAP experimentally, using test data sets from both UniProt/Swiss-Prot and UniProt/TrEMBL, and showed that predicting EC changes using selected types of annotation is possible. Finally, we compared ENZYMAP and DETECT with respect to their predictions and checked both against the UniProt/Swiss-Prot annotations. ENZYMAP was shown to be more accurate than DETECT, coming closer to the actual changes in UniProt/Swiss-Prot. Our proposal is intended to be an automatic complementary method (that can be used together with other techniques like the ones based on protein sequence and structure) that helps to improve the quality and reliability of enzyme annotations over time, suggesting possible corrections, anticipating annotation changes and propagating implicit knowledge across the whole dataset.
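A supervised predictor of EC number changes needs labelled examples, i.e. entries whose EC assignment did or did not change between two database releases. The sketch below shows one plausible way to derive such labels; it is an assumption for illustration, not ENZYMAP's published procedure. It assumes one four-field EC string per entry (with `-` for unassigned levels), and the accessions `ACC1`... and the helpers `ec_change_level` and `label_changes` are hypothetical.

```python
from typing import Dict, Optional


def ec_change_level(old: str, new: str) -> Optional[int]:
    """Return the first EC level (1-4) at which two EC numbers differ, or None.

    Assumes four dot-separated fields, with '-' for unassigned levels
    (e.g. '3.1.-.-').
    """
    for level, (a, b) in enumerate(zip(old.split("."), new.split(".")), start=1):
        if a != b:
            return level
    return None


def label_changes(old_release: Dict[str, str], new_release: Dict[str, str]) -> Dict[str, Optional[int]]:
    """Label each accession present in both releases with its EC change level."""
    shared = sorted(old_release.keys() & new_release.keys())
    return {acc: ec_change_level(old_release[acc], new_release[acc]) for acc in shared}


# Hypothetical accessions and EC assignments, one EC number per entry.
old = {"ACC1": "3.1.-.-", "ACC2": "2.7.11.1", "ACC3": "1.1.1.1"}
new = {"ACC1": "3.1.3.1", "ACC2": "2.7.11.1", "ACC3": "4.2.1.1", "ACC4": "5.3.1.1"}
print(label_changes(old, new))  # {'ACC1': 3, 'ACC2': None, 'ACC3': 1}
```

Entries present in only one release (here `ACC4`) are skipped; real UniProt entries can carry several EC numbers, which this simplification ignores.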
Affiliation(s)
- Sabrina de Azevedo Silveira
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Marcelo Matos Santoro
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Wagner Meira Jr
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
30
Akiva E, Brown S, Almonacid DE, Barber AE, Custer AF, Hicks MA, Huang CC, Lauck F, Mashiyama ST, Meng EC, Mischel D, Morris JH, Ojha S, Schnoes AM, Stryke D, Yunes JM, Ferrin TE, Holliday GL, Babbitt PC. The Structure-Function Linkage Database. Nucleic Acids Res 2013; 42:D521-30. [PMID: 24271399 PMCID: PMC3965090 DOI: 10.1093/nar/gkt1130] [Citation(s) in RCA: 180] [Impact Index Per Article: 15.0] [Indexed: 01/08/2023]
Abstract
The Structure–Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure–function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies ‘look alike’, making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.
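A sequence similarity network treats sequences as nodes and draws an edge when a pairwise similarity score passes a chosen threshold. Raising the threshold splits clusters apart, which is how functional trends are explored across a superfamily. The sketch below finds the clusters (connected components) with a union-find structure. It is a generic illustration, not the SFLD's own pipeline. The node names, the scores (imagined as -log10 of a BLAST E-value) and the function `ssn_clusters` are hypothetical.

```python
from typing import Dict, List, Tuple


def ssn_clusters(nodes: List[str], edges: List[Tuple[str, str, float]], threshold: float) -> List[List[str]]:
    """Connected components of a sequence similarity network.

    An edge is kept only if its score is >= threshold; components are found
    with a union-find (disjoint-set) structure.
    """
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in edges:
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    groups: Dict[str, List[str]] = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sequences and scores.
nodes = ["s1", "s2", "s3", "s4", "s5"]
edges = [("s1", "s2", 60.0), ("s2", "s3", 45.0), ("s3", "s4", 20.0), ("s4", "s5", 80.0)]
print(ssn_clusters(nodes, edges, threshold=40))  # [['s1', 's2', 's3'], ['s4', 's5']]
print(ssn_clusters(nodes, edges, threshold=50))  # [['s1', 's2'], ['s3'], ['s4', 's5']]
```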
Affiliation(s)
- Eyal Akiva
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA, Universidad Andres Bello, Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biologicas, Santiago 8370146, Chile, Nodality, Inc., South San Francisco, CA 94080, USA, Department of Electrical and Computer Engineering, College of Engineering, Boston University, Boston, MA 02215, USA, Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA, Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, San Francisco, CA 94158, USA, Center for Bioinformatics (ZBH), University of Hamburg, Hamburg 20146, Germany, Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717, USA, School of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA, UC Berkeley - UCSF Graduate Program in Bioengineering, University of California, San Francisco, CA 94158 and Berkeley, CA 94720, USA and California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, CA 94158, USA
31
Protein structure based prediction of catalytic residues. BMC Bioinformatics 2013; 14:63. [PMID: 23433045 PMCID: PMC3598644 DOI: 10.1186/1471-2105-14-63] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 06/21/2012] [Accepted: 02/17/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. RESULTS We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and PageRank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. CONCLUSIONS We found that several of the graph-based measures utilize the same underlying feature of protein structures, which can be captured more simply and more effectively with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile sequence conservation remains by far the most influential feature in identifying functional residues. 
We also found that due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
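Two quantities in this abstract can be made concrete: the distance-to-GCM feature and the enrichment arithmetic. Precision is 70/411 ≈ 17%. The base rate of catalytic residues is 111/9262 ≈ 1.2%. Their ratio is the ≈14-fold enrichment. The sketch below is a minimal illustration assuming one representative atom per residue and an unweighted centroid; a mass-weighted centre would follow the "center of mass" wording more closely. The names `distance_to_gcm` and `enrichment_stats` are hypothetical, not from the paper.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def distance_to_gcm(coords: Dict[str, Coord]) -> Dict[str, float]:
    """Distance of each residue's representative atom to the unweighted centroid."""
    n = len(coords)
    centre = tuple(sum(c[k] for c in coords.values()) / n for k in range(3))
    return {res: math.dist(c, centre) for res, c in coords.items()}


def enrichment_stats(n_total: int, n_true: int, n_predicted: int, n_hits: int) -> Tuple[float, float, float]:
    """Sensitivity, precision and fold enrichment over the base rate."""
    sensitivity = n_hits / n_true        # fraction of annotated catalytic residues recovered
    precision = n_hits / n_predicted     # fraction of predictions that are catalytic
    base_rate = n_true / n_total         # precision expected from random selection
    return sensitivity, precision, precision / base_rate


# Figures from the abstract: 9262 residues, 111 catalytic, 411 returned, 70 hits.
s, p, e = enrichment_stats(9262, 111, 411, 70)
print(round(s, 2), round(p, 2), round(e, 1))  # 0.63 0.17 14.2
```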
32
Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 10.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
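Co-evolution between residues that are close in three-dimensional space is often detected as correlated variation between two columns of a multiple sequence alignment. One common, simple score for that correlation is mutual information. The sketch below computes raw mutual information and is not one of the specific tools discussed in the review. In practice, raw MI is confounded by phylogeny and small sample sizes, and corrections such as the average product correction are usually applied. The function `column_mi` and the toy alignments are hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_mi(alignment: List[str], i: int, j: int) -> float:
    """Mutual information (nats) between columns i and j of an MSA.

    Uses raw frequencies: no sequence weighting, pseudocounts or
    phylogenetic correction.
    """
    n = len(alignment)
    fi = Counter(seq[i] for seq in alignment)
    fj = Counter(seq[j] for seq in alignment)
    fij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


# Toy alignments: in the first, column 0 predicts column 1 exactly.
covarying = ["AK", "AK", "DE", "DE"]
independent = ["AK", "AE", "DK", "DE"]
print(round(column_mi(covarying, 0, 1), 3))    # 0.693 (= ln 2)
print(round(column_mi(independent, 0, 1), 3))  # 0.0
```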
33
Sillitoe I, Cuff AL, Dessailly BH, Dawson NL, Furnham N, Lee D, Lees JG, Lewis TE, Studer RA, Rentzsch R, Yeats C, Thornton JM, Orengo CA. New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. Nucleic Acids Res 2012. [PMID: 23203873 PMCID: PMC3531114 DOI: 10.1093/nar/gks1211] [Citation(s) in RCA: 175] [Impact Index Per Article: 13.5] [Indexed: 11/14/2022]
Abstract
CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly lower than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.
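CATH superfamily identifiers encode the four nested levels as a dotted code C.A.T.H. Counts such as "superfamilies per fold group" fall out of grouping codes by their first three fields. The sketch below is a minimal illustration with hypothetical helper names (`CathLevels`, `parse_cath`, `superfamilies_per_fold`). The example codes are input only: 3.40.50.300 and 3.40.50.720 are, to our knowledge, CATH's P-loop NTP hydrolase and NAD(P)-binding Rossmann-like superfamilies, both within fold group 3.40.50. The code 1.10.10.10 is just a placeholder from class 1.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class CathLevels(NamedTuple):
    class_: str        # C, e.g. "3"
    architecture: str  # C.A, e.g. "3.40"
    topology: str      # C.A.T (fold group), e.g. "3.40.50"
    homology: str      # C.A.T.H (homologous superfamily)


def parse_cath(code: str) -> CathLevels:
    """Split a four-level CATH superfamily code into its nested levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected C.A.T.H, got {code!r}")
    return CathLevels(parts[0], ".".join(parts[:2]), ".".join(parts[:3]), code)


def superfamilies_per_fold(codes: List[str]) -> Dict[str, int]:
    """Count distinct homologous superfamilies within each fold group."""
    folds: Dict[str, Set[str]] = defaultdict(set)
    for code in codes:
        levels = parse_cath(code)
        folds[levels.topology].add(levels.homology)
    return {fold: len(sfs) for fold, sfs in sorted(folds.items())}


codes = ["3.40.50.300", "3.40.50.720", "3.40.50.300", "1.10.10.10"]
print(parse_cath("3.40.50.300").topology)  # 3.40.50
print(superfamilies_per_fold(codes))        # {'1.10.10': 1, '3.40.50': 2}
```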
Affiliation(s)
- Ian Sillitoe
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
34
Furnham N, Laskowski RA, Thornton JM. Abstracting knowledge from the protein data bank. Biopolymers 2012; 99:183-8. [DOI: 10.1002/bip.22107] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/30/2012] [Accepted: 05/25/2012] [Indexed: 12/27/2022]
35
Current challenges in genome annotation through structural biology and bioinformatics. Curr Opin Struct Biol 2012; 22:594-601. [PMID: 22884875 DOI: 10.1016/j.sbi.2012.07.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 04/11/2012] [Revised: 06/29/2012] [Accepted: 07/09/2012] [Indexed: 01/25/2023]
Abstract
With the huge volume of genomic sequences being generated by high-throughput sequencing projects, the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component of this process is the use of experimentally determined structures, which provide a means to detect homology that is not discernible from sequence alone and allow the consequences of genomic variation to be interpreted at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry.
36
Furnham N, Sillitoe I, Holliday GL, Cuff AL, Laskowski RA, Orengo CA, Thornton JM. Exploring the evolution of novel enzyme functions within structurally defined protein superfamilies. PLoS Comput Biol 2012; 8:e1002403. [PMID: 22396634 PMCID: PMC3291543 DOI: 10.1371/journal.pcbi.1002403] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.3] [Received: 09/30/2011] [Accepted: 01/09/2012] [Indexed: 11/18/2022]
Abstract
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis, we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates, on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly, a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life. Enzymes, as biological catalysts, are crucial to life. Understanding how enzymes have evolved to perform the wide variety of reactions found across all kingdoms of life is fundamental to a broad range of biological studies, especially those leading to new therapeutics. 
Unravelling the evolution of novel enzyme function requires combining information on protein structure, sequence, phylogeny and chemistry (in terms of interacting small molecules and reaction mechanisms). We have developed a protocol for integrating this wide range of data, which we have applied to a relatively large number of families comprising some very diverse relatives. This has permitted us to present an initial overview of the evolution of novel enzyme functions, in which we observe that some changes in function between relatives are more common than others, with most of the functionality observed in nature confined to relatively few families. Moreover, we are able to identify the evolutionary route taken within a superfamily to change the enzyme function from one reaction to another. This information may help in predicting the function of an enzyme that has yet to be experimentally characterised as well as in designing new enzymes for industrial and medical purposes.
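A "matrix of observed exchanges" at the primary EC level can be tallied once pairs of related functions (for example, ancestral and descendant functions read off a superfamily tree) are available. The sketch below assumes those pairs are already given and simply counts directed changes between top-level EC classes. It does not reproduce the authors' tree-based protocol, and the pairs and helper names (`ec_class`, `exchange_matrix`) are hypothetical. The default of six classes matches the EC classification at the time of this paper; a seventh class (translocases) was added in 2018, so pass `n_classes=7` for current data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ec_class(ec: str) -> int:
    """Primary EC class: the first field of an EC number, e.g. 3 for '3.1.3.1'."""
    return int(ec.split(".")[0])


def exchange_matrix(pairs: Iterable[Tuple[str, str]], n_classes: int = 6) -> List[List[int]]:
    """Count directed changes of primary EC class between related functions.

    Row i, column j (0-based) holds the number of pairs going from class
    i + 1 to class j + 1; the diagonal counts changes that stay within one class.
    """
    counts = Counter((ec_class(a), ec_class(b)) for a, b in pairs)
    return [[counts[(i, j)] for j in range(1, n_classes + 1)] for i in range(1, n_classes + 1)]


# Hypothetical (ancestral, descendant) function pairs.
pairs = [("3.1.3.1", "3.1.3.2"), ("3.1.3.1", "2.7.1.1"), ("1.1.1.1", "4.2.1.1"), ("2.7.1.1", "3.1.3.1")]
matrix = exchange_matrix(pairs)
for row in matrix:
    print(row)
```

Counting directed pairs keeps the asymmetry the abstract describes (some exchanges being more prevalent than others); adding the matrix to its transpose would give an undirected view instead.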
Affiliation(s)
- Nicholas Furnham
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
37