1
Ezerzer Y, Frenkel-Pinter M, Kolodny R, Ben-Tal N. A building blocks perspective on protein emergence and evolution. Curr Opin Struct Biol 2025; 91:102996. [PMID: 39919321 DOI: 10.1016/j.sbi.2025.102996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 01/04/2025] [Accepted: 01/15/2025] [Indexed: 02/09/2025]
Abstract
Recent findings increasingly suggest the emergence of proteins by mix and match of short peptides, or 'building blocks'. What are these building blocks, and how did they evolve into contemporary proteins? We review two complementary approaches to tackling these questions. First, a bottom-up approach that involves identifying putative components of primordial peptides, and the synthetic routes through which these peptides may have emerged. Second, searches in protein space to reveal building blocks that make up the contemporary protein repertoire; proteins that are not closely related to one another may nevertheless have certain parts in common, suggesting common ancestry. Identifying such shared building blocks, and characterizing their functions, can shed light on the ancient molecules from which proteins emerged, and hint at the mechanisms that govern their evolution. A key challenge lies in merging these two approaches to create a cohesive narrative of how proteins emerged and continue to evolve.
Affiliation(s)
- Yishi Ezerzer
  - Institute of Chemistry, The Hebrew University of Jerusalem, 9190401, Israel
- Moran Frenkel-Pinter
  - Institute of Chemistry, The Hebrew University of Jerusalem, 9190401, Israel
  - The Center for Nanoscience and Nanotechnology, The Hebrew University of Jerusalem, 9190401, Israel
- Rachel Kolodny
  - Department of Computer Science, University of Haifa, Haifa, Israel
- Nir Ben-Tal
  - School of Neurobiology, Biochemistry and Biophysics, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
2
Tanoz I, Timsit Y. Protein Fold Usages in Ribosomes: Another Glance to the Past. Int J Mol Sci 2024; 25:8806. [PMID: 39201491 PMCID: PMC11354259 DOI: 10.3390/ijms25168806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/07/2024] [Accepted: 08/08/2024] [Indexed: 09/02/2024] Open
Abstract
The analysis of protein fold usage, similar to codon usage, offers profound insights into the evolution of biological systems and the origins of modern proteomes. While previous studies have examined fold distribution in modern genomes, our study focuses on the comparative distribution and usage of protein folds in ribosomes across bacteria, archaea, and eukaryotes. We identify the prevalence of certain 'super-ribosome folds,' such as the OB fold in bacteria and the SH3 domain in archaea and eukaryotes. The observed protein fold distribution in the ribosomes announces the future power-law distribution where only a few folds are highly prevalent, and most are rare. Additionally, we highlight the presence of three copies of proto-Rossmann folds in ribosomes across all kingdoms, showing its ancient and fundamental role in ribosomal structure and function. Our study also explores early mechanisms of molecular convergence, where different protein folds bind equivalent ribosomal RNA structures in ribosomes across different kingdoms. This comparative analysis enhances our understanding of ribosomal evolution, particularly the distinct evolutionary paths of the large and small subunits, and underscores the complex interplay between RNA and protein components in the transition from the RNA world to modern cellular life. Transcending the concept of folds also makes it possible to group a large number of ribosomal proteins into five categories of urfolds or metafolds, which could attest to their ancestral character and common origins. This work also demonstrates that the gradual acquisition of extensions by simple but ordered folds constitutes an inexorable evolutionary mechanism. This observation supports the idea that simple but structured ribosomal proteins preceded the development of their disordered extensions.
Affiliation(s)
- Inzhu Tanoz
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, 13288 Marseille, France
- Youri Timsit
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, 13288 Marseille, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 3 Rue Michel-Ange, 75016 Paris, France
3
Middendorf L, Ravi Iyengar B, Eicholt LA. Sequence, Structure, and Functional Space of Drosophila De Novo Proteins. Genome Biol Evol 2024; 16:evae176. [PMID: 39212966 PMCID: PMC11363682 DOI: 10.1093/gbe/evae176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/29/2024] [Indexed: 09/04/2024] Open
Abstract
During de novo emergence, new protein coding genes emerge from previously nongenic sequences. The de novo proteins they encode are dissimilar in composition and predicted biochemical properties to conserved proteins. However, functional de novo proteins indeed exist. Both identification of functional de novo proteins and their structural characterization are experimentally laborious. To identify functional and structured de novo proteins in silico, we applied recently developed machine learning based tools and found that most de novo proteins are indeed different from conserved proteins both in their structure and sequence. However, some de novo proteins are predicted to adopt known protein folds, participate in cellular reactions, and to form biomolecular condensates. Apart from broadening our understanding of de novo protein evolution, our study also provides a large set of testable hypotheses for focused experimental studies on structure and function of de novo proteins in Drosophila.
Affiliation(s)
- Lasse Middendorf
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Bharat Ravi Iyengar
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Lars A Eicholt
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
4
Toledo-Patiño S, Goetz SK, Shanmugaratnam S, Höcker B, Farías-Rico JA. Molecular handcraft of a well-folded protein chimera. FEBS Lett 2024; 598:1375-1386. [PMID: 38508768 DOI: 10.1002/1873-3468.14856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 02/11/2024] [Accepted: 02/12/2024] [Indexed: 03/22/2024]
Abstract
Modular assembly is a compelling pathway to create new proteins, a concept supported by protein engineering and millennia of evolution. Natural evolution provided a repository of building blocks, known as domains, which trace back to even shorter segments that underwent numerous 'copy-paste' processes culminating in the scaffolds we see today. Utilizing the subdomain-database Fuzzle, we constructed a fold-chimera by integrating a flavodoxin-like fragment into a periplasmic binding protein. This chimera is well-folded and a crystal structure reveals stable interfaces between the fragments. These findings demonstrate the adaptability of α/β-proteins and offer a stepping stone for optimization. By emphasizing the practicality of fragment databases, our work pioneers new pathways in protein engineering. Ultimately, the results substantiate the conjecture that periplasmic binding proteins originated from a flavodoxin-like ancestor.
Affiliation(s)
- Saacnicteh Toledo-Patiño
  - Max Planck Institute for Developmental Biology, Tübingen, Germany
  - Okinawa Institute of Science and Technology Graduate University, Japan
- Sooruban Shanmugaratnam
  - Max Planck Institute for Developmental Biology, Tübingen, Germany
  - Department of Biochemistry, University of Bayreuth, Germany
- Birte Höcker
  - Max Planck Institute for Developmental Biology, Tübingen, Germany
  - Department of Biochemistry, University of Bayreuth, Germany
- José Arcadio Farías-Rico
  - Max Planck Institute for Developmental Biology, Tübingen, Germany
  - Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
5
Ye W, Krishna Behra PR, Dyrhage K, Seeger C, Joiner JD, Karlsson E, Andersson E, Chi CN, Andersson SGE, Jemth P. Folded Alpha Helical Putative New Proteins from Apilactobacillus kunkeei. J Mol Biol 2024; 436:168490. [PMID: 38355092 DOI: 10.1016/j.jmb.2024.168490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 02/07/2024] [Accepted: 02/08/2024] [Indexed: 02/16/2024]
Abstract
The emergence of new proteins is a central question in biology. Most tertiary protein folds known to date appear to have an ancient origin, but it is clear from bioinformatic analyses that new proteins continuously emerge in all organismal groups. However, there is a paucity of experimental data on new proteins regarding their structure and biophysical properties. We performed a detailed phylogenetic analysis and identified 48 putative open reading frames in the honeybee-associated bacterium Apilactobacillus kunkeei for which no or few homologs could be identified in closely-related species, suggesting that they could be relatively new on an evolutionary time scale and represent recently evolved proteins. Using circular dichroism-, fluorescence- and nuclear magnetic resonance (NMR) spectroscopy we investigated six of these proteins and show that they are not intrinsically disordered, but populate alpha-helical dominated folded states with relatively low thermodynamic stability (0-3 kcal/mol). The NMR and biophysical data demonstrate that small new proteins readily adopt simple folded conformations suggesting that more complex tertiary structures can be continuously re-invented during evolution by fusion of such simple secondary structure elements. These findings have implications for the general view on protein evolution, where de novo emergence of folded proteins may be a common event.
Affiliation(s)
- Weihua Ye
  - Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden
- Phani Rama Krishna Behra
  - Department of Molecular Evolution, Cell and Molecular Biology, Biomedical Centre, Science for Life Laboratory, Uppsala University, 75236 Uppsala, Sweden
- Karl Dyrhage
  - Department of Molecular Evolution, Cell and Molecular Biology, Biomedical Centre, Science for Life Laboratory, Uppsala University, 75236 Uppsala, Sweden
- Christian Seeger
  - Department of Molecular Evolution, Cell and Molecular Biology, Biomedical Centre, Science for Life Laboratory, Uppsala University, 75236 Uppsala, Sweden
- Joe D Joiner
  - Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden
- Elin Karlsson
  - Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden
- Eva Andersson
  - Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden
- Celestine N Chi
  - Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden
- Siv G E Andersson
  - Department of Molecular Evolution, Cell and Molecular Biology, Biomedical Centre, Science for Life Laboratory, Uppsala University, 75236 Uppsala, Sweden
- Per Jemth
  - Department of Medical Biochemistry and Microbiology, Uppsala University, BMC Box 582, 75123 Uppsala, Sweden
6
McGuinness KN, Fehon N, Feehan R, Miller M, Mutter AC, Rybak LA, Nam J, AbuSalim JE, Atkinson JT, Heidari H, Losada N, Kim JD, Koder RL, Lu Y, Silberg JJ, Slusky JSG, Falkowski PG, Nanda V. The energetics and evolution of oxidoreductases in deep time. Proteins 2024; 92:52-59. [PMID: 37596815 DOI: 10.1002/prot.26563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 07/06/2023] [Indexed: 08/20/2023]
Abstract
The core metabolic reactions of life drive electrons through a class of redox protein enzymes, the oxidoreductases. The energetics of electron flow is determined by the redox potentials of organic and inorganic cofactors as tuned by the protein environment. Understanding how protein structure affects oxidation-reduction energetics is crucial for studying metabolism, creating bioelectronic systems, and tracing the history of biological energy utilization on Earth. We constructed ProtReDox (https://protein-redox-potential.web.app), a manually curated database of experimentally determined redox potentials. With over 500 measurements, we can begin to identify how proteins modulate oxidation-reduction energetics across the tree of life. By mapping redox potentials onto networks of oxidoreductase fold evolution, we can infer the evolution of electron transfer energetics over deep time. ProtReDox is designed to include user-contributed submissions with the intention of making it a valuable resource for researchers in this field.
Affiliation(s)
- Kenneth N McGuinness
  - Department of Natural Sciences, Caldwell University, Caldwell, New Jersey, USA
  - Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
- Nolan Fehon
  - Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
- Ryan Feehan
  - Computational Biology Program, The University of Kansas, Lawrence, Kansas, USA
- Michelle Miller
  - Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
- Andrew C Mutter
  - Department of Physics, The City College of New York, New York, New York, USA
- Laryssa A Rybak
  - Department of Physics, The City College of New York, New York, New York, USA
- Justin Nam
  - Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
- Jenna E AbuSalim
  - Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
- Joshua T Atkinson
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
- Hirbod Heidari
  - Department of Chemistry, University of Texas at Austin, Austin, Texas, USA
- Natalie Losada
  - Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
- J Dongun Kim
  - Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
- Ronald L Koder
  - Department of Physics, The City College of New York, New York, New York, USA
- Yi Lu
  - Department of Chemistry, University of Texas at Austin, Austin, Texas, USA
- Jonathan J Silberg
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
- Joanna S G Slusky
  - Computational Biology Program, The University of Kansas, Lawrence, Kansas, USA
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, USA
- Paul G Falkowski
  - Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
  - Department of Earth and Planetary Sciences, Rutgers University, New Brunswick, New Jersey, USA
- Vikas Nanda
  - Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
  - Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers University, Piscataway, New Jersey, USA
7
Durairaj J, Waterhouse AM, Mets T, Brodiazhenko T, Abdullah M, Studer G, Tauriello G, Akdel M, Andreeva A, Bateman A, Tenson T, Hauryliuk V, Schwede T, Pereira J. Uncovering new families and folds in the natural protein universe. Nature 2023; 622:646-653. [PMID: 37704037 PMCID: PMC10584680 DOI: 10.1038/s41586-023-06622-3] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 03/24/2023] [Accepted: 09/07/2023] [Indexed: 09/15/2023]
Abstract
We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database [1]. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this 'dark matter' of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database [2] and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light into uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.
Affiliation(s)
- Janani Durairaj
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Andrew M Waterhouse
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Toomas Mets
  - Institute of Technology, University of Tartu, Tartu, Estonia
  - Department of Experimental Medical Science, Lund University, Lund, Sweden
- Minhal Abdullah
  - Institute of Technology, University of Tartu, Tartu, Estonia
  - Department of Experimental Medical Science, Lund University, Lund, Sweden
- Gabriel Studer
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Gerardo Tauriello
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Antonina Andreeva
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Tanel Tenson
  - Institute of Technology, University of Tartu, Tartu, Estonia
- Vasili Hauryliuk
  - Institute of Technology, University of Tartu, Tartu, Estonia
  - Department of Experimental Medical Science, Lund University, Lund, Sweden
  - Science for Life Laboratory, Lund, Sweden
  - Virus Centre, Lund University, Lund, Sweden
- Torsten Schwede
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Joana Pereira
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
8
Barrera-Redondo J, Lotharukpong JS, Drost HG, Coelho SM. Uncovering gene-family founder events during major evolutionary transitions in animals, plants and fungi using GenEra. Genome Biol 2023; 24:54. [PMID: 36964572 PMCID: PMC10037820 DOI: 10.1186/s13059-023-02895-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 03/10/2023] [Indexed: 03/26/2023] Open
Abstract
We present GenEra (https://github.com/josuebarrera/GenEra), a DIAMOND-fueled gene-family founder inference framework that addresses previously raised limitations and biases in genomic phylostratigraphy, such as homology detection failure. GenEra also reduces computational time from several months to a few days for any genome of interest. We analyze the emergence of taxonomically restricted gene families during major evolutionary transitions in plants, animals, and fungi. Our results indicate that the impact of homology detection failure on inferred patterns of gene emergence is lineage-dependent, suggesting that plants are more prone to evolve novelty through the emergence of new genes compared to animals and fungi.
Affiliation(s)
- Josué Barrera-Redondo
  - Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Jaruwatana Sodai Lotharukpong
  - Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Hajk-Georg Drost
  - Computational Biology Group, Department of Molecular Biology, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Susana M Coelho
  - Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
9
Benton R, Himmel NJ. Structural screens identify candidate human homologs of insect chemoreceptors and cryptic Drosophila gustatory receptor-like proteins. eLife 2023; 12:85537. [PMID: 36803935 PMCID: PMC9998090 DOI: 10.7554/elife.85537] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/15/2022] [Accepted: 02/16/2023] [Indexed: 02/22/2023] Open
Abstract
Insect odorant receptors and gustatory receptors define a superfamily of seven transmembrane domain ion channels (referred to here as 7TMICs), with homologs identified across Animalia except Chordata. Previously, we used sequence-based screening methods to reveal conservation of this family in unicellular eukaryotes and plants (DUF3537 proteins) (Benton et al., 2020). Here, we combine three-dimensional structure-based screening, ab initio protein folding predictions, phylogenetics, and expression analyses to characterize additional candidate homologs with tertiary but little or no primary structural similarity to known 7TMICs, including proteins in disease-causing Trypanosoma. Unexpectedly, we identify structural similarity between 7TMICs and PHTF proteins, a deeply conserved family of unknown function, whose human orthologs display enriched expression in testis, cerebellum, and muscle. We also discover divergent groups of 7TMICs in insects, which we term the gustatory receptor-like (Grl) proteins. Several Drosophila melanogaster Grls display selective expression in subsets of taste neurons, suggesting that they are previously unrecognized insect chemoreceptors. Although we cannot exclude the possibility of remarkable structural convergence, our findings support the origin of 7TMICs in a eukaryotic common ancestor, counter previous assumptions of complete loss of 7TMICs in Chordata, and highlight the extreme evolvability of this protein fold, which likely underlies its functional diversification in different cellular contexts.
Affiliation(s)
- Richard Benton
  - Center for Integrative Genomics, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
- Nathaniel J Himmel
  - Center for Integrative Genomics, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
10
Qiu K, Ben‐Tal N, Kolodny R. Similar protein segments shared between domains of different evolutionary lineages. Protein Sci 2022; 31:e4407. [PMID: 36040261 PMCID: PMC9387206 DOI: 10.1002/pro.4407] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/25/2022] [Revised: 07/01/2022] [Accepted: 07/25/2022] [Indexed: 11/21/2022]
Abstract
The emergence of novel proteins, beyond those that can be readily made by duplication and recombination of preexisting domains, is elusive. De novo emergence from random sequences is unlikely because the vast majority of random chains would not even fold, let alone function. An alternative explanation is that novel proteins emerge by duplication and fusion of pre-existing polypeptide segments. In this case, traces of such ancient events may remain within contemporary proteins in the form of reused segments. Together with the late Dan Tawfik, we detected such similar segments, far shorter than intact protein domains, which are found in different environments. The detection of these "bridging themes" was based on a unique search strategy, where in addition to searching for similarity of shared fragments, so-called "themes," we also explicitly searched for cases in which the sequence segments before and after the theme are dissimilar (both in sequence and structure). Here, using a similar strategy, we further expanded the search and discovered almost 500 additional "bridging themes," linking domains that are often from ancient folds. The themes, of 20 residues or more (average 53), do not retain their structure despite sharing 37% sequence identity on average. Indeed, conformation flexibility may confer an evolutionary advantage, in that it fits in multiple environments. We elaborate on two interesting themes, shared between Rossmann/Trefoil-Plexin-like domains and a β-propeller-like domain.
For a broad audience: A fundamental question in molecular evolution is how protein domains emerged. Similar segments shared between domains of seemingly distinct origins may offer clues, as these may be remnants of the evolutionary process through which these domains emerged. However, finding such cases is difficult. Here, we expand the set of such cases which we curated previously, adding segments shared between domains that are considered ancient.
Affiliation(s)
- Kaiyu Qiu
  - Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Nir Ben‐Tal
  - Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Rachel Kolodny
  - Department of Computer Science, University of Haifa, Haifa, Israel
11
Ferruz N, Schmidt S, Höcker B. ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun 2022; 13:4348. [PMID: 35896542 PMCID: PMC9329459 DOI: 10.1038/s41467-022-32007-7] [Citation(s) in RCA: 233] [Impact Index Per Article: 77.7] [Received: 04/01/2022] [Accepted: 07/13/2022] [Indexed: 11/29/2022] Open
Abstract
Protein design aims to build novel proteins customized for specific purposes, thereby holding the potential to tackle many environmental and biomedical problems. Recent progress in Transformer-based architectures has enabled the implementation of language models capable of generating text with human-like capabilities. Here, motivated by this success, we describe ProtGPT2, a language model trained on the protein space that generates de novo protein sequences following the principles of natural ones. The generated proteins display natural amino acid propensities, while disorder predictions indicate that 88% of ProtGPT2-generated proteins are globular, in line with natural sequences. Sensitive sequence searches in protein databases show that ProtGPT2 sequences are distantly related to natural ones, and similarity networks further demonstrate that ProtGPT2 is sampling unexplored regions of protein space. AlphaFold prediction of ProtGPT2-sequences yields well-folded non-idealized structures with embodiments and large loops and reveals topologies not captured in current structure databases. ProtGPT2 generates sequences in a matter of seconds and is freely available.
Affiliation(s)
- Noelia Ferruz
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
  - Institute of Informatics and Applications, University of Girona, Girona, Spain
- Steffen Schmidt
  - Computational Biochemistry, University of Bayreuth, 95447, Bayreuth, Germany
- Birte Höcker
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
12
Jayaraman V, Toledo‐Patiño S, Noda‐García L, Laurino P. Mechanisms of protein evolution. Protein Sci 2022; 31:e4362. [PMID: 35762715 PMCID: PMC9214755 DOI: 10.1002/pro.4362] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 02/28/2022] [Revised: 05/11/2022] [Accepted: 05/14/2022] [Indexed: 11/06/2022]
Abstract
How do proteins evolve? How do changes in sequence mediate changes in protein structure, and in turn in function? This question has multiple angles, ranging from biochemistry and biophysics to evolutionary biology. This review provides a brief integrated view of some key mechanistic aspects of protein evolution. First, we explain how protein evolution is primarily driven by randomly acquired genetic mutations and selection for function, and how these mutations can even give rise to completely new folds. Then, we also comment on how phenotypic protein variability, including promiscuity, transcriptional and translational errors, may also accelerate this process, possibly via "plasticity-first" mechanisms. Finally, we highlight open questions in the field of protein evolution, with respect to the emergence of more sophisticated protein systems such as protein complexes, pathways, and the emergence of pre-LUCA enzymes.
Affiliation(s)
- Vijay Jayaraman
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Saacnicteh Toledo‐Patiño
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Lianet Noda‐García
  - Department of Plant Pathology and Microbiology, Institute of Environmental Sciences, Robert H. Smith Faculty of Agriculture, Food and Environment, Hebrew University of Jerusalem, Rehovot, Israel
- Paola Laurino
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
|
13
|
Longo LM, Kolodny R, McGlynn SE. Evidence for the emergence of β-trefoils by 'Peptide Budding' from an IgG-like β-sandwich. PLoS Comput Biol 2022; 18:e1009833. [PMID: 35157697 PMCID: PMC8880906 DOI: 10.1371/journal.pcbi.1009833] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/07/2022] [Revised: 02/25/2022] [Accepted: 01/13/2022] [Indexed: 12/02/2022] Open
Abstract
As sequence and structure comparison algorithms gain sensitivity, the intrinsic interconnectedness of the protein universe has become increasingly apparent. Despite this general trend, β-trefoils have emerged as an uncommon counterexample: They are an isolated protein lineage for which few, if any, sequence or structure associations to other lineages have been identified. If β-trefoils are, in fact, remote islands in sequence-structure space, it implies that the oligomerizing peptide that founded the β-trefoil lineage itself arose de novo. To better understand β-trefoil evolution, and to probe the limits of fragment sharing across the protein universe, we identified both 'β-trefoil bridging themes' (evolutionarily-related sequence segments) and 'β-trefoil-like motifs' (structure motifs with a hallmark feature of the β-trefoil architecture) in multiple, ostensibly unrelated, protein lineages. The success of the present approach stems, in part, from considering β-trefoil sequence segments or structure motifs rather than the β-trefoil architecture as a whole, as has been done previously. The newly uncovered inter-lineage connections presented here suggest a novel hypothesis about the origins of the β-trefoil fold itself, namely that it is a derived fold formed by 'budding' from an Immunoglobulin-like β-sandwich protein. These results demonstrate how the evolution of a folded domain from a peptide need not be a signature of antiquity and underpin an emerging truth: few protein lineages escape nature's sewing table.
Affiliation(s)
- Liam M. Longo
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- Blue Marble Space Institute of Science, Seattle, Washington, United States of America
| | - Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa, Israel
| | - Shawn E. McGlynn
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- Blue Marble Space Institute of Science, Seattle, Washington, United States of America
|
14
|
Bitard-Feildel T. Navigating the amino acid sequence space between functional proteins using a deep learning framework. PeerJ Comput Sci 2021; 7:e684. [PMID: 34616884 PMCID: PMC8459775 DOI: 10.7717/peerj-cs.684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2021] [Accepted: 07/30/2021] [Indexed: 06/13/2023]
Abstract
MOTIVATION: Shedding light on the relationships between protein sequences and functions is a challenging task with many implications in protein evolution, disease understanding, and protein design. However, the mapping of protein sequence space to specific functions is hard to comprehend due to its complexity. Generative models help to decipher complex systems thanks to their ability to learn and recreate data specificity. Applied to proteins, they can capture the sequence patterns associated with functions and point out important relationships between sequence positions. By learning these dependencies between sequences and functions, they can ultimately be used to generate new sequences and navigate through uncharted areas of molecular evolution. RESULTS: This study presents an Adversarial Auto-Encoder (AAE) approach, an unsupervised generative model, to generate new protein sequences. AAEs are tested on three protein families known for their multiple functions: the sulfatase, the HUP, and the TPP families. Clustering results on the encoded sequences from the latent space computed by AAEs display a high level of homogeneity regarding the protein sequence functions. The study also reports and analyzes, for the first time, two sampling strategies based on latent space interpolation and latent space arithmetic to generate intermediate protein sequences sharing sequential properties of original sequences linked to known functional properties issued from different families and functions. Sequences generated by interpolation between latent space data points demonstrate the ability of the AAE to generalize and produce meaningful biological sequences from an evolutionarily uncharted area of the biological sequence space.
Finally, 3D structure models computed by comparative modelling using generated sequences and templates of different sub-families point to the ability of latent space arithmetic to successfully transfer protein sequence properties linked to function between different sub-families. All in all, this study confirms the ability of deep learning frameworks to model biological complexity and bring new tools to explore amino acid sequence and functional spaces.
Affiliation(s)
- Tristan Bitard-Feildel
- IBPS, CNRS, Laboratoire de Biologie Computationnelle et Quantitative, Sorbonne Université, Paris, France
- Institut des Sciences du Calcul et des Données (ISCD), Sorbonne Université, Paris, France
|
15
|
Abstract
Domains are the structural, functional and evolutionary units of proteins. They combine to form multidomain proteins. The evolutionary history of this molecular combinatorics has been studied with phylogenomic methods. Here, we construct networks of domain organization and explore their evolution. A time series of networks revealed two ancient waves of structural novelty arising from ancient 'p-loop' and 'winged helix' domains and a massive 'big bang' of domain organization. The evolutionary recruitment of domains was highly modular, hierarchical and ongoing. Domain rearrangements elicited non-random and scale-free network structure. Comparative analyses of preferential attachment, randomness and modularity showed yin-and-yang complementary transition and biphasic patterns along the structural chronology. Remarkably, the evolving networks highlighted a central evolutionary role of cofactor-supporting structures of non-ribosomal peptide synthesis pathways, likely crucial to the early development of the genetic code. Some highly modular domains featured dual response regulation in two-component signal transduction systems with DNA-binding activity linked to transcriptional regulation of responses to environmental change. Interestingly, hub domains across the evolving networks shared the historical role of DNA binding and editing, an ancient protein function in molecular evolution. Our investigation unfolds historical source-sink patterns of evolutionary recruitment that further our understanding of protein architectures and functions.
|
16
|
Romero-Romero S, Kordes S, Michel F, Höcker B. Evolution, folding, and design of TIM barrels and related proteins. Curr Opin Struct Biol 2021; 68:94-104. [PMID: 33453500 PMCID: PMC8250049 DOI: 10.1016/j.sbi.2020.12.007] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 10/26/2020] [Revised: 12/13/2020] [Accepted: 12/14/2020] [Indexed: 12/16/2022]
Abstract
Proteins are chief actors in life that perform a myriad of exquisite functions. This diversity has been enabled through the evolution and diversification of protein folds. Analysis of sequences and structures strongly suggests that numerous protein pieces have been reused as building blocks and propagated to many modern folds. This information can be traced to understand how the protein world has diversified. In this review, we discuss the latest advances in the analysis of protein evolutionary units, and we use as a model system one of the most abundant and versatile topologies, the TIM-barrel fold, to highlight the existing common principles that interconnect protein evolution, structure, folding, function, and design.
Affiliation(s)
| | - Sina Kordes
- Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
| | - Florian Michel
- Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
| | - Birte Höcker
- Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany.
|
17
|
Kolodny R, Nepomnyachiy S, Tawfik DS, Ben-Tal N. Bridging Themes: Short Protein Segments Found in Different Architectures. Mol Biol Evol 2021; 38:2191-2208. [PMID: 33502503 PMCID: PMC8136508 DOI: 10.1093/molbev/msab017] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 12/20/2022] Open
Abstract
The vast majority of theoretically possible polypeptide chains do not fold, let alone confer function. Hence, protein evolution from preexisting building blocks has clear potential advantages over ab initio emergence from random sequences. In support of this view, sequence similarities between different proteins are generally indicative of common ancestry, and we collectively refer to such homologous sequences as "themes." At the domain level, sequence homology is routinely detected. However, short themes which are segments, or fragments of intact domains, are particularly interesting because they may provide hints about the emergence of domains, as opposed to divergence of preexisting domains, or their mixing-and-matching to form multi-domain proteins. Here we identified 525 representative short themes, comprising 20-80 residues that are unexpectedly shared between domains considered to have emerged independently. Among these "bridging themes" are ones shared between the most ancient domains, for example, Rossmann, P-loop NTPase, TIM-barrel, flavodoxin, and ferredoxin-like. We elaborate on several particularly interesting cases, where the bridging themes mediate ligand binding. Ligand binding may have contributed to the stability and the plasticity of these building blocks, and to their ability to invade preexisting domains or serve as starting points for completely new domains.
Affiliation(s)
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa, Israel
| | | | - Dan S Tawfik
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
| | - Nir Ben-Tal
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
|
18
|
Heizinger L, Merkl R. Evidence for the preferential reuse of sub-domain motifs in primordial protein folds. Proteins 2021; 89:1167-1179. [PMID: 33957009 DOI: 10.1002/prot.26089] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/04/2021] [Revised: 04/15/2021] [Accepted: 04/28/2021] [Indexed: 11/06/2022]
Abstract
A comparison of protein backbones makes clear that not more than approximately 1400 different folds exist, each specifying the three-dimensional topology of a protein domain. Large proteins are composed of specific domain combinations and many domains can accommodate different functions. These findings confirm that the reuse of domains is key for the evolution of multi-domain proteins. If reuse was also the driving force for domain evolution, ancestral fragments of sub-domain size exist that are shared between domains possessing significantly different topologies. For the fully automated detection of putatively ancestral motifs, we developed the algorithm Fragstatt that compares proteins pairwise to identify fragments, that is, instantiations of the same motif. To reach maximal sensitivity, Fragstatt compares sequences by means of cascaded alignments of profile Hidden Markov Models. If the fragment sequences are sufficiently similar, the program determines and scores the structural concordance of the fragments. By analyzing a comprehensive set of proteins from the CATH database, Fragstatt identified 12 532 partially overlapping and structurally similar motifs that clustered to 134 unique motifs. The dissemination of these motifs is limited: We found only two domain topologies that contain two different motifs and generally, these motifs occur in not more than 18% of the CATH topologies. Interestingly, motifs are enriched in topologies that are considered ancestral. Thus, our findings suggest that the reuse of sub-domain sized fragments was relevant in early phases of protein evolution and became less important later on.
Affiliation(s)
- Leonhard Heizinger
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
| | - Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
|
19
|
Janaki C, Gowri VS, Srinivasan N. Master Blaster: an approach to sensitive identification of remotely related proteins. Sci Rep 2021; 11:8746. [PMID: 33888741 PMCID: PMC8062480 DOI: 10.1038/s41598-021-87833-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2019] [Accepted: 04/06/2021] [Indexed: 11/11/2022] Open
Abstract
Genome sequencing projects unearth the sequences of all the proteins encoded in a genome. As the first step, homology detection is employed to obtain clues to the structure and function of these proteins. However, high evolutionary divergence between homologous proteins challenges our ability to detect distant relationships. In the past, an approach involving multiple Position Specific Scoring Matrices (PSSMs) was found to be more effective than traditional single PSSMs. Cascaded search is another successful approach where hits of a search are queried to detect more homologues. We propose a protocol, ‘Master Blaster’, which combines the principles adopted in these two approaches to enhance our ability to detect remote homologues even further. Assessment of the approach was performed using known relationships available in the SCOP70 database, and the results were compared against those of PSI-BLAST and HHblits, a hidden Markov model-based method. Compared to PSI-BLAST, Master Blaster resulted in 10% improvement with respect to detection of cross superfamily connections, nearly 35% improvement in cross family and more than 80% improvement in intra family connections. From the results it was observed that HHblits is more sensitive in detecting remote homologues compared to Master Blaster. However, there are true hits from 46 folds for which Master Blaster reported homologs that are not reported by HHblits even using the optimal parameters, indicating that the use of multiple methods employing a combination of different approaches can be more effective in detecting remote homologs. Master Blaster stand-alone code is available for download in the supplementary archive.
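The cascaded-search principle mentioned in this abstract, where the hits of one search are themselves used as queries, can be illustrated with a toy sketch. This is not the Master Blaster code: the `neighbors` dictionary is a hypothetical stand-in for a real homology search, and the function and parameter names are assumptions made for illustration only.

```python
from collections import deque


def cascaded_search(query, search, max_rounds=3):
    """Collect homologs by re-querying every new hit, up to max_rounds rounds."""
    found = {query}
    frontier = deque([(query, 0)])
    while frontier:
        seq, depth = frontier.popleft()
        if depth == max_rounds:
            continue  # do not expand hits found in the final round
        for hit in search(seq):
            if hit not in found:
                found.add(hit)
                frontier.append((hit, depth + 1))
    found.discard(query)
    return found


# Hypothetical toy data: each key "detects" only its listed neighbours.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}


def toy_search(seq):
    return neighbors.get(seq, [])


direct = cascaded_search("A", toy_search, max_rounds=1)    # single search
cascaded = cascaded_search("A", toy_search, max_rounds=3)  # hits re-queried
```

In this toy setting a single round from "A" reaches only "B", whereas re-querying hits also reaches "C" and "D", which is the sense in which cascading can extend detection to more distant relatives.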
Affiliation(s)
- Chintalapati Janaki
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India; Centre for Development of Advanced Computing, Knowledge Park, Byappanahalli, Bangalore, 560038, India
| | - Venkatraman S Gowri
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India; Department of Chemistry, Auxilium College, Gandhinagar, Vellore, 632006, India
|
20
|
Gullotto D. Fine tuned exploration of evolutionary relationships within the protein universe. Stat Appl Genet Mol Biol 2021; 20:17-36. [PMID: 33594839 DOI: 10.1515/sagmb-2019-0039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2019] [Accepted: 01/12/2021] [Indexed: 11/15/2022]
Abstract
In the regime of domain classifications, the protein universe unveils a discrete set of folds connected by hierarchical relationships. Instead, at sub-domain-size resolution and because of physical constraints not necessarily requiring evolution to shape polypeptide chains, networks of protein motifs depict a continuous view that lies beyond the extent of hierarchical classification schemes. A number of studies, however, suggest that universal sub-sequences could be the descendants of peptides that emerged in an ancient pre-biotic world. Should this be the case, evolutionary signals retained by structurally conserved motifs, along with hierarchical features of ancient domains, could sew relationships among folds that diverged beyond the point where homology is discernible. In view of the aforementioned, this paper provides a rationale where a network with hierarchical and continuous levels of the protein space, together with sequence profiles that probe the extent of sequence similarity and contacting residues that capture the transition from pre-biotic to domain world, has been used to explore relationships between ancient folds. Statistics of detected signals have been reported. As a result, an example of an emergent sub-network that makes sense from an evolutionary perspective, where conserved signals retrieved from the assessed protein space have been co-opted, has been discussed.
Affiliation(s)
- Danilo Gullotto
- Advanced Computational Biostructural Research Collaboratory, I-95019, Zafferana Etnea, Italy
|
21
|
Gospodinov A, Kunnev D. Universal Codons with Enrichment from GC to AU Nucleotide Composition Reveal a Chronological Assignment from Early to Late Along with LUCA Formation. Life (Basel) 2020; 10:life10060081. [PMID: 32516985 PMCID: PMC7345086 DOI: 10.3390/life10060081] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/03/2020] [Revised: 05/30/2020] [Accepted: 06/03/2020] [Indexed: 12/14/2022] Open
Abstract
The emergence of a primitive genetic code should be considered the most essential event during the origin of life. Almost a complete set of codons (as we know them) should have been established relatively early during the evolution of the last universal common ancestor (LUCA) from which all known organisms descended. Many hypotheses have been proposed to explain the driving forces and chronology of the evolution of the genetic code; however, none is commonly accepted. In the current paper, we explore the features of the genetic code that, in our view, reflect the mechanism and the chronological order of the origin of the genetic code. Our hypothesis postulates that the primordial RNA was mostly GC-rich, and this bias was reflected in the order of amino acid codon assignment. If we arrange the codons and their corresponding amino acids from GC-rich to AU-rich, we find that: 1. The amino acids encoded by GC-rich codons (Ala, Gly, Arg, and Pro) are those that contribute the most to the interactions with RNA (if incorporated into short peptides). 2. This order correlates with the addition of novel functions necessary for the evolution from simple to longer folded peptides. 3. The overlay of aminoacyl-tRNA synthetases (aaRS) to the amino acid order produces a distinctive zonal distribution for class I and class II suggesting an interdependent origin. These correlations could be explained by the active role of the bridge peptide (BP), which we proposed earlier in the evolution of the genetic code.
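The GC-to-AU arrangement of codons described in this abstract can be reproduced with a short sketch over the standard genetic code. This illustrates the ordering only and is not the authors' analysis; treating "GC-rich" as "G or C at both of the first two codon positions" is an assumption made for this example, and the variable names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order at each position; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_count(codon):
    """Number of G or C bases in a codon (0 to 3)."""
    return sum(base in "GC" for base in codon)


# Codons arranged from GC-rich to AU-rich.
ranked = sorted(CODE, key=gc_count, reverse=True)

# Assumed criterion: G or C at both the first and second codon positions.
gc_rich_amino_acids = {
    aa for codon, aa in CODE.items()
    if codon[0] in "GC" and codon[1] in "GC"
}

print(sorted(gc_rich_amino_acids))  # ['A', 'G', 'P', 'R'], i.e. Ala, Gly, Pro, Arg
```

Under this criterion the selected amino acids are the four named in the abstract (Ala, Gly, Arg, and Pro). Arg also has two AG-initial codons (AGA, AGG), which this simple filter does not consider.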
Affiliation(s)
- Anastas Gospodinov
- Roumen Tsanev Institute of Molecular Biology, Bulgarian Academy of Sciences, Acad. G. Bonchev Str. 21, Sofia 1113, Bulgaria
| | - Dimiter Kunnev
- Department of Molecular & Cellular Biology, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
|
22
|
Ferruz N, Lobos F, Lemm D, Toledo-Patino S, Farías-Rico JA, Schmidt S, Höcker B. Identification and Analysis of Natural Building Blocks for Evolution-Guided Fragment-Based Protein Design. J Mol Biol 2020; 432:3898-3914. [PMID: 32330481 PMCID: PMC7322520 DOI: 10.1016/j.jmb.2020.04.013] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 12/23/2019] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 12/15/2022]
Abstract
Natural evolution has generated an impressively diverse protein universe via duplication and recombination from a set of protein fragments that served as building blocks. The application of these concepts to the design of new proteins using subdomain-sized fragments from different folds has proven to be experimentally successful. To better understand how evolution has shaped our protein universe, we performed an all-against-all comparison of protein domains representing all naturally existing folds and identified conserved homologous protein fragments. Overall, we found more than 1000 protein fragments of various lengths among different folds through similarity network analysis. These fragments are present in very different protein environments and represent versatile building blocks for protein design. These data are available in our web server called F(old P)uzzle (fuzzle.uni-bayreuth.de), which allows users to individually filter the dataset and create customized networks for folds of interest. We believe that our results serve as an invaluable resource for structural and evolutionary biologists and as raw material for the design of custom-made proteins.
Affiliation(s)
- Noelia Ferruz
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
| | - Francisco Lobos
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Dominik Lemm
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
| | - Saacnicteh Toledo-Patino
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
| | | | - Steffen Schmidt
- Max Planck Institute for Developmental Biology, Tübingen, Germany; Computational Biochemistry, University of Bayreuth, Bayreuth, Germany.
| | - Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany.
|
23
|
Mura C, Veretnik S, Bourne PE. The Urfold: Structural similarity just above the superfold level? Protein Sci 2019; 28:2119-2126. [PMID: 31599042 PMCID: PMC6863707 DOI: 10.1002/pro.3742] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/06/2019] [Revised: 09/30/2019] [Accepted: 10/01/2019] [Indexed: 01/16/2023]
Abstract
We suspect that there is a level of granularity of protein structure intermediate between the classical levels of "architecture" and "topology," as reflected in such phenomena as extensive three-dimensional structural similarity above the level of (super)folds. Here, we examine this notion of architectural identity despite topological variability, starting with a concept that we call the "Urfold." We believe that this model could offer a new conceptual approach for protein structural analysis and classification: indeed, the Urfold concept may help reconcile various phenomena that have been frequently recognized or debated for years, such as the precise meaning of "significant" structural overlap and the degree of continuity of fold space. More broadly, the role of structural similarity in sequence↔structure↔function evolution has been studied via many models over the years; by addressing a conceptual gap that we believe exists between the architecture and topology levels of structural classification schemes, the Urfold eventually may help synthesize these models into a generalized, consistent framework. Here, we begin by qualitatively introducing the concept.
Affiliation(s)
- Cameron Mura
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia
| | - Stella Veretnik
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia
| | - Philip E Bourne
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia; School of Data Science, University of Virginia, Charlottesville, Virginia
|
24
|
Navigating Among Known Structures in Protein Space. Methods Mol Biol 2018. [PMID: 30298400 DOI: 10.1007/978-1-4939-8736-8_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Present-day protein space is the result of 3.7 billion years of evolution, constrained by the underlying physicochemical qualities of the proteins. It is difficult to differentiate between evolutionary traces and effects of physicochemical constraints. Nonetheless, as a rule of thumb, instances of structural reuse, or focusing on structural similarity, are likely attributable to physicochemical constraints, whereas sequence reuse, or focusing on sequence similarity, may be more indicative of evolutionary relationships. Both types of relationships have been studied and can provide meaningful insights to protein biophysics and evolution, which in turn can lead to better algorithms for protein search, annotation, and maybe even design. In broad strokes, studies of protein space vary in the entities they represent, the similarity measure comparing these entities, and the representation used. The entities can be, for example, protein chains, domains, supra-domains, or smaller protein sub-parts denoted themes. The measures of similarity between the entities can be based on sequence, structure, function, or any combination of these. The representation can be global, encompassing the whole space, or local, focusing on a particular region surrounding protein(s) of interest. Global representations include lists of grouped proteins, protein networks, and maps. Networks are the abstraction that is derived most directly from the similarity data: each node is the protein entity (e.g., a domain), and edges connect similar domains. Selecting the entities, the similarity measure, and the abstraction are three intertwined decisions: the similarity measures allow us to identify the entities, and the selection of entities influences what is a meaningful similarity measure. Similarly, we seek entities that are related to each other in a way for which a simple representation describes their relationships succinctly and accurately.
This chapter will cover studies that rely on different entities, similarity measures, and a range of representations to better understand protein structure space. Scholars may use publicly available navigators offering a global representation, and in particular the hierarchical classifications SCOP, CATH, and ECOD, or a local representation, which encompass structural alignment algorithms. Alternatively, scholars can configure their own navigator using existing tools. To demonstrate this DIY (do it yourself) approach for navigating in protein space, we investigate substrate-binding proteins. By presenting sequence similarities among this large and diverse protein family as a network, we can infer that one member (pdb ID 4ntl; of yet unknown function) may bind methionine and suggest a putative binding mechanism.
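The network abstraction described in this abstract, in which each node is a protein entity and edges connect similar entities, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the chapter's own navigator: the entity names, similarity scores, and threshold below are hypothetical placeholders, not data from the study.

```python
def build_network(similarities, threshold):
    """Adjacency sets: an edge joins two entities scoring at or above threshold."""
    graph = {}
    for (a, b), score in similarities.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group entities that are linked directly or through intermediates."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical pairwise similarity scores between five domains.
scores = {("d1", "d2"): 0.9, ("d2", "d3"): 0.7, ("d3", "d4"): 0.2, ("d4", "d5"): 0.8}
network = build_network(scores, threshold=0.5)
groups = connected_components(network)
```

With these toy scores, d1 and d3 fall into the same group through d2 even though they were never compared directly, which is the kind of indirect link a network view makes visible.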
|
25
|
Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths. Proc Natl Acad Sci U S A 2017; 114:11703-11708. [PMID: 29078314 PMCID: PMC5676897 DOI: 10.1073/pnas.1707642114] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Indexed: 01/13/2023] Open
Abstract
We question a central paradigm: namely, that the protein domain is the “atomic unit” of evolution. In conflict with the current textbook view, our results unequivocally show that duplication of protein segments happens both above and below the domain level among amino acid segments of diverse lengths. Indeed, we show that significant evolutionary information is lost when the protein is approached as a string of domains. Our finer-grained approach reveals a far more complicated picture, where reused segments often intertwine and overlap with each other. Our results are consistent with a recursive model of evolution, in which segments of various lengths, typically smaller than domains, “hop” between environments. The fit segments remain, leaving traces that can still be detected. Proteins share similar segments with one another. Such “reused parts”—which have been successfully incorporated into other proteins—are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment “reuse” across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for “themes”—segments of at least 35 residues of similar sequence and structure—reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. 
The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.
|
26
|
Puustusmaa M, Kirsip H, Gaston K, Abroi A. The Enigmatic Origin of Papillomavirus Protein Domains. Viruses 2017; 9:v9090240. [PMID: 28832519 PMCID: PMC5618006 DOI: 10.3390/v9090240] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/18/2017] [Revised: 08/17/2017] [Accepted: 08/19/2017] [Indexed: 12/17/2022] Open
Abstract
Almost a century has passed since the discovery of papillomaviruses. A few decades of research have given a wealth of information on the molecular biology of papillomaviruses. Several excellent studies have been performed looking at the long- and short-term evolution of these viruses. However, when and how papillomaviruses originate is still a mystery. In this study, we systematically searched the (sequenced) biosphere to find distant homologs of papillomaviral protein domains. Our data show that, even including structural information, which allows us to find deeper evolutionary relationships compared to sequence-only based methods, only half of the protein domains in papillomaviruses have relatives in the rest of the biosphere. We show that the major capsid protein L1 and the replication protein E1 have relatives in several viral families, sharing three protein domains with Polyomaviridae and Parvoviridae. However, only the E1 replication protein has connections with cellular organisms. Most likely, the papillomavirus ancestor is of marine origin, a biotope that is not very well sequenced at the present time. Nevertheless, there is no evidence as to how papillomaviruses originated and how they became vertebrate and epithelium specific.
Affiliation(s)
- Mikk Puustusmaa
- Department of Bioinformatics, University of Tartu, Riia 23a, Tartu 51010, Estonia.
| | - Heleri Kirsip
- Department of Bioinformatics, University of Tartu, Riia 23a, Tartu 51010, Estonia.
| | - Kevin Gaston
- School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK.
| | - Aare Abroi
- Estonian Biocentre, Riia 23b, Tartu 51010, Estonia.
- Institute of Technology, University of Tartu, Nooruse 1, Tartu 50411, Estonia.
|
27
|
What Froze the Genetic Code? Life (Basel) 2017; 7:life7020014. [PMID: 28379164 PMCID: PMC5492136 DOI: 10.3390/life7020014] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/02/2017] [Revised: 03/27/2017] [Accepted: 04/03/2017] [Indexed: 11/16/2022] Open
Abstract
The frozen accident theory of the Genetic Code was a proposal by Francis Crick that attempted to explain the universal nature of the Genetic Code and the fact that it only contains information for twenty amino acids. Fifty years later, it is clear that variations to the universal Genetic Code exist in nature and that translation is not limited to twenty amino acids. However, given the astonishing diversity of life on earth, and the extended evolutionary time that has taken place since the emergence of the extant Genetic Code, the idea that the translation apparatus is for the most part immobile remains true. Here, we will offer a potential explanation to the reason why the code has remained mostly stable for over three billion years, and discuss some of the mechanisms that allow species to overcome the intrinsic functional limitations of the protein synthesis machinery.
|
28
|
Dybas JM, Fiser A. Development of a motif-based topology-independent structure comparison method to identify evolutionarily related folds. Proteins 2016; 84:1859-1874. [PMID: 27671894 PMCID: PMC5118133 DOI: 10.1002/prot.25169] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/31/2016] [Revised: 08/17/2016] [Accepted: 08/25/2016] [Indexed: 11/09/2022]
Abstract
Structure conservation, functional similarities, and homologous relationships that exist across diverse protein topologies suggest that some regions of the protein fold universe are continuous. However, the current structure classification systems are based on hierarchical organizations, which cannot accommodate structural relationships that span fold definitions. Here, we describe a novel, super-secondary-structure motif-based, topology-independent structure comparison method (SmotifCOMP) that is able to quantitatively identify structural relationships between disparate topologies. The basis of SmotifCOMP is a systematically defined super-secondary-structure motif library whose representative geometries are shown to be saturated in the Protein Data Bank and exhibit a unique distribution within the known folds. SmotifCOMP offers a robust and quantitative technique to compare domains that adopt different topologies since the method does not rely on a global superposition. SmotifCOMP is used to perform an exhaustive comparison of the known folds and the identified relationships are used to produce a nonhierarchical representation of the fold space that reflects the notion of a continuous and connected fold universe. The current work offers insight into previously hypothesized evolutionary relationships between disparate folds and provides a resource for exploring novel ones.
Affiliation(s)
- Joseph M. Dybas
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
| | - Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
|
29
|
Barkan DT, Cheng XL, Celino H, Tran TT, Bhandari A, Craik CS, Sali A, Smythe ML. Clustering of disulfide-rich peptides provides scaffolds for hit discovery by phage display: application to interleukin-23. BMC Bioinformatics 2016; 17:481. [PMID: 27881076 PMCID: PMC5120537 DOI: 10.1186/s12859-016-1350-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/16/2016] [Accepted: 11/10/2016] [Indexed: 03/01/2023] Open
Abstract
BACKGROUND Disulfide-rich peptides (DRPs) are found throughout nature. They are suitable scaffolds for drug development due to their small cores, whose disulfide bonds impart extraordinary chemical and biological stability. A challenge in developing a DRP therapeutic is to engineer binding to a specific target. This challenge can be overcome by (i) sampling the large sequence space of a given scaffold through a phage display library and by (ii) panning multiple libraries encoding structurally distinct scaffolds. Here, we implement a protocol for defining these diverse scaffolds, based on clustering structurally defined DRPs according to their conformational similarity. RESULTS We developed and applied a hierarchical clustering protocol based on DRP structural similarity, followed by two post-processing steps, to classify 806 unique DRP structures into 81 clusters. The 20 most populated clusters comprised 85% of all DRPs. Representative scaffolds were selected from each of these clusters; the representatives were structurally distinct from one another, but similar to other DRPs in their respective clusters. To demonstrate the utility of the clusters, phage libraries were constructed for three of the representative scaffolds and panned against interleukin-23. One library produced a peptide that bound to this target with an IC50 of 3.3 μM. CONCLUSIONS Most DRP clusters contained members that were diverse in sequence, host organism, and interacting proteins, indicating that cluster members were functionally diverse despite having similar structure. Only 20 peptide scaffolds accounted for most of the natural DRP structural diversity, providing suitable starting points for seeding phage display experiments. Through selection of the scaffold surface to vary in phage display, libraries can be designed that present sequence diversity in architecturally distinct, biologically relevant combinations of secondary structures. 
We supported this hypothesis with a proof-of-concept experiment in which three phage libraries were constructed and panned against the IL-23 target, resulting in a single-digit μM hit and suggesting that a collection of libraries based on the full set of 20 scaffolds increases the potential to identify efficiently peptide binders to a protein target in a drug discovery program.
Affiliation(s)
- David T Barkan
- Protagonist Therapeutics, Inc., 521 Cottonwood Drive, Suite 100, Milpitas, CA, 95035-74521, USA
| | - Xiao-Li Cheng
- Protagonist Therapeutics, Inc., 521 Cottonwood Drive, Suite 100, Milpitas, CA, 95035-74521, USA
| | - Herodion Celino
- Protagonist Therapeutics, Inc., 521 Cottonwood Drive, Suite 100, Milpitas, CA, 95035-74521, USA
| | - Tran T Tran
- Protagonist Therapeutics, Inc., 521 Cottonwood Drive, Suite 100, Milpitas, CA, 95035-74521, USA
| | - Ashok Bhandari
- Protagonist Therapeutics, Inc., 521 Cottonwood Drive, Suite 100, Milpitas, CA, 95035-74521, USA
| | - Charles S Craik
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, 94158, USA; California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, 94158, USA; Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, 94158, USA; California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Mark L Smythe
- Protagonist Therapeutics, Inc., 521 Cottonwood Drive, Suite 100, Milpitas, CA, 95035-74521, USA; Institute for Molecular Bioscience, The University of Queensland, St. Lucia, Qld, 4072, Australia.
|
30
|
Undheim EAB, Mobli M, King GF. Toxin structures as evolutionary tools: Using conserved 3D folds to study the evolution of rapidly evolving peptides. Bioessays 2016; 38:539-48. [DOI: 10.1002/bies.201500165] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Indexed: 12/21/2022]
Affiliation(s)
- Eivind A. B. Undheim
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia
| | - Mehdi Mobli
- Centre for Advanced Imaging, University of Queensland, St Lucia, Queensland, Australia
| | - Glenn F. King
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia
|
31
|
Alva V, Nam SZ, Söding J, Lupas AN. The MPI bioinformatics Toolkit as an integrative platform for advanced protein sequence and structure analysis. Nucleic Acids Res 2016; 44:W410-5. [PMID: 27131380 PMCID: PMC4987908 DOI: 10.1093/nar/gkw348] [Citation(s) in RCA: 304] [Impact Index Per Article: 33.8] [Received: 02/14/2016] [Accepted: 04/19/2016] [Indexed: 12/21/2022] Open
Abstract
The MPI Bioinformatics Toolkit (http://toolkit.tuebingen.mpg.de) is an open, interactive web service for comprehensive and collaborative protein bioinformatic analysis. It offers a wide array of interconnected, state-of-the-art bioinformatics tools to experts and non-experts alike, developed both externally (e.g. BLAST+, HMMER3, MUSCLE) and internally (e.g. HHpred, HHblits, PCOILS). While a beta version of the Toolkit was released 10 years ago, the current production-level release has been available since 2008 and has serviced more than 1.6 million external user queries. The usage of the Toolkit has continued to increase linearly over the years, reaching more than 400 000 queries in 2015. In fact, through the breadth of its tools and their tight interconnection, the Toolkit has become an excellent platform for experimental scientists as well as a useful resource for teaching bioinformatic inquiry to students in the life sciences. In this article, we report on the evolution of the Toolkit over the last ten years, focusing on the expansion of the tool repertoire (e.g. CS-BLAST, HHblits) and on infrastructural work needed to remain operative in a changing web environment.
Affiliation(s)
- Vikram Alva
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
| | - Seung-Zin Nam
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
| | - Johannes Söding
- Group for Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen D-37077, Germany
| | - Andrei N Lupas
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
|
32
|
Schaeffer RD, Kinch LN, Liao Y, Grishin NV. Classification of proteins with shared motifs and internal repeats in the ECOD database. Protein Sci 2016; 25:1188-203. [PMID: 26833690 DOI: 10.1002/pro.2893] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 12/09/2015] [Revised: 01/23/2016] [Accepted: 01/27/2016] [Indexed: 12/19/2022]
Abstract
Proteins and their domains evolve by a set of events commonly including the duplication and divergence of small motifs. The presence of short repetitive regions in domains has generally constituted a difficult case for structural domain classifications and their hierarchies. We developed the Evolutionary Classification Of protein Domains (ECOD) in part to implement a new schema for the classification of these types of proteins. Here we document the ways in which ECOD classifies proteins with small internal repeats, widespread functional motifs, and assemblies of small domain-like fragments in its evolutionary schema. We illustrate the ways in which the structural genomics project impacted the classification and characterization of new structural domains and sequence families over the decade.
Affiliation(s)
- R Dustin Schaeffer
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
| | - Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
| | - Yuxing Liao
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
| | - Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
|
33
|
Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
34
|
Huang PS, Feldmeier K, Parmeggiani F, Velasco DAF, Höcker B, Baker D. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat Chem Biol 2015; 12:29-34. [PMID: 26595462 PMCID: PMC4684731 DOI: 10.1038/nchembio.1966] [Citation(s) in RCA: 179] [Impact Index Per Article: 17.9] [Received: 06/05/2015] [Accepted: 10/07/2015] [Indexed: 12/26/2022]
Abstract
Despite efforts for over 25 years, de novo protein design has not succeeded in achieving the TIM-barrel fold. Here we describe the computational design of four-fold symmetrical (β/α)8 barrels guided by geometrical and chemical principles. Experimental characterization of 33 designs revealed the importance of side chain-backbone hydrogen bonds for defining the strand register between repeat units. The X-ray crystal structure of a designed thermostable 184-residue protein is nearly identical to that of the designed TIM-barrel model. PSI-BLAST searches do not identify sequence similarities to known TIM-barrel proteins, and sensitive profile-profile searches indicate that the design sequence is distant from other naturally occurring TIM-barrel superfamilies, suggesting that Nature has sampled only a subset of the sequence space available to the TIM-barrel fold. The ability to design TIM barrels de novo opens new possibilities for custom-made enzymes.
Affiliation(s)
- Po-Ssu Huang
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA; Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
| | - Kaspar Feldmeier
- Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Fabio Parmeggiani
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA; Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
| | | | - Birte Höcker
- Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA; Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
|
35
|
Edwards H, Deane CM. Structural Bridges through Fold Space. PLoS Comput Biol 2015; 11:e1004466. [PMID: 26372166 PMCID: PMC4570669 DOI: 10.1371/journal.pcbi.1004466] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 02/20/2015] [Accepted: 07/12/2015] [Indexed: 12/05/2022] Open
Abstract
Several protein structure classification schemes exist that partition the protein universe into structural units called folds. Yet these schemes do not discuss how these units sit relative to each other in a global structure space. In this paper we construct networks that describe such global relationships between folds in the form of structural bridges. We generate these networks using four different structural alignment methods across multiple score thresholds. The networks constructed using the different methods remain a similar distance apart regardless of the probability threshold defining a structural bridge. This suggests that at least some structural bridges are method specific and that any attempt to build a picture of structural space should not be reliant on a single structural superposition method. Despite these differences all representations agree on an organisation of fold space into five principal community structures: all-α, all-β sandwiches, all-β barrels, α/β and α + β. We project estimated fold ages onto the networks and find that not only are the pairings of unconnected folds associated with higher age differences than bridged folds, but this difference increases with the number of networks displaying an edge. We also examine different centrality measures for folds within the networks and how these relate to fold age. While these measures interpret the central core of fold space in varied ways they all identify the disposition of ancestral folds to fall within this core and that of the more recently evolved structures to provide the peripheral landscape. These findings suggest that evolutionary information is encoded along these structural bridges. Finally, we identify four highly central pivotal folds representing dominant topological features which act as key attractors within our landscapes.
Folds are considered to be the structural units which make up the protein universe. Structural classification schemes focus on the assignment and organisation of protein domains into folds. However, they do not suggest how different folds might relate to one another in a global way. We introduce the concept of bridges through fold space: significant similarities between these units. We consider four alignment methods and a dynamic approach to placing these bridges. A greater consensus between these methods cannot be achieved by simply increasing the stringency with which edges are assigned. Instead, we emphasise the importance of considering consensus maps and only report results where there is agreement across all networks. It is possible that a study of the bridges may reveal evolutionary relationships. Based on a phylogenetic analysis of structures, we find that bridges consistently fall between folds which evolved at similar times. Moreover, the landscapes all consist of a core of older folds, with younger structures more often seen at the periphery. Finally we identify four pivotal folds in the landscapes. They contain topological motifs which unite disparate regions of fold space.
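The consensus idea in this abstract (reporting only bridges on which all networks agree, and tracking how many networks display a given edge) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fold labels and the four toy edge lists are hypothetical, and each network is assumed to be a plain list of undirected fold pairs.

```python
def edge_support(networks):
    """Count how many networks contain each undirected bridge."""
    counts = {}
    for net in networks:
        # frozenset makes ("A", "B") and ("B", "A") the same bridge
        for edge in {frozenset(pair) for pair in net}:
            counts[edge] = counts.get(edge, 0) + 1
    return counts


def consensus_bridges(networks):
    """Return the bridges that appear in every network."""
    support = edge_support(networks)
    return {edge for edge, n in support.items() if n == len(networks)}


# Hypothetical toy data: one edge list per structural alignment method.
toy_networks = [
    [("A", "B"), ("B", "C")],
    [("B", "A"), ("C", "D")],
    [("A", "B"), ("B", "C")],
    [("A", "B")],
]

support = edge_support(toy_networks)
consensus = consensus_bridges(toy_networks)
```

In this toy case only the A-B bridge is present in all four networks, while B-C is supported by two networks and C-D by one.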
Affiliation(s)
- Hannah Edwards
- Department of Statistics, University of Oxford, Oxford, United Kingdom
| | - Charlotte M. Deane
- Department of Statistics, University of Oxford, Oxford, United Kingdom
|
36
|
Abstract
To explore protein space from a global perspective, we consider 9,710 SCOP (Structural Classification of Proteins) domains with up to 70% sequence identity and present all similarities among them as networks: In the "domain network," nodes represent domains, and edges connect domains that share "motifs," i.e., significantly sized segments of similar sequence and structure. We explore the dependence of the network on the thresholds that define the evolutionary relatedness of the domains. At excessively strict thresholds the network falls apart completely; for very lax thresholds, there are network paths between virtually all domains. Interestingly, at intermediate thresholds the network constitutes two regions that can be described as "continuous" versus "discrete." The continuous region comprises a large connected component, dominated by domains with alternating alpha and beta elements, and the discrete region includes the rest of the domains in isolated islands, each generally corresponding to a fold. We also construct the "motif network," in which nodes represent recurring motifs, and edges connect motifs that appear in the same domain. This network also features a large and highly connected component of motifs that originate from domains with alternating alpha/beta elements (and some all-alpha domains), and smaller isolated islands. Indeed, the motif network suggests that nature reuses such motifs extensively. The networks suggest evolutionary paths between domains and give hints about protein evolution and the underlying biophysics. They provide natural means of organizing protein space, and could be useful for the development of strategies for protein search and design.
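The threshold dependence described in this abstract can be illustrated with a small sketch: an edge is kept when a pairwise score meets a threshold, and the connected components are then counted. This is only an illustration under stated assumptions, not the study's pipeline: the domain names, the scores and the helper functions are hypothetical, and a single numeric score stands in for the paper's sequence and structure criteria.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components using a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Largest component first
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical toy data: five domains and four pairwise similarity scores.
domains = ["d1", "d2", "d3", "d4", "d5"]
toy_scores = {
    ("d1", "d2"): 0.9,
    ("d2", "d3"): 0.6,
    ("d3", "d4"): 0.3,
    ("d4", "d5"): 0.8,
}


def component_sizes(threshold):
    """Sizes of the components when only edges scoring >= threshold are kept."""
    edges = [pair for pair, score in toy_scores.items() if score >= threshold]
    return [len(c) for c in connected_components(domains, edges)]
```

With these toy scores a strict threshold (0.95) leaves five isolated domains, an intermediate one (0.5) gives components of sizes 3 and 2, and a lax one (0.2) connects all five domains.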
|
37
|
Evolutionary relationship of two ancient protein superfolds. Nat Chem Biol 2014; 10:710-5. [PMID: 25038785 DOI: 10.1038/nchembio.1579] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 03/13/2014] [Accepted: 06/02/2014] [Indexed: 01/29/2023]
Abstract
Proteins are the molecular machines of the cell that fold into specific three-dimensional structures to fulfill their functions. To improve our understanding of how the structure and function of proteins arises, it is crucial to understand how evolution has generated the structural diversity we observe today. Classically, proteins that adopt different folds are considered to be nonhomologous. However, using state-of-the-art tools for homology detection, we found evidence of homology between proteins of two ancient and highly populated protein folds, the (βα)8-barrel and the flavodoxin-like fold. We detected a family of sequences that show intermediate features between both folds and determined what is to our knowledge the first representative crystal structure of one of its members, giving new insights into the evolutionary link of two of the earliest folds. Our findings contribute to an emergent vision where protein superfolds share common ancestry and encourage further approaches to complete the mapping of structure space onto sequence space.
|
38
|
Evolutionary history of redox metal-binding domains across the tree of life. Proc Natl Acad Sci U S A 2014; 111:7042-7. [PMID: 24778258 DOI: 10.1073/pnas.1403676111] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 12/26/2022] Open
Abstract
Oxidoreductases mediate electron transfer (i.e., redox) reactions across the tree of life and ultimately facilitate the biologically driven fluxes of hydrogen, carbon, nitrogen, oxygen, and sulfur on Earth. The core enzymes responsible for these reactions are ancient, often small in size, and highly diverse in amino acid sequence, and many require specific transition metals in their active sites. Here we reconstruct the evolution of metal-binding domains in extant oxidoreductases using a flexible network approach and permissive profile alignments based on available microbial genome data. Our results suggest there were at least 10 independent origins of redox domain families. However, we also identified multiple ancient connections between Fe2S2- (adrenodoxin-like) and heme- (cytochrome c) binding domains. Our results suggest that these two iron-containing redox families had a single common ancestor that underwent duplication and divergence. The iron-containing protein family constitutes ∼50% of all metal-containing oxidoreductases and potentially catalyzed redox reactions in the Archean oceans. Heme-binding domains seem to be derived via modular evolutionary processes that ultimately form the backbone of redox reactions in both anaerobic and aerobic respiration and photosynthesis. The empirically discovered network allows us to peer into the ancient history of microbial metabolism on our planet.
|
39
|
Hadzipasic O, Wrabl JO, Hilser VJ. A horizontal alignment tool for numerical trend discovery in sequence data: application to protein hydropathy. PLoS Comput Biol 2013; 9:e1003247. [PMID: 24130469 PMCID: PMC3794901 DOI: 10.1371/journal.pcbi.1003247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2013] [Accepted: 07/10/2013] [Indexed: 11/19/2022] Open
Abstract
An algorithm is presented that returns the optimal pairwise gapped alignment of two sets of signed numerical sequence values. One distinguishing feature of this algorithm is a flexible comparison engine (based on both relative shape and absolute similarity measures) that does not rely on explicit gap penalties. Additionally, an empirical probability model is developed to estimate the significance of the returned alignment with respect to randomized data. The algorithm's utility for biological hypothesis formulation is demonstrated with test cases including database search and pairwise alignment of protein hydropathy. However, the algorithm and probability model could possibly be extended to accommodate other diverse types of protein or nucleic acid data, including positional thermodynamic stability and mRNA translation efficiency. The algorithm requires only numerical values as input and will readily compare data other than protein hydropathy. The tool is therefore expected to complement, rather than replace, existing sequence and structure based tools and may inform medical discovery, as exemplified by proposed similarity between a chlamydial ORFan protein and bacterial colicin pore-forming domain. The source code, documentation, and a basic web-server application are available.
Affiliation(s)
- Omar Hadzipasic
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - James O. Wrabl
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Vincent J. Hilser
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
- T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, United States of America
|
40
|
Kopec KO, Lupas AN. β-Propeller blades as ancestral peptides in protein evolution. PLoS One 2013; 8:e77074. [PMID: 24143202 PMCID: PMC3797127 DOI: 10.1371/journal.pone.0077074] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Received: 06/25/2013] [Accepted: 09/05/2013] [Indexed: 12/04/2022] Open
Abstract
Proteins of the β-propeller fold are ubiquitous in nature and widely used as structural scaffolds for ligand binding and enzymatic activity. This fold comprises between four and twelve four-stranded β-meanders, the so called blades that are arranged circularly around a central funnel-shaped pore. Despite the large size range of β-propellers, their blades frequently show sequence similarity indicative of a common ancestry and it has been proposed that the majority of β-propellers arose divergently by amplification and diversification of an ancestral blade. Given the structural versatility of β-propellers and the hypothesis that the first folded proteins evolved from a simpler set of peptides, we investigated whether this blade may have given rise to other folds as well. Using sequence comparisons, we identified proteins of four other folds as potential homologs of β-propellers: the luminal domain of inositol-requiring enzyme 1 (IRE1-LD), type II β-prisms, β-pinwheels, and WW domains. Because, with increasing evolutionary distance and decreasing sequence length, the statistical significance of sequence comparisons becomes progressively harder to distinguish from the background of convergent similarities, we complemented our analyses with a new method that evaluates possible homology based on the correlation between sequence and structure similarity. Our results indicate a homologous relationship of IRE1-LD and type II β-prisms with β-propellers, and an analogous one for β-pinwheels and WW domains. Whereas IRE1-LD most likely originated by fold-changing mutations from a fully formed PQQ motif β-propeller, type II β-prisms originated by amplification and differentiation of a single blade, possibly also of the PQQ type. We conclude that both β-propellers and type II β-prisms arose by independent amplification of a blade-sized fragment, which represents a remnant of an ancient peptide world.
Affiliation(s)
- Klaus O. Kopec
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen, Baden-Württemberg, Germany
| | - Andrei N. Lupas
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen, Baden-Württemberg, Germany
|
41
|
Lenart A, Dudkiewicz M, Grynberg M, Pawłowski K. CLCAs - a family of metalloproteases of intriguing phylogenetic distribution and with cases of substituted catalytic sites. PLoS One 2013; 8:e62272. [PMID: 23671590 PMCID: PMC3650047 DOI: 10.1371/journal.pone.0062272] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 11/20/2012] [Accepted: 03/19/2013] [Indexed: 01/08/2023] Open
Abstract
The zinc-dependent metalloproteases with His-Glu-x-x-His (HExxH) active site motif, zincins, are a broad group of proteins involved in many metabolic and regulatory functions, and found in all forms of life. Human genome contains more than 100 genes encoding proteins with known zincin-like domains. A survey of all proteins containing the HExxH motif shows that approximately 52% of HExxH occurrences fall within known protein structural domains (as defined in the Pfam database). Domain families with majority of members possessing a conserved HExxH motif include, not surprisingly, many known and putative metalloproteases. Furthermore, several HExxH-containing protein domains thus identified can be confidently predicted to be putative peptidases of zincin fold. Thus, we predict zincin-like fold for eight uncharacterised Pfam families. Besides the domains with the HExxH motif strictly conserved, and those with sporadic occurrences, intermediate families are identified that contain some members with a conserved HExxH motif, but also many homologues with substitutions at the conserved positions. Such substitutions can be evolutionarily conserved and non-random, yet functional roles of these inactive zincins are not known. The CLCAs are a novel zincin-like protease family with many cases of substituted active sites. We show that this allegedly metazoan family has a number of bacterial and archaeal members. An extremely patchy phylogenetic distribution of CLCAs in prokaryotes and their conserved protein domain composition strongly suggests an evolutionary scenario of horizontal gene transfer (HGT) from multicellular eukaryotes to bacteria, providing an example of eukaryote-derived xenologues in bacterial genomes. Additionally, in a protein family identified here as closely homologous to CLCA, the CLCA_X (CLCA-like) family, a number of proteins is found in phages and plasmids, supporting the HGT scenario.
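A survey of HExxH occurrences, as mentioned in this abstract, starts from a simple pattern scan. The sketch below shows one way to locate the motif with a regular expression; it is an illustration rather than the authors' procedure, the function name and the toy sequence are hypothetical, and a sequence hit by itself says nothing about whether the site is a functional zincin active site.

```python
import re

# H, E, any two residues, H. The lookahead lets overlapping occurrences
# be reported as separate hits.
HEXXH = re.compile(r"(?=(HE[A-Z]{2}H))")


def find_hexxh(sequence):
    """Return (0-based start, matched pentapeptide) for every HExxH hit."""
    return [(m.start(), m.group(1)) for m in HEXXH.finditer(sequence.upper())]


# Hypothetical toy sequence, not a real protein.
hits = find_hexxh("MKTHELGHALGLAAHEHEHQ")
```

For the toy sequence the scan reports two hits, `HELGH` at position 3 and `HEHEH` at position 14.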
Affiliation(s)
- Anna Lenart
- Department of Cellular and Molecular Neurobiology, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
| | - Małgorzata Dudkiewicz
- Faculty of Agriculture and Biology, Warsaw University of Life Sciences, Warsaw, Poland
| | - Marcin Grynberg
- Department of Genetics, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
| | - Krzysztof Pawłowski
- Department of Cellular and Molecular Neurobiology, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
- Faculty of Agriculture and Biology, Warsaw University of Life Sciences, Warsaw, Poland
|
42
|
Johansson MU, Zoete V, Guex N. Recurrent structural motifs in non-homologous protein structures. Int J Mol Sci 2013; 14:7795-814. [PMID: 23574940 PMCID: PMC3645717 DOI: 10.3390/ijms14047795] [Received: 03/08/2013] [Revised: 03/27/2013] [Accepted: 04/01/2013] [Indexed: 11/18/2022]
Abstract
We have extracted an extensive collection of recurrent structural motifs (RSMs), which consist of sequentially non-contiguous structural motifs (4–6 residues), each of which appears with very similar conformation in three or more mutually unrelated protein structures. We find that the proteins in our set are covered to a substantial extent by the recurrent non-contiguous structural motifs, especially the helix and strand regions. Computational alanine scanning calculations indicate that the average folding free energy changes upon alanine mutation for most types of non-alanine residues are higher for amino acids that are present in recurrent structural motifs than for amino acids that are not. The non-alanine amino acids that are most common in the recurrent structural motifs, i.e., phenylalanine, isoleucine, leucine, valine and tyrosine and the less abundant methionine and tryptophan, have the largest folding free energy changes. This indicates that the recurrent structural motifs, as we define them, describe recurrent structural patterns that are important for protein stability. In view of their properties, such structural motifs are potentially useful for inter-residue contact prediction and protein structure refinement.
Affiliation(s)
- Maria U. Johansson
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Authors to whom correspondence should be addressed (M.U.J., N.G.); Tel.: +41-21-692-40-86 (M.U.J.); +41-21-692-40-37 (N.G.); Fax: +41-21-692-40-65 (M.U.J. & N.G.)
| | - Vincent Zoete
- Molecular Modelling Group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
| | - Nicolas Guex
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
|
43
|
Fernandez-Fuentes N, Fiser A. A modular perspective of protein structures: application to fragment based loop modeling. Methods Mol Biol 2013; 932:141-58. [PMID: 22987351 PMCID: PMC3635063 DOI: 10.1007/978-1-62703-065-6_9] [Indexed: 01/30/2023]
Abstract
Proteins can be decomposed into supersecondary structure modules. We used a generic definition of supersecondary structure elements, so-called Smotifs, which are composed of two flanking regular secondary structures connected by a loop, to explore the evolution and current variety of structure building blocks. Here, we discuss recent observations about the saturation of Smotif geometries in protein structures and how it opens new avenues in protein structure modeling and design. As a first application of these observations we describe our loop conformation modeling algorithm, ArchPred, which takes advantage of the Smotif classification. In this application, instead of focusing on specific loop properties, the method narrows down possible template conformations in other, often non-homologous structures, by identifying the most likely supersecondary structure environment that cradles the loop. Beyond identifying the correct starting supersecondary structure geometry, it takes into account the fit of anchor residues, steric clashes, the match of predicted and observed dihedral angle preferences, and local sequence signal.
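To make the Smotif definition above concrete, here is a minimal sketch that reads a three-state secondary structure string and lists every pair of regular elements joined by a loop. It is an assumption-laden toy, not ArchPred or the authors' Smotif library: the names `segments` and `smotifs` are hypothetical, the input string is invented, and the sketch ignores the geometric classification that the abstract refers to.

```python
from collections import Counter
from itertools import groupby


def segments(ss: str) -> list[tuple[str, int, int]]:
    """Split a secondary structure string into runs.

    'H' = helix, 'E' = strand, any other character is treated as loop ('C').
    Returns (state, start, end) with 0-based start and exclusive end.
    """
    runs, pos = [], 0
    for state, run in groupby(ss, key=lambda c: c if c in ("H", "E") else "C"):
        length = len(list(run))
        runs.append((state, pos, pos + length))
        pos += length
    return runs


def smotifs(ss: str) -> list[tuple[str, int]]:
    """List (type, loop length) for each helix/strand - loop - helix/strand triple.

    The type is the pair of flanking states, e.g. 'HE' for helix-loop-strand.
    """
    segs = segments(ss)
    found = []
    for first, loop, second in zip(segs, segs[1:], segs[2:]):
        if first[0] != "C" and loop[0] == "C" and second[0] != "C":
            found.append((first[0] + second[0], loop[2] - loop[1]))
    return found


if __name__ == "__main__":
    toy = "CCHHHHHCCCEEEEECCHHHHCC"   # made-up assignment string
    hits = smotifs(toy)
    print(hits)
    print(Counter(kind for kind, _ in hits))  # tally by flanking-element type
```

Tallying the types over many structures is the kind of bookkeeping that a frequency analysis of Smotifs would start from, although a real classification would also need the relative geometry of the two flanking elements.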
Affiliation(s)
- Narcis Fernandez-Fuentes
- Leeds Institute of Molecular Medicine, Section of Experimental Therapeutics, University of Leeds, St. James's University Hospital, Leeds LS9 7TF, UK
| | - Andras Fiser
- Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, 1301 Morris Park Ave, Bronx, NY 10461, USA
|
44
|
Bonitz T, Alva V, Saleh O, Lupas AN, Heide L. Evolutionary relationships of microbial aromatic prenyltransferases. PLoS One 2011; 6:e27336. [PMID: 22140437 PMCID: PMC3227686 DOI: 10.1371/journal.pone.0027336] [Received: 09/09/2011] [Accepted: 10/14/2011] [Indexed: 11/19/2022]
Abstract
The linkage of isoprenoid and aromatic moieties, catalyzed by aromatic prenyltransferases (PTases), leads to an impressive diversity of primary and secondary metabolites, including important pharmaceuticals and toxins. A few years ago, a hydroxynaphthalene PTase, NphB, featuring a novel ten-stranded β-barrel fold was identified in Streptomyces sp. strain CL190. This fold, termed the PT-barrel, is formed of five tandem ααββ structural repeats and remained exclusive to the NphB family until its recent discovery in the DMATS family of indole PTases. Members of these two families exist only in fungi and bacteria, and all of them appear to catalyze the prenylation of aromatic substrates involved in secondary metabolism. Sequence comparisons using PSI-BLAST do not yield matches between these two families, suggesting that they may have converged upon the same fold independently. However, we now provide evidence for a common ancestry for the NphB and DMATS families of PTases. We also identify sequence repeats that coincide with the structural repeats in proteins belonging to these two families. Therefore we propose that the PT-barrel arose by amplification of an ancestral ααββ module. In view of their homology and their similarities in structure and function, we propose to group the NphB and DMATS families together into a single superfamily, the PT-barrel superfamily.
Affiliation(s)
- Tobias Bonitz
- Pharmaceutical Institute, Eberhard Karls-Universität Tübingen, Tübingen, Germany
| | - Vikram Alva
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen, Germany
| | - Orwah Saleh
- Pharmaceutical Institute, Eberhard Karls-Universität Tübingen, Tübingen, Germany
| | - Andrei N. Lupas
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen, Germany
| | - Lutz Heide
- Pharmaceutical Institute, Eberhard Karls-Universität Tübingen, Tübingen, Germany
|
45
|
Abstract
Gene evolution has long been thought to be primarily driven by duplication and rearrangement mechanisms. However, every evolutionary lineage harbours orphan genes that lack homologues in other lineages and whose evolutionary origin is only poorly understood. Orphan genes might arise from duplication and rearrangement processes followed by fast divergence; however, de novo evolution out of non-coding genomic regions is emerging as an important additional mechanism. This process appears to provide raw material continuously for the evolution of new gene functions, which can become relevant for lineage-specific adaptations.
|
46
|
Omelchenko MV, Galperin MY, Wolf YI, Koonin EV. Non-homologous isofunctional enzymes: a systematic analysis of alternative solutions in enzyme evolution. Biol Direct 2010; 5:31. [PMID: 20433725 PMCID: PMC2876114 DOI: 10.1186/1745-6150-5-31] [Received: 04/13/2010] [Accepted: 04/30/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Evolutionarily unrelated proteins that catalyze the same biochemical reactions are often referred to as analogous - as opposed to homologous - enzymes. The existence of numerous alternative, non-homologous enzyme isoforms presents an interesting evolutionary problem; it also complicates genome-based reconstruction of the metabolic pathways in a variety of organisms. In 1998, a systematic search for analogous enzymes resulted in the identification of 105 Enzyme Commission (EC) numbers that included two or more proteins without detectable sequence similarity to each other, including 34 EC nodes where proteins were known (or predicted) to have distinct structural folds, indicating independent evolutionary origins. In the past 12 years, many putative non-homologous isofunctional enzymes were identified in newly sequenced genomes. In addition, efforts in structural genomics resulted in a vastly improved structural coverage of proteomes, providing for definitive assessment of (non)homologous relationships between proteins.
RESULTS: We report the results of a comprehensive search for non-homologous isofunctional enzymes (NISE) that yielded 185 EC nodes with two or more experimentally characterized - or predicted - structurally unrelated proteins. Of these NISE sets, only 74 were from the original 1998 list. Structural assignments of the NISE show over-representation of proteins with the TIM barrel fold and the nucleotide-binding Rossmann fold. From the functional perspective, the set of NISE is enriched in hydrolases, particularly carbohydrate hydrolases, and in enzymes involved in defense against oxidative stress.
CONCLUSIONS: These results indicate that at least some of the non-homologous isofunctional enzymes were recruited relatively recently from enzyme families that are active against related substrates and are sufficiently flexible to accommodate changes in substrate specificity.
Affiliation(s)
- Marina V Omelchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Michael Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
47
|
Fernandez-Fuentes N, Dybas JM, Fiser A. Structural characteristics of novel protein folds. PLoS Comput Biol 2010; 6:e1000750. [PMID: 20421995 PMCID: PMC2858679 DOI: 10.1371/journal.pcbi.1000750] [Received: 06/16/2009] [Accepted: 03/19/2010] [Indexed: 11/29/2022]
Abstract
Folds are the basic building blocks of protein structures. Understanding the emergence of novel protein folds is an important step towards understanding the rules governing the evolution of protein structure and function and for developing tools for protein structure modeling and design. We explored the frequency of occurrences of an exhaustively classified library of supersecondary structural elements (Smotifs), in protein structures, in order to identify features that would define a fold as novel compared to previously known structures. We found that a surprisingly small set of Smotifs is sufficient to describe all known folds. Furthermore, novel folds do not require novel Smotifs, but rather are a new combination of existing ones. Novel folds can be typified by the inclusion of a relatively higher number of rarely occurring Smotifs in their structures and, to a lesser extent, by a novel topological combination of commonly occurring Smotifs. When investigating the structural features of Smotifs, we found that the top 10% of most frequent ones have a higher fraction of internal contacts, while some of the most rare motifs are larger, and contain a longer loop region.
Structural genomics efforts aim at exploring the repertoire of three-dimensional structures of protein molecules. While genome scale sequencing projects have already provided us with all the genes of many organisms, it is the three dimensional shape of gene encoded proteins that defines all the interactions among these components. Understanding the versatility and, ultimately, the role of all possible molecular shapes in the cell is a necessary step toward understanding how organisms function. In this work we explored the rules that identify certain shapes as novel compared to all already known structures. The findings of this work provide possible insights into the rules that can be used in future works to identify or design new molecular shapes or to relate folds with each other in a quantitative manner.
Affiliation(s)
- Narcis Fernandez-Fuentes
- University of Leeds, Leeds Institute of Molecular Medicine Section of Experimental Therapeutics, St. James's University Hospital, Leeds, United Kingdom
| | - Joseph M. Dybas
- Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Andras Fiser
- Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, United States of America
|