1
Flamholz AI, Goyal A, Fischer WW, Newman DK, Phillips R. The proteome is a terminal electron acceptor. Proc Natl Acad Sci U S A 2025; 122:e2404048121. [PMID: 39752522 PMCID: PMC11725909 DOI: 10.1073/pnas.2404048121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 10/30/2024] [Indexed: 01/15/2025] Open
Abstract
Microbial metabolism is impressively flexible, enabling growth even when available nutrients differ greatly from biomass in redox state. Escherichia coli, for example, rearranges its physiology to grow on reduced and oxidized carbon sources through several forms of fermentation and respiration. To understand the limits on and evolutionary consequences of this metabolic flexibility, we developed a coarse-grained mathematical framework coupling redox chemistry with principles of cellular resource allocation. Our models inherit key qualities from both of their antecedents: i) describing diverse metabolic chemistries and ii) enforcing the simultaneous balancing of atom (e.g., carbon), electron, and energy (adenosine triphosphate) flows, as in redox models, while iii) treating biomass as both the product and catalyst of the growth process, as in resource allocation models. Assembling integrated models of respiration, fermentation, and photosynthesis clarified key microbiological phenomena, including demonstrating that autotrophs grow more slowly than heterotrophs because of constraints imposed by the intracellular production of reduced carbon. Our model further predicted that heterotrophic growth is improved by matching the redox state of biomass to the nutrient environment. Through analysis of ~60,000 genomes and diverse proteomic datasets, we found evidence that proteins indeed accumulate amino acid substitutions promoting redox matching. We therefore propose an unexpected mode of genome evolution where substitutions neutral or even deleterious to the individual biochemical or structural functions of proteins can nonetheless be selected due to a redox-chemical benefit to the population.
Affiliation(s)
- Avi I. Flamholz
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125
- Akshit Goyal
- Physics of Living Systems, Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
- International Centre for Theoretical Sciences, Tata Institute of Fundamental Research, Bengaluru 560089, India
- Woodward W. Fischer
- Division of Geological and Planetary Sciences, California Institute of Technology, Pasadena, CA 91125
- Dianne K. Newman
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125
- Division of Geological and Planetary Sciences, California Institute of Technology, Pasadena, CA 91125
- Rob Phillips
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125
- Division of Physics, Mathematics and Astronomy, California Institute of Technology, Pasadena, CA 91125
2
O'Connor PBF. The Evolutionary Transition of the RNA World to Obcells to Cellular-Based Life. J Mol Evol 2024; 92:278-285. [PMID: 38683368 DOI: 10.1007/s00239-024-10171-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 04/08/2024] [Indexed: 05/01/2024]
Abstract
The obcell hypothesis is a proposed route for the RNA world to develop into a primitive cellular one. It posits that this transition began with the emergence of the proto-ribosome which enabled RNA to colonise the external surface of lipids by the synthesis of amphipathic peptidyl-RNAs. The obcell hypothesis also posits that the emergence of a predation-based ecosystem provided a selection mechanism for continued sophistication amongst early life forms. Here, I argue for this hypothesis owing to its significant explanatory power; it offers a rationale why a ribosome which initially was capable only of producing short non-coded peptides was advantageous and it forgoes issues related to maintaining a replicating RNA inside a lipid enclosure. I develop this model by proposing that the evolutionary selection for improved membrane anchors resulted in the emergence of primitive membrane pores which enabled obcells to gradually evolve into a cellular morphology. Moreover, I introduce a model of obcell production which advances that tRNAs developed from primers of the RNA world.
3
Rozhoňová H, Martí-Gómez C, McCandlish DM, Payne JL. Robust genetic codes enhance protein evolvability. PLoS Biol 2024; 22:e3002594. [PMID: 38754362 PMCID: PMC11098591 DOI: 10.1371/journal.pbio.3002594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 03/19/2024] [Indexed: 05/18/2024] Open
Abstract
The standard genetic code defines the rules of translation for nearly every life form on Earth. It also determines the amino acid changes accessible via single-nucleotide mutations, thus influencing protein evolvability: the ability of mutation to bring forth adaptive variation in protein function. One of the most striking features of the standard genetic code is its robustness to mutation, yet it remains an open question whether such robustness facilitates or frustrates protein evolvability. To answer this question, we use data from massively parallel sequence-to-function assays to construct and analyze 6 empirical adaptive landscapes under hundreds of thousands of rewired genetic codes, including those of codon compression schemes relevant to protein engineering and synthetic biology. We find that robust genetic codes tend to enhance protein evolvability by rendering smooth adaptive landscapes with few peaks, which are readily accessible from throughout sequence space. However, the standard genetic code is rarely exceptional in this regard, because many alternative codes render smoother landscapes than the standard code. By constructing low-dimensional visualizations of these landscapes, which each comprise more than 16 million mRNA sequences, we show that such alternative codes radically alter the topological features of the network of high-fitness genotypes. Whereas the genetic codes that optimize evolvability depend to some extent on the detailed relationship between amino acid sequence and protein function, we also uncover general design principles for engineering nonstandard genetic codes for enhanced and diminished evolvability, which may facilitate directed protein evolution experiments and the bio-containment of synthetic organisms, respectively.
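The abstract's point that the code determines which amino acid changes are accessible via single-nucleotide mutations can be illustrated with a minimal Python sketch. It is not the authors' analysis: the function names `single_nucleotide_neighbors` and `accessible_amino_acids` are hypothetical, and the table is simply the standard genetic code written in the conventional TCAG ordering (DNA alphabet, `*` for stop).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_nucleotide_neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def accessible_amino_acids(codon):
    """Amino acids reachable by one point mutation, excluding the codon's own
    amino acid and stop codons."""
    own = CODON_TABLE[codon]
    reachable = {CODON_TABLE[n] for n in single_nucleotide_neighbors(codon)}
    return reachable - {own, "*"}


print(sorted(accessible_amino_acids("ATG")))  # neighbors of the Met codon
print(sorted(accessible_amino_acids("TGG")))  # neighbors of the Trp codon
```

Swapping in a different 64-character string for `AMINO_ACIDS` is one simple way to see how a rewired code changes the set of accessible substitutions.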
Affiliation(s)
- Hana Rozhoňová
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Carlos Martí-Gómez
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Joshua L. Payne
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
4
Flamholz AI, Goyal A, Fischer WW, Newman DK, Phillips R. The proteome is a terminal electron acceptor. bioRxiv: The Preprint Server for Biology 2024:2024.01.31.578293. [PMID: 38352589 PMCID: PMC10862836 DOI: 10.1101/2024.01.31.578293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2024]
Abstract
Microbial metabolism is impressively flexible, enabling growth even when available nutrients differ greatly from biomass in redox state. E. coli, for example, rearranges its physiology to grow on reduced and oxidized carbon sources through several forms of fermentation and respiration. To understand the limits on and evolutionary consequences of metabolic flexibility, we developed a mathematical model coupling redox chemistry with principles of cellular resource allocation. Our integrated model clarifies key phenomena, including demonstrating that autotrophs grow slower than heterotrophs because of constraints imposed by intracellular production of reduced carbon. Our model further indicates that growth is improved by adapting the redox state of biomass to nutrients, revealing an unexpected mode of evolution where proteins accumulate mutations benefiting organismal redox balance.
Affiliation(s)
- Avi I. Flamholz
- Division of Biology and Biological Engineering, California Institute of Technology; Pasadena, CA 91125
- Akshit Goyal
- Physics of Living Systems, Department of Physics, Massachusetts Institute of Technology; Cambridge, MA 02139
- International Centre for Theoretical Sciences, Tata Institute of Fundamental Research; Bengaluru 560089
- Woodward W. Fischer
- Division of Geological & Planetary Sciences, California Institute of Technology; Pasadena, CA 91125
- Dianne K. Newman
- Division of Biology and Biological Engineering, California Institute of Technology; Pasadena, CA 91125
- Division of Geological & Planetary Sciences, California Institute of Technology; Pasadena, CA 91125
- Rob Phillips
- Division of Biology and Biological Engineering, California Institute of Technology; Pasadena, CA 91125
- Department of Physics, California Institute of Technology; Pasadena, CA 91125, USA
5
Yarus M. Ordering events in a developing genetic code. RNA Biol 2024; 21:1-8. [PMID: 38169326 PMCID: PMC10766418 DOI: 10.1080/15476286.2023.2299615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 12/18/2023] [Accepted: 12/20/2023] [Indexed: 01/05/2024] Open
Abstract
Preexisting partial genetic codes can fuse to evolve towards the complete Standard Genetic Code (SGC). Such code fusion provides a path of 'least selection', readily generating precursor codes that resemble the SGC. Consequently, such least selections produce the SGC via minimal, thus rapid, change. Optimal code evolution therefore requires delayed wobble. Early wobble encoding slows code evolution, very specifically diminishing the most likely SGC precursors: near-complete, accurate codes which are the products of code fusions. In contrast: given delayed wobble, the SGC can emerge from a truncation selection/evolutionary radiation based on proficient fused coding.
Affiliation(s)
- Michael Yarus
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO, USA
6
Davey-Young J, Hasan F, Tennakoon R, Rozik P, Moore H, Hall P, Cozma E, Genereaux J, Hoffman KS, Chan PP, Lowe TM, Brandl CJ, O’Donoghue P. Mistranslating the genetic code with leucine in yeast and mammalian cells. RNA Biol 2024; 21:1-23. [PMID: 38629491 PMCID: PMC11028032 DOI: 10.1080/15476286.2024.2340297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 02/04/2024] [Accepted: 04/03/2024] [Indexed: 04/19/2024] Open
Abstract
Translation fidelity relies on accurate aminoacylation of transfer RNAs (tRNAs) by aminoacyl-tRNA synthetases (AARSs). AARSs specific for alanine (Ala), leucine (Leu), serine, and pyrrolysine do not recognize the anticodon bases. Single nucleotide anticodon variants in their cognate tRNAs can lead to mistranslation. Human genomes include both rare and more common mistranslating tRNA variants. We investigated three rare human tRNALeu variants that mis-incorporate Leu at phenylalanine or tryptophan codons. Expression of each tRNALeu anticodon variant in neuroblastoma cells caused defects in fluorescent protein production without significantly increased cytotoxicity under normal conditions or in the context of proteasome inhibition. Using tRNA sequencing and mass spectrometry we confirmed that each tRNALeu variant was expressed and generated mistranslation with Leu. To probe the flexibility of the entire genetic code towards Leu mis-incorporation, we created 64 yeast strains to express all possible tRNALeu anticodon variants in a doxycycline-inducible system. While some variants showed mild or no growth defects, many anticodon variants, enriched with G/C at positions 35 and 36, including those replacing Leu for proline, arginine, alanine, or glycine, caused dramatic reductions in growth. Differential phenotypic defects were observed for tRNALeu mutants with synonymous anticodons and for different tRNALeu isoacceptors with the same anticodon. A comparison to tRNAAla anticodon variants demonstrates that Ala mis-incorporation is more tolerable than Leu at nearly every codon. The data show that the nature of the amino acid substitution, the tRNA gene, and the anticodon are each important factors that influence the ability of cells to tolerate mistranslating tRNAs.
Affiliation(s)
- Josephine Davey-Young
- Department of Biochemistry, The University of Western Ontario, London, Ontario, Canada
- Farah Hasan
- Department of Biochemistry, The University of Western Ontario, London, Ontario, Canada
- Rasangi Tennakoon
- Department of Biochemistry, The University of Western Ontario, London, Ontario, Canada
- Peter Rozik
- Department of Biochemistry, The University of Western Ontario, London, Ontario, Canada
- Henry Moore
- Department of Biomolecular Engineering, Baskin School of Engineering & UCSC Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Peter Hall
- Department of Biochemistry, The University of Western Ontario, London, Ontario, Canada
- Ecaterina Cozma
- Department of Biochemistry, The University of Western Ontario, London, Ontario, Canada
- Julie Genereaux
- Department of Biochemistry, The University of Western Ontario, London, Ontario, Canada
- Patricia P. Chan
- Department of Biomolecular Engineering, Baskin School of Engineering & UCSC Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Todd M. Lowe
- Department of Biomolecular Engineering, Baskin School of Engineering & UCSC Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Christopher J. Brandl
- Department of Biochemistry, The University of Western Ontario, London, Ontario, Canada
- Patrick O’Donoghue
- Department of Biochemistry, The University of Western Ontario, London, Ontario, Canada
- Department of Chemistry, The University of Western Ontario, London, Ontario, Canada
7
Katoh T, Suga H. A comprehensive analysis of translational misdecoding pattern and its implication on genetic code evolution. Nucleic Acids Res 2023; 51:10642-10652. [PMID: 37638759 PMCID: PMC10602915 DOI: 10.1093/nar/gkad707] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/25/2023] [Revised: 07/19/2023] [Accepted: 08/19/2023] [Indexed: 08/29/2023] Open
Abstract
The universal genetic code is comprised of 61 sense codons, which are assigned to 20 canonical amino acids. However, the evolutionary basis for the highly conserved mapping between amino acids and their codons remains incompletely understood. A possible selective pressure of evolution would be minimization of deleterious effects caused by misdecoding. Here we comprehensively analyzed the misdecoding pattern of 61 codons against 19 noncognate amino acids where an arbitrary amino acid was omitted, and revealed the following two rules. (i) If the second codon base is U or C, misdecoding is frequently induced by mismatches at the first and/or third base, where any mismatches are widely tolerated; whereas misdecoding with the second-base mismatch is promoted by only U-G or C-A pair formation. (ii) If the second codon base is A or G, misdecoding is promoted by only G-U or U-G pair formation at the first or second position. In addition, evaluation of functional/structural diversities of amino acids revealed that less diverse amino acid sets are assigned at codons that induce more frequent misdecoding, and vice versa, so as to minimize deleterious effects of misdecoding in the modern genetic code.
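The counts quoted in this abstract (61 sense codons, 20 canonical amino acids) can be checked directly from the standard code. The short Python sketch below does only that bookkeeping and is an illustration, not the authors' method: the variable names are hypothetical, and the final figure is just the size of the grid implied by "61 codons against 19 noncognate amino acids".

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks the three stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = set(CODON_TABLE.values()) - {"*"}

# Number of codons assigned to each amino acid (codon degeneracy).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Each sense codon paired with every amino acid other than its cognate one.
noncognate_pairs = len(sense_codons) * (len(amino_acids) - 1)

print(len(sense_codons), len(amino_acids), stop_codons, noncognate_pairs)
```

The `degeneracy` counter makes the uneven assignment visible: some amino acids have six codons while methionine and tryptophan have one each.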
Affiliation(s)
- Takayuki Katoh
- Department of Chemistry, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
- Hiroaki Suga
- Department of Chemistry, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
8
Yarus M. The Genetic Code Assembles via Division and Fusion, Basic Cellular Events. Life (Basel) 2023; 13:2069. [PMID: 37895450 PMCID: PMC10608286 DOI: 10.3390/life13102069] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/30/2023] [Revised: 10/04/2023] [Accepted: 10/07/2023] [Indexed: 10/29/2023] Open
Abstract
Standard Genetic Code (SGC) evolution is quantitatively modeled in up to 2000 independent coding 'environments'. Environments host multiple codes that may fuse or divide, with division yielding identical descendants. Code division may be selected: sophisticated gene products could be required for an orderly separation that preserves the coding. Several unforeseen results emerge: more rapid evolution requires unselective code division rather than its selective form. Combining selective and unselective code division, with/without code fusion, with/without independent environmental coding tables, and with/without wobble defines 2^5 = 32 possible pathways for SGC evolution. These 32 possible histories are compared, specifically, for evolutionary speed and code accuracy. Pathways differ greatly, for example, by ≈300-fold in time to evolve SGC-like codes. Eight of thirty-two pathways employing code division evolve quickly. Four of these eight that combine fusion and division also unite speed and accuracy. The two most precise, swiftest paths, and thus the most likely routes to the SGC, are similar, differing only in fusion with independent environmental codes. Code division instead of fusion with unrelated codes implies that exterior codes can be dispensable. Instead, a single ancestral code that divides and fuses can initiate fully encoded peptide biosynthesis. Division and fusion create a 'crescendo of competent coding', facilitating the search for the SGC and also assisting the advent of otherwise uniformly disfavored wobble coding. Code fusion can unite multiple codon assignment mechanisms. However, via code division and fusion, an SGC can emerge from a single primary origin via familiar cellular events.
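The pathway count in this abstract is the number of on/off combinations of five options, 2^5 = 32. A minimal Python sketch of that enumeration follows. The factor names are one reading of the abstract's wording and are assumptions, as is the helper `enumerate_pathways`; the sketch only counts combinations and does not model code evolution.

```python
from itertools import product

# Five binary options, named here as an assumed reading of the abstract.
FACTORS = (
    "selective_division",
    "unselective_division",
    "code_fusion",
    "independent_environmental_tables",
    "wobble",
)


def enumerate_pathways(factors=FACTORS):
    """Return every on/off combination of the given factors as a dict."""
    return [
        dict(zip(factors, choice))
        for choice in product((False, True), repeat=len(factors))
    ]


pathways = enumerate_pathways()
print(len(pathways))  # 2 ** 5 combinations
```

Each dictionary in `pathways` can then serve as a configuration record for whatever simulation or comparison a reader wants to run per pathway.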
Affiliation(s)
- Michael Yarus
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO 80309-0347, USA
9
Cuevas-Zuviría B, Adam ZR, Goldman AD, Kaçar B. Informatic Capabilities of Translation and Its Implications for the Origins of Life. J Mol Evol 2023; 91:567-569. [PMID: 37526692 DOI: 10.1007/s00239-023-10125-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/12/2023] [Accepted: 06/22/2023] [Indexed: 08/02/2023]
Abstract
The ability to encode and convert heritable information into molecular function is a defining feature of life as we know it. The conversion of information into molecular function is performed by the translation process, in which triplets of nucleotides in a nucleic acid polymer (mRNA) encode specific amino acids in a protein polymer that folds into a three-dimensional structure. The folded protein then performs one or more molecular activities, often as one part of a complex and coordinated physiological network. Prebiotic systems, lacking the ability to explicitly translate information between genotype and phenotype, would have depended either upon chemosynthetic pathways to generate their components (constraining their complexity and evolvability) or on the ambivalence of RNA as both carrier of information and of catalytic functions, a possibility which is still supported by a very limited set of catalytic RNAs. Thus, the emergence of translation during early evolutionary history may have allowed life to unmoor from the setting of its origin. The origin of translation machinery also represents an entirely novel and distinct threshold of behavior for which there is no abiotic counterpart; it could be the only known example of computing that emerged naturally at the chemical level. Here we describe translation machinery's decoding system as the basis of cellular translation's information-processing capabilities, and the four operation types that find parallels in computer systems engineering that this biological machinery exhibits.
Affiliation(s)
- Bruno Cuevas-Zuviría
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid, Madrid, Spain
- Zachary R Adam
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Department of Geosciences, University of Wisconsin-Madison, Madison, WI, USA
- Betül Kaçar
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
10
Lamolle G, Simón D, Iriarte A, Musto H. Main Factors Shaping Amino Acid Usage Across Evolution. J Mol Evol 2023:10.1007/s00239-023-10120-5. [PMID: 37264211 DOI: 10.1007/s00239-023-10120-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/22/2023] [Accepted: 05/17/2023] [Indexed: 06/03/2023]
Abstract
The standard genetic code determines that in most species, including viruses, there are 20 amino acids that are coded by 61 codons, while the other three codons are stop triplets. Considering the whole proteome, each species features its own amino acid frequencies; given the slow rate of change, closely related species display similar GC content and amino acid usage. In contrast, distantly related species display different amino acid frequencies. Furthermore, within certain multicellular species, such as mammals, intragenomic differences in the usage of amino acids are evident. In this communication, we shall summarize some of the most prominent and well-established factors that determine the differences found in amino acid usage, both across evolution and intragenomically.
Affiliation(s)
- Guillermo Lamolle
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Diego Simón
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Laboratorio de Evolución Experimental de Virus, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Andrés Iriarte
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de La República, Montevideo, Uruguay
- Héctor Musto
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
11
Abstract
The mechanism and the evolution of DNA replication and transcription, the key elements of the central dogma of biology, are fundamentally well explained by the physicochemical complementarity between strands of nucleic acids. However, the determinants that have shaped the third part of the dogma (the process of biological translation and the universal genetic code) remain unclear. We review and seek parallels between different proposals that view the evolution of translation through the prism of weak, noncovalent interactions between biological macromolecules. In particular, we focus on a recent proposal that there exists a hitherto unrecognized complementarity at the heart of biology, that between messenger RNA coding regions and the proteins that they encode, especially if the two are unstructured. Reflecting the idea that the genetic code evolved from intrinsic binding propensities between nucleotides and amino acids, this proposal promises to forge a link between the distant past and the present of biological systems.
Affiliation(s)
- Bojan Zagrovic
- Department of Structural and Computational Biology, Max Perutz Labs & University of Vienna, Vienna, Austria
- Marlene Adlhart
- Department of Structural and Computational Biology, Max Perutz Labs & University of Vienna, Vienna, Austria
- Thomas H Kapral
- Department of Structural and Computational Biology, Max Perutz Labs & University of Vienna, Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
12
Halpern A, Bartsch LR, Ibrahim K, Harrison SA, Ahn M, Christodoulou J, Lane N. Biophysical Interactions Underpin the Emergence of Information in the Genetic Code. Life (Basel) 2023; 13:1129. [PMID: 37240774 PMCID: PMC10221087 DOI: 10.3390/life13051129] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/04/2023] [Revised: 04/25/2023] [Accepted: 04/30/2023] [Indexed: 05/28/2023] Open
Abstract
The genetic code conceals a 'code within the codons', which hints at biophysical interactions between amino acids and their cognate nucleotides. Yet, research over decades has failed to corroborate systematic biophysical interactions across the code. Using molecular dynamics simulations and NMR, we have analysed interactions between the 20 standard proteinogenic amino acids and 4 RNA mononucleotides in 3 charge states. Our simulations show that 50% of amino acids bind best with their anticodonic middle base in the -1 charge state common to the backbone of RNA, while 95% of amino acids interact most strongly with at least 1 of their codonic or anticodonic bases. Preference for the cognate anticodonic middle base was greater than 99% of randomised assignments. We verify a selection of our results using NMR, and highlight challenges with both techniques for interrogating large numbers of weak interactions. Finally, we extend our simulations to a range of amino acids and dinucleotides, and corroborate similar preferences for cognate nucleotides. Despite some discrepancies between the predicted patterns and those observed in biology, the existence of weak stereochemical interactions means that random RNA sequences could template non-random peptides. This offers a compelling explanation for the emergence of genetic information in biology.
Affiliation(s)
- Aaron Halpern
- UCL Centre for Life’s Origins and Evolution (CLOE), Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Lilly R. Bartsch
- UCL Centre for Life’s Origins and Evolution (CLOE), Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Kaan Ibrahim
- UCL Centre for Life’s Origins and Evolution (CLOE), Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Stuart A. Harrison
- UCL Centre for Life’s Origins and Evolution (CLOE), Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Minkoo Ahn
- Department of Structural and Molecular Biology, Institute of Structural and Molecular Biology (ISMB), University College London, London WC1E 6BT, UK
- John Christodoulou
- Department of Structural and Molecular Biology, Institute of Structural and Molecular Biology (ISMB), University College London, London WC1E 6BT, UK
- Nick Lane
- UCL Centre for Life’s Origins and Evolution (CLOE), Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
13
Omachi Y, Saito N, Furusawa C. Rare-event sampling analysis uncovers the fitness landscape of the genetic code. PLoS Comput Biol 2023; 19:e1011034. [PMID: 37068098 PMCID: PMC10138212 DOI: 10.1371/journal.pcbi.1011034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/16/2022] [Revised: 04/27/2023] [Accepted: 03/16/2023] [Indexed: 04/18/2023] Open
Abstract
The genetic code refers to a rule that maps 64 codons to 20 amino acids. Nearly all organisms, with few exceptions, share the same genetic code, the standard genetic code (SGC). While it remains unclear why this universal code has arisen and been maintained during evolution, it may have been preserved under selection pressure. Theoretical studies comparing the SGC and numerically created hypothetical random genetic codes have suggested that the SGC has been subject to strong selection pressure for being robust against translation errors. However, these prior studies have searched for random genetic codes in only a small subspace of the possible code space due to limitations in computation time. Thus, how the genetic code has evolved, and the characteristics of the genetic code fitness landscape, remain unclear. By applying multicanonical Monte Carlo, an efficient rare-event sampling method, we efficiently sampled random codes from a much broader random ensemble of genetic codes than in previous studies, estimating that only one out of every 10^20 random codes is more robust than the SGC. This estimate is significantly smaller than the previous estimate, one in a million. We also characterized the fitness landscape of the genetic code that has four major fitness peaks, one of which includes the SGC. Furthermore, genetic algorithm analysis revealed that evolution under such a multi-peaked fitness landscape could be strongly biased toward a narrow peak, in an evolutionary path-dependent manner.
Affiliation(s)
- Yuji Omachi
- Graduate School of Sciences, The University of Tokyo, Hongo, Tokyo, Japan
- Nen Saito
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima City, Hiroshima, Japan
- Exploratory Research Center on Life and Living Systems, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
- Universal Biology Institute, The University of Tokyo, Hongo, Tokyo, Japan
- Chikara Furusawa
- Graduate School of Sciences, The University of Tokyo, Hongo, Tokyo, Japan
- Universal Biology Institute, The University of Tokyo, Hongo, Tokyo, Japan
- Center for Biosystems Dynamics Research, RIKEN, Suita, Osaka, Japan
14
Chung C, Verheijen BM, Navapanich Z, McGann EG, Shemtov S, Lai GJ, Arora P, Towheed A, Haroon S, Holczbauer A, Chang S, Manojlovic Z, Simpson S, Thomas KW, Kaplan C, van Hasselt P, Timmers M, Erie D, Chen L, Gout JF, Vermulst M. Evolutionary conservation of the fidelity of transcription. Nat Commun 2023; 14:1547. [PMID: 36941254 PMCID: PMC10027832 DOI: 10.1038/s41467-023-36525-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/15/2021] [Accepted: 02/03/2023] [Indexed: 03/23/2023] Open
Abstract
Accurate transcription is required for the faithful expression of genetic information. However, relatively little is known about the molecular mechanisms that control the fidelity of transcription, or the conservation of these mechanisms across the tree of life. To address these issues, we measured the error rate of transcription in five organisms of increasing complexity and found that the error rate of RNA polymerase II ranges from 2.9 × 10⁻⁶ ± 1.9 × 10⁻⁷/bp in yeast to 4.0 × 10⁻⁶ ± 5.2 × 10⁻⁷/bp in worms, 5.69 × 10⁻⁶ ± 8.2 × 10⁻⁷/bp in flies, 4.9 × 10⁻⁶ ± 3.6 × 10⁻⁷/bp in mouse cells and 4.7 × 10⁻⁶ ± 9.9 × 10⁻⁸/bp in human cells. These error rates were modified by various factors including aging, mutagen treatment and gene modifications. For example, the deletion or modification of several related genes increased the error rate substantially in both yeast and human cells. This research highlights the evolutionary conservation of factors that control the fidelity of transcription. Additionally, these experiments provide a reasonable estimate of the error rate of transcription in human cells and identify disease alleles in a subunit of RNA polymerase II that display error-prone transcription. Finally, we provide evidence suggesting that the error rate and spectrum of transcription co-evolved with our genetic code.
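To give the per-base error rates above a more tangible scale, the sketch below converts them into the probability that a single transcript carries at least one error. It assumes errors are independent and uniform along the transcript, and the transcript length of 2,000 nt is a hypothetical value chosen for illustration; neither assumption comes from the abstract.

```python
def p_at_least_one_error(rate_per_bp, length_nt):
    """Probability of one or more errors in a transcript of `length_nt` bases,
    assuming independent errors at a constant per-base rate."""
    return 1.0 - (1.0 - rate_per_bp) ** length_nt


# Point estimates of the RNA polymerase II error rate quoted in the abstract.
RATES = {
    "yeast": 2.9e-6,
    "worms": 4.0e-6,
    "flies": 5.69e-6,
    "mouse cells": 4.9e-6,
    "human cells": 4.7e-6,
}

TRANSCRIPT_LENGTH_NT = 2000  # hypothetical length, for illustration only

if __name__ == "__main__":
    for organism, rate in RATES.items():
        p = p_at_least_one_error(rate, TRANSCRIPT_LENGTH_NT)
        print(f"{organism}: {p:.4%} of transcripts with at least one error")
```

Under these assumptions the fraction of affected transcripts stays on the order of 1% in all five organisms for a transcript of this length.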
Affiliation(s)
- Claire Chung
- School of Gerontology, University of Southern California, Los Angeles, CA, USA
| | - Bert M Verheijen
- School of Gerontology, University of Southern California, Los Angeles, CA, USA
| | - Zoe Navapanich
- School of Gerontology, University of Southern California, Los Angeles, CA, USA
| | - Eric G McGann
- School of Gerontology, University of Southern California, Los Angeles, CA, USA
| | - Sarah Shemtov
- School of Gerontology, University of Southern California, Los Angeles, CA, USA
| | - Guan-Ju Lai
- School of Gerontology, University of Southern California, Los Angeles, CA, USA
| | - Payal Arora
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
| | - Atif Towheed
- Children's Hospital of Philadelphia, Center for Mitochondrial and Epigenomic Medicine, Philadelphia, PA, USA
| | - Suraiya Haroon
- Children's Hospital of Philadelphia, Center for Mitochondrial and Epigenomic Medicine, Philadelphia, PA, USA
| | - Agnes Holczbauer
- Children's Hospital of Philadelphia, Center for Mitochondrial and Epigenomic Medicine, Philadelphia, PA, USA
| | - Sharon Chang
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Zarko Manojlovic
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Stephen Simpson
- College of Life Sciences and Agriculture, University of New Hampshire, Durham, NH, USA
| | - Kelley W Thomas
- College of Life Sciences and Agriculture, University of New Hampshire, Durham, NH, USA
| | - Craig Kaplan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
| | - Peter van Hasselt
- Department of Metabolic Disease, University of Utrecht, Utrecht, the Netherlands
| | - Marc Timmers
- Department of Urology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) Partner Site Freiburg, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Dorothy Erie
- Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA
| | - Lin Chen
- Department of Molecular and Cellular Biology, University of Southern California, Los Angeles, CA, USA
| | - Jean-François Gout
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS, USA
| | - Marc Vermulst
- School of Gerontology, University of Southern California, Los Angeles, CA, USA.
| |
|
15
|
Błażej P, Kowalski DR, Mackiewicz D, Wnetrzak M, Aloqalaa DA, Mackiewicz P. The structure of the genetic code as an optimal graph clustering problem. J Math Biol 2022; 85:9. [PMID: 35838803 DOI: 10.1007/s00285-022-01778-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2017] [Revised: 06/20/2022] [Accepted: 06/24/2022] [Indexed: 11/29/2022]
Abstract
The standard genetic code (SGC) is the set of rules by which genetic information is translated into proteins, from codons, i.e. triplets of nucleotides, to amino acids. The origin of the code and the main factor responsible for its present structure are still hotly debated. Various methodologies have been used to study the features of the code and assess the level of its potential optimality. Here, we introduced a new general approach to evaluate the quality of the genetic code structure. This methodology comes from graph theory and allows us to describe new properties of the genetic code in terms of conductance. This parameter measures the robustness of codon groups against the potential changes in translation of the protein-coding sequences generated by single nucleotide substitutions. We described the genetic code as a partition of an undirected and unweighted graph, which makes the model general and universal. Using this approach, we showed that the structure of the genetic code is a solution to the graph clustering problem. We presented and discussed the structure of the codes that are optimal according to the conductance. Although the standard genetic code is far from optimal according to the conductance, its structure is characterised by many codon groups reaching the minimum conductance for their size. The SGC most likely represents a local minimum in terms of errors occurring in protein-coding sequences and their translation.
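The conductance of a codon group can be computed directly from the codon graph. The sketch below is an illustration under assumptions rather than the authors' implementation: codons are vertices, two codons are joined by an edge when they differ at exactly one position (so every vertex has degree 9), and the conductance of a group S is taken as the number of edges leaving S divided by 9|S|. For groups of at most 32 codons this matches the usual definition that divides by the smaller of the two volumes. Function names are hypothetical.

```python
import itertools
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AA_STRING = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
             "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]
SGC = dict(zip(CODONS, AA_STRING))


def neighbours(codon):
    """Return the 9 codons reachable from `codon` by one nucleotide substitution."""
    return [codon[:pos] + base + codon[pos + 1:]
            for pos in range(3)
            for base in BASES
            if base != codon[pos]]


def conductance(group):
    """Edges leaving the group divided by the group's volume (9 per codon)."""
    group = set(group)
    cut = sum(1 for codon in group for nb in neighbours(codon) if nb not in group)
    return cut / (9 * len(group))


def codon_groups(code):
    """Map each encoded label (amino acid or stop) to its set of codons."""
    groups = defaultdict(set)
    for codon, label in code.items():
        groups[label].add(codon)
    return groups


if __name__ == "__main__":
    for label, group in sorted(codon_groups(SGC).items()):
        print(label, len(group), round(conductance(group), 3))
```

A lower value means that fewer single-nucleotide substitutions change the encoded label, so a four-codon block that differs only at the third position scores better than a single-codon group, for which every substitution leaves the group.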
Affiliation(s)
- Paweł Błażej
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland.
| | - Dariusz R Kowalski
- School of Computer and Cyber Sciences, Augusta University, Augusta, GA, USA
| | - Dorota Mackiewicz
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
| | - Małgorzata Wnetrzak
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
| | | | - Paweł Mackiewicz
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
| |
|
16
|
Janzen E, Shen Y, Vázquez-Salazar A, Liu Z, Blanco C, Kenchel J, Chen IA. Emergent properties as by-products of prebiotic evolution of aminoacylation ribozymes. Nat Commun 2022; 13:3631. [PMID: 35752631 PMCID: PMC9233669 DOI: 10.1038/s41467-022-31387-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/05/2021] [Accepted: 06/16/2022] [Indexed: 11/24/2022] Open
Abstract
Systems of catalytic RNAs presumably gave rise to important evolutionary innovations, such as the genetic code. Such systems may exhibit particular tolerance to errors (error minimization) as well as coding specificity. While often assumed to result from natural selection, error minimization may instead be an emergent by-product. In an RNA world, a system of self-aminoacylating ribozymes could enforce the mapping of amino acids to anticodons. We measured the activity of thousands of ribozyme mutants on alternative substrates (activated analogs for tryptophan, phenylalanine, leucine, isoleucine, valine, and methionine). Related ribozymes exhibited shared preferences for substrates, indicating that adoption of additional amino acids by existing ribozymes would itself lead to error minimization. Furthermore, ribozyme activity was positively correlated with specificity, indicating that selection for increased activity would also lead to increased specificity. These results demonstrate that by-products of ribozyme evolution could lead to adaptive value in specificity and error tolerance.
Affiliation(s)
- Evan Janzen
- Program in Biomolecular Science and Engineering, University of California, Santa Barbara, CA, 93106, USA; Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA, 93106, USA
| | - Yuning Shen
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA, 93106, USA; Department of Chemical and Biomolecular Engineering, Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
| | - Alberto Vázquez-Salazar
- Department of Chemical and Biomolecular Engineering, Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
| | - Ziwei Liu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0QH, UK
| | - Celia Blanco
- Department of Chemical and Biomolecular Engineering, Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
| | - Josh Kenchel
- Program in Biomolecular Science and Engineering, University of California, Santa Barbara, CA, 93106, USA; Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA, 93106, USA; Department of Chemical and Biomolecular Engineering, Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
| | - Irene A Chen
- Program in Biomolecular Science and Engineering, University of California, Santa Barbara, CA, 93106, USA; Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA, 93106, USA; Department of Chemical and Biomolecular Engineering, Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA.
| |
|
17
|
Wang X, Dong Q, Chen G, Zhang J, Liu Y, Cai Y. Frameshift and wild-type proteins are often highly similar because the genetic code and genomes were optimized for frameshift tolerance. BMC Genomics 2022; 23:416. [PMID: 35655139 PMCID: PMC9164415 DOI: 10.1186/s12864-022-08435-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/01/2021] [Accepted: 03/02/2022] [Indexed: 11/10/2022] Open
Abstract
Frameshift mutations have been considered of significant importance for the molecular evolution of proteins and their coding genes, while frameshift protein sequences encoded in the alternative reading frames of coding genes have been considered to be meaningless. However, functional frameshifts have been found to be widespread. It was puzzling how a frameshift protein kept its structure and functionality while substantial changes occurred in its primary amino-acid sequence. This study shows that the similarities among frameshifts and wild types are higher than random similarities and are determined at different levels. Frameshift substitutions are more conservative than random substitutions in the standard genetic code (SGC). The frameshift substitution score of the SGC ranks in the top 2.0-3.5% of alternative genetic codes, showing that the SGC is nearly optimal for frameshift tolerance. In many genes and certain genomes, frameshift-resistant codons and codon pairs appear more frequently than expected, suggesting that frameshift tolerance is achieved not only through the optimality of the genetic code but, more importantly, through the further optimization of a specific gene or genome via the usage of codons and codon pairs, which sheds light on the role of frameshift mutations in molecular and genomic evolution.
Affiliation(s)
- Xiaolong Wang
- Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China.
| | - Quanjiang Dong
- Qingdao Municipal Hospital, Qingdao, Shandong, 266003, P. R. China
| | - Gang Chen
- Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
| | - Jianye Zhang
- Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
| | - Yongqiang Liu
- Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
| | - Yujia Cai
- Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
| |
|
18
|
Kondratyeva LG, Dyachkova MS, Galchenko AV. The Origin of Genetic Code and Translation in the Framework of Current Concepts on the Origin of Life. BIOCHEMISTRY. BIOKHIMIIA 2022; 87:150-169. [PMID: 35508902 DOI: 10.1134/s0006297922020079] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
Abstract
The origin of the genetic code and translation system is probably the central and most difficult problem in investigations of the origin of life and one of the most complex problems in evolutionary biology in general. There are multiple hypotheses on the emergence and development of existing genetic systems that propose mechanisms for the origin and early evolution of the genetic code, as well as for the emergence of replication and translation. Here, we discuss the most well-known of these hypotheses, although none of them provides a description of the early evolution of genetic systems without gaps and assumptions. The RNA world hypothesis is the currently prevailing scientific idea on the early evolution of biological and pre-biological structures, the main advantage of which is the assumption that RNAs as the first living systems were self-sufficient, i.e., capable of functioning as both catalysts and templates. However, this hypothesis also has significant limitations. In particular, no ribozymes with processive polymerase activity have yet been discovered or synthesized. Taking into account the mutual dependence of proteins and nucleic acids in the current world, many authors propose early evolution scenarios based on the co-evolution of these two classes of organic molecules. They postulate that the emergence of translation was necessary for the replication of nucleic acids, in contrast to the RNA world hypothesis, according to which the emergence of translation was preceded by the era of self-replicating RNAs. Although such scenarios are less parsimonious from the evolutionary point of view, since they require simultaneous emergence and evolution of two classes of organic molecules, as well as the emergence of synchronized replication and translation, their major advantage is that they explain the development of processive and much more accurate protein-dependent replication.
Affiliation(s)
- Liya G Kondratyeva
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia
| | | | - Alexey V Galchenko
- Peoples' Friendship University of Russia (RUDN University), Moscow, 117198, Russia.
| |
|
19
|
Model of Genetic Code Structure Evolution under Various Types of Codon Reading. Int J Mol Sci 2022; 23:ijms23031690. [PMID: 35163612 PMCID: PMC8835785 DOI: 10.3390/ijms23031690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2021] [Revised: 01/23/2022] [Accepted: 01/25/2022] [Indexed: 11/28/2022] Open
Abstract
The standard genetic code (SGC) is a set of rules according to which 64 codons are assigned to 20 canonical amino acids and a stop-coding signal. As a consequence, the SGC is redundant because there is a greater number of codons than the number of encoded labels. This redundancy implies the existence of codons that encode the same genetic information. The size and organization of such synonymous codon blocks are important characteristics of the SGC structure whose evolution is still unclear. Therefore, we studied possible evolutionary mechanisms of the codon block structure. We conducted computer simulations assuming that coding systems at early stages of the SGC evolution were sets of ambiguous codon assignments with high entropy. We included three types of reading systems characterized by different inaccuracy and pattern of codon recognition. In contrast to the previous study, we allowed for evolution of the reading systems and their competition. The simulations performed under minimization of translational errors and reduction of coding ambiguity produced a coding system resistant to these errors. The reading system similar to that present in the SGC dominated the others very quickly. The surviving system was also characterized by low entropy and possessed properties similar to those of the SGC. Our simulations show that the unambiguous SGC could have emerged from a code with a lower level of ambiguity and that the number of tRNAs increased during the evolution.
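The notion of an ambiguous codon assignment with high entropy can be made concrete with Shannon entropy. The sketch below is only an illustration under assumptions: an ambiguous code is represented as a mapping from each codon to a probability distribution over encoded labels, and the code-level measure is the mean per-codon entropy in bits. The representation and function names are hypothetical and are not taken from the study.

```python
import math


def assignment_entropy(probs):
    """Shannon entropy (bits) of one codon's distribution over encoded labels."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def code_entropy(code):
    """Mean per-codon entropy of a code given as {codon: {label: probability}}."""
    return sum(assignment_entropy(dist.values()) for dist in code.values()) / len(code)


if __name__ == "__main__":
    labels = list("ACDEFGHIKLMNPQRSTVWY") + ["stop"]  # 21 encoded labels
    # Fully ambiguous codon: every label equally likely.
    ambiguous = {label: 1 / len(labels) for label in labels}
    # Unambiguous codon: a single label with probability 1.
    unambiguous = {"M": 1.0}
    toy_code = {"AAA": ambiguous, "ATG": unambiguous}
    print("ambiguous codon:", assignment_entropy(ambiguous.values()))
    print("unambiguous codon:", assignment_entropy(unambiguous.values()))
    print("toy code mean entropy:", code_entropy(toy_code))
```

In this representation a fully ambiguous codon over 21 labels carries log2(21) bits of uncertainty, while a codon of an unambiguous code such as the SGC carries none.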
|
20
|
Caldararo F, Di Giulio M. The genetic code is very close to a global optimum in a model of its origin taking into account both the partition energy of amino acids and their biosynthetic relationships. Biosystems 2022; 214:104613. [DOI: 10.1016/j.biosystems.2022.104613] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/19/2021] [Revised: 01/16/2022] [Accepted: 01/17/2022] [Indexed: 01/23/2023]
|
21
|
Average and Standard Deviation of the Error Function for Random Genetic Codes with Standard Stop Codons. Acta Biotheor 2021; 70:7. [PMID: 34919168 DOI: 10.1007/s10441-021-09427-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2020] [Accepted: 09/27/2021] [Indexed: 10/19/2022]
Abstract
The origin of the genetic code has been attributed in part to an accidental assignment of codons to amino acids. Although several lines of evidence indicate the subsequent expansion and improvement of the genetic code, the hypothesis of Francis Crick concerning a frozen accident occurring at the early stage of genetic code evolution is still widely accepted. Considering Crick's hypothesis, mathematical descriptions of hypothetical scenarios involving a huge number of possible coexisting random genetic codes could be very important to explain the origin and evolution of a selected genetic code. This work aims to contribute in this regard, that is, it provides a theoretical framework in which statistical parameters of error functions are calculated. Given a genetic code and an amino acid property, the functional code robustness is estimated by means of a known error function. In this work, using analytical calculations, general expressions for the average and standard deviation of the error function distributions of completely random codes with standard stop codons were obtained. As a possible biological application of these results, any set of amino acids and any pure or mixed amino acid properties can be used in the calculations, such that, in case of having to select a set of amino acids to create a genetic code, possible advantages of natural selection of the genetic codes could be discussed.
|
22
|
Abstract
Selection for resource conservation can shape the coding sequences of organisms living in nutrient-limited environments. Recently, it was proposed that selection for resource conservation, specifically for nitrogen and carbon content, has also shaped the structure of the standard genetic code, such that the missense mutations the code allows tend to cause small increases in the number of nitrogen and carbon atoms in amino acids. Moreover, it was proposed that this optimization is not confounded by known optimizations of the standard genetic code, such as for polar requirement or hydropathy. We challenge these claims. We show the proposed optimization for nitrogen conservation is highly sensitive to choice of null model and the proposed optimization for carbon conservation is confounded by the known conservative nature of the standard genetic code with respect to the molecular volume of amino acids. There is therefore little evidence the standard genetic code is optimized for resource conservation. We discuss our findings in the context of null models of the standard genetic code.
Affiliation(s)
- Hana Rozhoňová
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Quartier UNIL-Sorge, Lausanne, Switzerland
| | - Joshua L Payne
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Quartier UNIL-Sorge, Lausanne, Switzerland
| |
|
23
|
Morales AC, Rice AM, Ho AT, Mordstein C, Mühlhausen S, Watson S, Cano L, Young B, Kudla G, Hurst LD. Causes and Consequences of Purifying Selection on SARS-CoV-2. Genome Biol Evol 2021; 13:evab196. [PMID: 34427640 PMCID: PMC8504154 DOI: 10.1093/gbe/evab196] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Accepted: 08/19/2021] [Indexed: 02/06/2023] Open
Abstract
Owing to a lag between a deleterious mutation's appearance and its selective removal, gold-standard methods for mutation rate estimation assume no meaningful loss of mutations between parents and offspring. Indeed, from analysis of closely related lineages, in SARS-CoV-2, the Ka/Ks ratio was previously estimated as 1.008, suggesting no within-host selection. By contrast, we find a higher number of observed SNPs at 4-fold degenerate sites than elsewhere and, allowing for the virus's complex mutational and compositional biases, estimate that the mutation rate is at least 49-67% higher than would be estimated based on the rate of appearance of variants in sampled genomes. Given the high Ka/Ks, one might assume that the majority of such intrahost selection is the purging of nonsense mutations. However, we estimate that selection against nonsense mutations accounts for only ∼10% of all the "missing" mutations. Instead, classical protein-level selective filters (against chemically disparate amino acids and those predicted to disrupt protein functionality) account for many missing mutations. It is less obvious why, for an intracellular parasite, amino acid cost parameters, notably amino acid decay rate, are also significant. Perhaps most surprisingly, we also find evidence for real-time selection against synonymous mutations that move codon usage away from that of humans. We conclude that there is common intrahost selection on SARS-CoV-2 that acts on nonsense, missense, and possibly synonymous mutations. This has implications for methods of mutation rate estimation, for determining times to common ancestry and the potential for intrahost evolution including vaccine escape.
Affiliation(s)
- Atahualpa Castillo Morales
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
| | - Alan M Rice
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
| | - Alexander T Ho
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
| | - Christine Mordstein
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
- Department of Molecular Biology and Genetics, Aarhus University, Denmark
| | - Stefanie Mühlhausen
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
| | - Samir Watson
- Department of Molecular Biology and Genetics, Aarhus University, Denmark
| | - Laura Cano
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
| | - Bethan Young
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
| | - Grzegorz Kudla
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
| | - Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
| |
|
24
|
Abstract
The standard genetic code (SGC) has been extensively analyzed for the biological ramifications of its nonrandom structure. For instance, mismatch errors due to point mutation or mistranslation have an overall smaller effect on the amino acid polar requirement under the SGC than under random genetic codes (RGCs). A similar observation was recently made for frameshift errors, prompting the assertion that the SGC has been shaped by natural selection for frameshift-robustness, that is, conservation of certain amino acid properties upon a frameshift mutation or translational frameshift. However, frameshift-robustness confers no benefit because frameshifts usually create premature stop codons that cause nonsense-mediated mRNA decay or production of nonfunctional truncated proteins. We here propose that the frameshift-robustness of the SGC is a byproduct of its mismatch-robustness. Of 564 amino acid properties considered, the SGC exhibits mismatch-robustness in 93-133 properties and frameshift-robustness in 55 properties, and the latter are largely a subset of the former. For each of the 564 real and 564 randomly constructed fake properties of amino acids, there is a positive correlation between mismatch-robustness and frameshift-robustness across one million RGCs; this correlation arises because most amino acid changes resulting from a frameshift are also achievable by a mismatch error. Importantly, the SGC does not show significantly higher frameshift-robustness in any of the 55 properties than RGCs of comparable mismatch-robustness. These findings support the view that the frameshift-robustness of the SGC need not originate through direct selection and can instead be a side effect of its mismatch-robustness.
Affiliation(s)
- Haiqing Xu
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
| | - Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
| |
|
25
|
The Combinatorial Fusion Cascade to Generate the Standard Genetic Code. Life (Basel) 2021; 11:life11090975. [PMID: 34575125 PMCID: PMC8467831 DOI: 10.3390/life11090975] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/02/2021] [Revised: 09/14/2021] [Accepted: 09/14/2021] [Indexed: 11/17/2022] Open
Abstract
The combinatorial fusion cascade was proposed as a transition stage between prebiotic chemistry and early forms of life. The combinatorial fusion cascade consists of three stages: eight initial complementary pairs of amino acids, four protocodes, and the standard genetic code. The initial complementary pairs and the protocodes are divided into dominant and recessive entities. The transitions between these stages obey the same combinatorial fusion rules for all amino acids. The combinatorial fusion cascade mathematically describes the codon assignments in the standard genetic code. It explains the availability of amino acids with even and odd numbers of codons, the appearance of stop codons, the inclusion of novel canonical amino acids, the exceptionally high numbers of codons for the amino acids arginine, leucine, and serine, and the temporal order of amino acid inclusion into the genetic code. The temporal order of amino acids within the cascade is congruent with the consensus temporal order previously derived from the similarities between the available hypotheses. Control over combinatorial fusion cascades would open the road to a novel technology for developing artificial microorganisms.
|
26
|
Pawlak K, Wnetrzak M, Mackiewicz D, Mackiewicz P, Błażej P. Models of genetic code structure evolution with variable number of coded labels. Biosystems 2021; 210:104528. [PMID: 34492316 DOI: 10.1016/j.biosystems.2021.104528] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/17/2021] [Revised: 08/26/2021] [Accepted: 08/27/2021] [Indexed: 10/20/2022]
Abstract
It is assumed that at the early stage of cell evolution the translation machinery was characterized by high noise, i.e. ambiguous assignment of codons to amino acids in the genetic code, which initially encoded only a few amino acids. Later, during its evolution, new amino acids were added to this code. Taking these facts into account, we investigated theoretical models of the genetic code's structure, which evolved from a set of ambiguous codon assignments into a coding system with a low level of uncertainty. We considered three types of translational inaccuracies assuming a different number of fixed codon positions. We applied a modified version of an evolutionary algorithm to find the genetic codes that most effectively reduced the initial uncertainty in the assignment of codons to encoded labels, i.e. amino acids and a stop translation signal. We examined codes with the number of labels from four to 22. Our results indicated that the quality of the genetic code structure is strongly dependent on the number of encoded labels as well as the type of translational mechanism. Stricter assignments of codons to labels were preferred by the codes encoding a larger number of labels. The results showed that a smaller degeneracy of codes evolved from a more tolerant coding with the stepwise addition of coded amino acids to the genetic code. The distribution of codon groups in the standard genetic code corresponds well to the translation model assuming two fixed codon positions, whereas the six-codon groups can be relics from previous stages of evolution when the code was characterized by a greater uncertainty.
Affiliation(s)
- Konrad Pawlak
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
| | - Małgorzata Wnetrzak
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
| | - Dorota Mackiewicz
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
| | - Paweł Mackiewicz
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
| | - Paweł Błażej
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland.
| |
|
27
|
Abstract
The causes and consequences of the nonrandom structure of the standard genetic code (SGC) have been of long-standing interest. A recent study reported that mutations in present-day protein-coding sequences are less likely to increase proteomic nitrogen and carbon use under the SGC than under random genetic codes, concluding that the SGC has been selectively optimized for resource conservation. If true, this finding might offer important information on the environment in which the SGC and some of the earliest life forms evolved. However, we here show that the hypothesis of optimization of a genetic code for resource conservation is theoretically untenable. We discover that the aforementioned study estimated the expected mutational effect by inappropriately excluding mutations that lower resource consumption and including mutations involving stop codons. After remedying these problems, we find no evidence that the SGC is optimized for nitrogen or carbon conservation.
Affiliation(s)
- Haiqing Xu: Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109, USA
- Jianzhi Zhang: Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109, USA
|
28
|
The Mutational Robustness of the Genetic Code and Codon Usage in Environmental Context: A Non-Extremophilic Preference? Life (Basel) 2021; 11:life11080773. [PMID: 34440517 PMCID: PMC8398314 DOI: 10.3390/life11080773] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/16/2021] [Revised: 07/23/2021] [Accepted: 07/28/2021] [Indexed: 12/12/2022] Open
Abstract
The genetic code evolved, to some extent, to minimize the effects of mutations. The effects of mutations depend on the amino acid repertoire, the structure of the genetic code and the frequencies of amino acids in proteomes. The amino acid compositions of proteins and the corresponding codon usages are still under selection, which allows us to ask what kind of environment the standard genetic code is adapted to. Using simple computational models and comprehensive datasets comprising genomic and environmental data from all three domains of Life, we estimate the expected severity of non-synonymous genomic mutations in proteins, measured by the change in amino acid physicochemical properties. We show that the fidelity in these physicochemical properties is expected to deteriorate with extremophilic codon usages, especially in thermophiles. These findings suggest that the genetic code performs better under non-extremophilic conditions, which explains the low substitution rates encountered in halophiles and thermophiles. The revealed relationship between the genetic code and habitat also allows us to ponder earlier phases in the history of Life.
|
29
|
Phylogenetic analysis of mutational robustness based on codon usage supports that the standard genetic code does not prefer extreme environments. Sci Rep 2021; 11:10963. [PMID: 34040064 PMCID: PMC8154912 DOI: 10.1038/s41598-021-90440-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/04/2021] [Accepted: 05/10/2021] [Indexed: 02/04/2023] Open
Abstract
The mutational robustness of the genetic code is rarely discussed in the context of biological diversity, such as codon usage and related factors, and is often considered independent of the actual organism's proteome. Here we put living beings back into the picture and use distortion as a metric of mutational robustness. Distortion estimates the expected severity of non-synonymous mutations, measured by amino acid physicochemical properties and weighted by codon usage. Using the biological variance of codon frequencies, we interpret the mutational robustness of the standard genetic code with regard to the corresponding environments and genomic compositions (GC-content). Employing phylogenetic analyses, we show that coding fidelity in physicochemical properties can deteriorate with codon usages adapted to extreme environments and that these putative effects are not artefacts of phylogenetic bias. High temperature environments select for codon usages with decreased mutational robustness of hydrophobic, volumetric, and isoelectric properties. Selection at high saline concentrations also leads to reduced fidelity in polar and isoelectric patterns. These results show that the genetic code performs best with mesophilic codon usages, strengthening the view that LUCA or its ancestors preferred lower temperature environments. Taxonomic implications, such as rooting the tree of life, are also discussed.
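A codon-usage-weighted robustness score of this kind can be sketched in a few lines of Python. This is only an illustration of the idea described in the abstract, not the authors' definition of distortion: it assumes equal rates for all single-nucleotide substitutions, ignores mutations to stop codons, and uses the Kyte-Doolittle hydropathy scale as a stand-in for the physicochemical property. The function name `distortion` and the `usage` argument are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy index, used here only as an example property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def missense_neighbors(codon):
    """Yield sense codons one substitution away that encode a different amino acid."""
    for pos, base in product(range(3), BASES):
        if base == codon[pos]:
            continue
        mutant = codon[:pos] + base + codon[pos + 1:]
        if SGC[mutant] != "*" and SGC[mutant] != SGC[codon]:
            yield mutant


def distortion(usage, prop=KD):
    """Usage-weighted mean absolute property change per missense mutation.

    `usage` maps sense codons to (unnormalised) frequencies.
    """
    total = sum(usage.values())
    score = 0.0
    for codon, weight in usage.items():
        changes = [abs(prop[SGC[m]] - prop[SGC[codon]])
                   for m in missense_neighbors(codon)]
        score += (weight / total) * sum(changes) / len(changes)
    return score


uniform = {c: 1.0 for c, aa in SGC.items() if aa != "*"}
print(distortion(uniform))
```

Passing codon frequencies from two genomes as `usage` would give two scores that can be compared directly; a higher score means that, under these assumptions, a random missense mutation shifts hydropathy more on average.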
|
30
|
Argyriadis JA, He YH, Jejjala V, Minic D. Dynamics of genetic code evolution: The emergence of universality. Phys Rev E 2021; 103:052409. [PMID: 34134257 DOI: 10.1103/physreve.103.052409] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/20/2019] [Accepted: 10/08/2020] [Indexed: 11/07/2022]
Abstract
We study the dynamics of genetic code evolution. The model of Vetsigian et al. [Proc. Natl. Acad. Sci. USA 103, 10696 (2006), doi:10.1073/pnas.0603780103] and Vetsigian [Collective evolution of biological and physical systems, Ph.D. thesis, 2005] uses the mechanism of horizontal gene transfer to demonstrate convergence of the genetic code to a near universal solution. We reproduce and analyze the algorithm as a dynamical system. All the parameters used in the model are varied to assess their impact on convergence and optimality score. We show that by allowing specific parameters to vary with time, the solution exhibits attractor dynamics. Finally, we study automorphisms of the genetic code arising due to this model. We use this to examine the scaling of the solutions to re-examine universality and find that there is a direct link to mutation rate.
Affiliation(s)
- John-Antonio Argyriadis: Jesus College, University of Oxford, OX1 3DW, United Kingdom and Rudolf Peierls Centre for Theoretical Physics, Clarendon Laboratory, Parks Road, University of Oxford, OX1 3PU, United Kingdom
- Yang-Hui He: Department of Mathematics, City, University of London, EC1V 0HB, United Kingdom; Merton College, University of Oxford, OX1 4JD, United Kingdom; and School of Physics, NanKai University, Tianjin, 300071, People's Republic of China
- Vishnu Jejjala: Mandelstam Institute for Theoretical Physics, School of Physics, NITheP, and CoE-MaSS, University of the Witwatersrand, Johannesburg, WITS 2050, South Africa and David Rittenhouse Laboratory, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Djordje Minic: Department of Physics and Center for Soft Matter and Biological Physics, Virginia Tech, Blacksburg, Virginia 24061, USA
|
31
|
Schmidt M, Kubyshkin V. How To Quantify a Genetic Firewall? A Polarity-Based Metric for Genetic Code Engineering. Chembiochem 2021; 22:1268-1284. [PMID: 33231343 PMCID: PMC8049029 DOI: 10.1002/cbic.202000758] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/05/2020] [Revised: 11/20/2020] [Indexed: 12/14/2022]
Abstract
Genetic code engineering aims to produce organisms that translate genetic information in a different way from that prescribed by the standard genetic code. This endeavor could eventually lead to genetic isolation, where an organism that operates under a different genetic code will not be able to exchange functional genes with other living species, thereby standing behind a genetic firewall. It is not clear, however, how distinct the code should be, or how to measure the distance. We have developed a metric (Δcode) in which we assigned polarity indices (clog D7) to amino acids to calculate the distances between pairs of genetic codes. We then calculated the distances within a set of 204 genetic codes, including the 24 known distinct natural codes, 11 extreme-distance codes created computationally, nine theoretical special purpose codes from the literature and 160 codes in which canonical amino acids were replaced by noncanonical chemical analogues. The metric can be used for building strategies towards creating semantically alienated organisms, and for testing the strength of genetic firewalls. This metric provides the basis for a map of the genetic codes that could guide future efforts towards novel biochemical worlds, biosafety and deep barcoding applications.
Affiliation(s)
- Vladimir Kubyshkin: Department of Chemistry, University of Manitoba, Dysart Road 144, Winnipeg, R3T 2N2, Canada
|
32
|
Nowak K, Błażej P, Wnetrzak M, Mackiewicz D, Mackiewicz P. Some theoretical aspects of reprogramming the standard genetic code. Genetics 2021; 218:6169163. [PMID: 33711098 PMCID: PMC8128387 DOI: 10.1093/genetics/iyab040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2020] [Accepted: 02/11/2021] [Indexed: 11/12/2022] Open
Abstract
Reprogramming of the standard genetic code to include non-canonical amino acids (ncAAs) opens new prospects for medicine, industry, and biotechnology. There are several methods of code engineering, which allow new genetic information to be stored in DNA sequences and proteins with new properties to be produced. Here, we provided a theoretical background for the optimal genetic code expansion, which may find application in the experimental design of the genetic code. We assumed that the expanded genetic code includes both canonical and non-canonical information stored in 64 classical codons. Moreover, the new coding system is robust to point mutations and minimizes the possibility of reversion from the new to the old information. In order to find such codes, we applied graph theory to analyze the properties of optimal codon sets. We presented a formal procedure for finding the optimal codes with various numbers of vacant codons that could be assigned to new amino acids. Finally, we discussed the optimal number of newly incorporated ncAAs and also the optimal size of codon groups that can be assigned to ncAAs.
Affiliation(s)
- Kuba Nowak: Faculty of Mathematics and Computer Science, University of Wrocław, ul. F. Joliot-Curie 15, 50-383 Wrocław, Poland
- Paweł Błażej: Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. F. Joliot-Curie 14a, 50-383 Wrocław, Poland
- Małgorzata Wnetrzak: Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. F. Joliot-Curie 14a, 50-383 Wrocław, Poland
- Dorota Mackiewicz: Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. F. Joliot-Curie 14a, 50-383 Wrocław, Poland
- Paweł Mackiewicz: Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. F. Joliot-Curie 14a, 50-383 Wrocław, Poland
|
33
|
Characterizing genomic variants and mutations in SARS-CoV-2 proteins from Indian isolates. Gene Reports 2021; 25:101044. [PMID: 33623833 PMCID: PMC7893251 DOI: 10.1016/j.genrep.2021.101044] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/26/2020] [Revised: 12/25/2020] [Accepted: 01/29/2021] [Indexed: 12/17/2022]
Abstract
SARS-CoV-2 is mutating and creating divergent variants by altering the composition of essential constituent proteins. Pharmacologically, it is crucial to understand the diverse mechanisms of mutation for stable vaccine or anti-viral drug design. Our current study concentrates on all the constituent proteins of 469 SARS-CoV-2 genome samples derived from Indian patients. However, the study may easily be extended to samples from across the globe. We perform clustering analysis to identify unique variants in each of the SARS-CoV-2 proteins. A total of 536 mutated positions within the coding regions of SARS-CoV-2 proteins are detected among the identified variants from Indian isolates. We quantify mutations by focusing on the unique variants of each SARS-CoV-2 protein. We report the average number of mutations per variant, the percentage of mutated positions, synonymous and non-synonymous mutations, mutations occurring in the three codon positions, and so on. Our study reveals the six most susceptible proteins, which are ORF1ab, Spike (S), Nucleocapsid (N), ORF3a, ORF7a, and ORF8. Several non-synonymous substitutions are observed to be unique in different SARS-CoV-2 proteins. A total of 57 possible deleterious amino acid substitutions are predicted, which may affect protein function. Several mutations show a large decrease in protein stability and are observed in putative functional domains of the proteins that might have some role in disease pathogenesis. We observe a good number of physicochemical property changes accompanying the above deleterious substitutions.
|
34
|
Shenhav L, Zeevi D. Resource conservation manifests in the genetic code. Science 2020; 370:683-687. [PMID: 33154134 DOI: 10.1126/science.aaz9642] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 10/23/2019] [Revised: 05/04/2020] [Accepted: 09/11/2020] [Indexed: 12/31/2022]
Abstract
Nutrient limitation drives competition for resources across organisms. However, much is unknown about how selective pressures resulting from nutrient limitation shape microbial coding sequences. Here, we study this "resource-driven selection" by using metagenomic and single-cell data of marine microbes, alongside environmental measurements. We show that a significant portion of the selection exerted on microbes is explained by the environment and is associated with nitrogen availability. Notably, this resource conservation optimization is encoded in the structure of the standard genetic code, providing robustness against mutations that increase carbon and nitrogen incorporation into protein sequences. This robustness generalizes to codon choices from multiple taxa across all domains of life, including the human genome.
Affiliation(s)
- Liat Shenhav: Center for Studies in Physics and Biology, Rockefeller University, New York, NY, USA; Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
- David Zeevi: Center for Studies in Physics and Biology, Rockefeller University, New York, NY, USA
|
35
|
Schwersensky M, Rooman M, Pucci F. Large-scale in silico mutagenesis experiments reveal optimization of genetic code and codon usage for protein mutational robustness. BMC Biol 2020; 18:146. [PMID: 33081759 PMCID: PMC7576759 DOI: 10.1186/s12915-020-00870-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/14/2020] [Accepted: 09/16/2020] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: How, and the extent to which, evolution acts on DNA and protein sequences to ensure mutational robustness and evolvability is a long-standing open question in the field of molecular evolution. We addressed this issue through the first structurome-scale computational investigation, in which we estimated the change in folding free energy upon all possible single-site mutations introduced in more than 20,000 protein structures, as well as through available experimental stability and fitness data. RESULTS: At the amino acid level, we found the protein surface to be more robust against random mutations than the core, this difference being stronger for small proteins. The destabilizing and neutral mutations are more numerous in the core and on the surface, respectively, whereas the stabilizing mutations are about 4% in both regions. At the genetic code level, we observed the smallest destabilization for mutations that are due to substitutions of base III in the codon, followed by base I, bases I+III, base II, and other multiple base substitutions. This ranking highly anticorrelates with the codon-anticodon mispairing frequency in the translation process. This suggests that the standard genetic code is optimized to limit the impact of random mutations, but even more so to limit translation errors. At the codon level, both the codon usage and the usage bias appear to optimize mutational robustness and translation accuracy, especially for surface residues. CONCLUSION: Our results highlight the non-universality of mutational robustness and its multiscale dependence on protein features, the structure of the genetic code, and the codon usage. Our analyses and approach are strongly supported by available experimental mutagenesis data.
Affiliation(s)
- Martin Schwersensky: Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium
- Marianne Rooman: Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium; Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium
- Fabrizio Pucci: Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium; Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium
|
36
|
Wills PR, Carter CW. Impedance Matching and the Choice Between Alternative Pathways for the Origin of Genetic Coding. Int J Mol Sci 2020; 21:E7392. [PMID: 33036401 PMCID: PMC7582391 DOI: 10.3390/ijms21197392] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/11/2020] [Revised: 09/28/2020] [Accepted: 09/30/2020] [Indexed: 01/07/2023] Open
Abstract
We recently observed that errors in gene replication and translation could be seen qualitatively to behave analogously to the impedances in acoustical and electronic energy transducing systems. We develop here quantitative relationships necessary to confirm that analogy and to place it into the context of the minimization of dissipative losses of both chemical free energy and information. The formal developments include expressions for the information transferred from a template to a new polymer, Iσ; an impedance parameter, Z; and an effective alphabet size, neff; all of which have non-linear dependences on the fidelity parameter, q, and the alphabet size, n. Surfaces of these functions over the {n,q} plane reveal key new insights into the origin of coding. Our conclusion is that the emergence and evolutionary refinement of information transfer in biology follow principles previously identified to govern physical energy flows, strengthening analogies (i) between chemical self-organization and biological natural selection, and (ii) between the course of evolutionary trajectories and the most probable pathways for time-dependent transitions in physics. Matching the informational impedance of translation to the four-letter alphabet of genes uncovers a pivotal role for the redundancy of triplet codons in preserving as much intrinsic genetic information as possible, especially in early stages when the coding alphabet size was small.
Affiliation(s)
- Peter R. Wills: Department of Physics and Te Ao Marama Centre for Fundamental Inquiry, University of Auckland, PB 92019, Auckland 1142, New Zealand
- Charles W. Carter: Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7260, USA
|
37
|
Dila G, Michel CJ, Thompson JD. Optimality of circular codes versus the genetic code after frameshift errors. Biosystems 2020; 195:104134. [DOI: 10.1016/j.biosystems.2020.104134] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/07/2020] [Revised: 03/23/2020] [Accepted: 03/25/2020] [Indexed: 12/24/2022]
|
38
|
A search for the physical basis of the genetic code. Biosystems 2020; 195:104148. [DOI: 10.1016/j.biosystems.2020.104148] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/28/2020] [Revised: 04/09/2020] [Accepted: 04/09/2020] [Indexed: 01/01/2023]
|
39
|
Scale-invariant topology and bursty branching of evolutionary trees emerge from niche construction. Proc Natl Acad Sci U S A 2020; 117:7879-7887. [PMID: 32209672 DOI: 10.1073/pnas.1915088117] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/03/2023] Open
Abstract
Phylogenetic trees describe both the evolutionary process and community diversity. Recent work has established that they exhibit scale-invariant topology, which quantifies the fact that their branching lies in between the two extreme cases of balanced binary trees and maximally unbalanced ones. In addition, the backbones of phylogenetic trees exhibit bursts of diversification on all timescales. Here, we present a simple, coarse-grained statistical model of niche construction coupled to speciation. Finite-size scaling analysis of the dynamics shows that the resultant phylogenetic tree topology is scale-invariant due to a singularity arising from large niche construction fluctuations that follow extinction events. The same model recapitulates the bursty pattern of diversification in time. These results show how dynamical scaling laws of phylogenetic trees on long timescales can reflect the indelible imprint of the interplay between ecological and evolutionary processes.
|
40
|
Abstract
Frameshifts in protein coding sequences are widely perceived as resulting in either nonfunctional or even deleterious protein products. Indeed, frameshifts typically lead to markedly altered protein sequences and premature stop codons. By analyzing complete proteomes from all three domains of life, we demonstrate that, in contrast, several key physicochemical properties of protein sequences exhibit significant robustness against +1 and -1 frameshifts. In particular, we show that hydrophobicity profiles of many protein sequences remain largely invariant upon frameshifting. For example, over 2,900 human proteins exhibit a Pearson's correlation coefficient R between the hydrophobicity profiles of the original and the +1-frameshifted variants greater than 0.7, despite an average sequence identity between the two of only 6.5% in this group. We observe a similar effect for protein sequence profiles of affinity for certain nucleobases as well as protein sequence profiles of intrinsic disorder. Finally, analysis of significance and optimality demonstrates that frameshift stability is embedded in the structure of the universal genetic code and may have contributed to shaping it. Our results suggest that frameshifting may be a powerful evolutionary mechanism for creating new proteins with vastly different sequences, yet similar physicochemical properties to the proteins from which they originate.
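The comparison of hydrophobicity profiles before and after a frameshift can be illustrated with a short Python sketch. It makes several assumptions not taken from the abstract: the Kyte-Doolittle scale is used for hydrophobicity, profiles are raw per-residue values without window smoothing, positions where either frame reads a stop codon are dropped, and the +1 frame is taken to start one nucleotide downstream. The helper names are hypothetical.

```python
from itertools import product
from math import sqrt

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy index (assumed scale; the study may use another).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(seq, frame=0):
    """Translate a DNA string from the given nucleotide offset, keeping stops as '*'."""
    return "".join(SGC[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def frameshift_correlation(seq, shift=1):
    """Correlate per-residue hydropathy of the original and the shifted frame."""
    original = translate(seq, 0)
    shifted = translate(seq, shift)
    pairs = [(KD[a], KD[b]) for a, b in zip(original, shifted)
             if "*" not in (a, b)]
    x, y = zip(*pairs)
    return pearson(x, y)


print(frameshift_correlation("ATGGCCATTGTA"))
```

The toy input is far too short to say anything about real proteins; it only shows the mechanics. On full coding sequences one would normally smooth both profiles over a sliding window before correlating them.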
|
41
|
Błażej P, Wnetrzak M, Mackiewicz D, Mackiewicz P. Basic principles of the genetic code extension. Royal Society Open Science 2020; 7:191384. [PMID: 32257313 PMCID: PMC7062095 DOI: 10.1098/rsos.191384] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/22/2019] [Accepted: 01/09/2020] [Indexed: 05/08/2023]
Abstract
Compounds including non-canonical amino acids (ncAAs) or other artificially designed molecules can find a lot of applications in medicine, industry and biotechnology. They can be produced thanks to the modification or extension of the standard genetic code (SGC). Such peptides or proteins including the ncAAs can be constantly delivered in a stable way by organisms with the customized genetic code. Among several methods of engineering the code, using non-canonical base pairs is especially promising, because it enables generating many new codons, which can be used to encode any new amino acid. Since even one pair of new bases can extend the SGC up to 216 codons generated by a six-letter nucleotide alphabet, the extension of the SGC can be achieved in many ways. Here, we proposed a stepwise procedure of the SGC extension with one pair of non-canonical bases to minimize the consequences of point mutations. We reported relationships between codons in the framework of graph theory. All 216 codons were represented as nodes of the graph, whereas its edges were induced by all possible single nucleotide mutations occurring between codons. Therefore, every set of canonical and newly added codons induces a specific subgraph. We characterized the properties of the induced subgraphs generated by selected sets of codons. Thanks to that, we were able to describe a procedure for incremental addition of the set of meaningful codons up to the full coding system consisting of three pairs of bases. The procedure of gradual extension of the SGC makes the whole system robust to changing genetic information due to mutations and is compatible with the views assuming that codons and amino acids were added successively to the primordial SGC, which evolved minimizing harmful consequences of mutations or mistranslations of encoded proteins.
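The graph representation described here is easy to reproduce as a minimal Python sketch: codons are nodes, and two codons are joined when they differ by a single nucleotide. The letters `X` and `Y` are placeholder names for one pair of non-canonical bases, and the function names are illustrative; the sketch does not implement the authors' stepwise extension procedure, only the underlying graph and the edge count of an induced subgraph.

```python
from itertools import product


def codon_graph(alphabet):
    """Map every codon over `alphabet` to the set of its single-substitution neighbours."""
    codons = ["".join(c) for c in product(alphabet, repeat=3)]
    neighbors = {c: set() for c in codons}
    for codon in codons:
        for pos in range(3):
            for base in alphabet:
                if base != codon[pos]:
                    neighbors[codon].add(codon[:pos] + base + codon[pos + 1:])
    return neighbors


def induced_edge_count(neighbors, subset):
    """Number of edges of the subgraph induced by the codons in `subset`."""
    subset = set(subset)
    # Each edge is seen from both endpoints, hence the division by two.
    return sum(len(neighbors[c] & subset) for c in subset) // 2


graph = codon_graph("ACGTXY")  # six-letter alphabet: 6**3 = 216 codons
canonical = ["".join(c) for c in product("ACGT", repeat=3)]
print(len(graph), induced_edge_count(graph, canonical))
```

In the six-letter graph each codon has 3 x 5 = 15 neighbours, whereas inside the canonical subgraph it has 3 x 3 = 9, so the 64 canonical codons induce 64 x 9 / 2 = 288 edges. Comparing such counts for candidate codon sets is one simple way to ask how many point mutations stay inside a chosen set.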
Affiliation(s)
- Paweł Błażej: Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
|
42
|
Determining amino acid scores of the genetic code table: Complementarity, structure, function and evolution. Biosystems 2020; 187:104026. [DOI: 10.1016/j.biosystems.2019.104026] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/26/2019] [Accepted: 08/28/2019] [Indexed: 11/22/2022]
|
43
|
The Ancient Operational Code is Embedded in the Amino Acid Substitution Matrix and aaRS Phylogenies. J Mol Evol 2019; 88:136-150. [PMID: 31781936 DOI: 10.1007/s00239-019-09918-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/28/2019] [Accepted: 11/14/2019] [Indexed: 10/25/2022]
Abstract
The underlying structure of the canonical amino acid substitution matrix (aaSM) is examined by considering stepwise improvements in the differential recognition of amino acids according to their chemical properties during the branching history of the two aminoacyl-tRNA synthetase (aaRS) superfamilies. The evolutionary expansion of the genetic code is described by a simple parameterization of the aaSM, in which (i) the number of distinguishable amino acid types, (ii) the matrix dimension and (iii) the number of parameters, each increases by one for each bifurcation in an aaRS phylogeny. Parameterized matrices corresponding to trees in which the size of an amino acid sidechain is the only discernible property behind its categorization as a substrate, exclusively for a Class I or II aaRS, provide a significantly better fit to empirically determined aaSM than trees with random bifurcation patterns. A second split between polar and nonpolar amino acids in each Class effects a vastly greater further improvement. The earliest Class-separated epochs in the phylogenies of the aaRS reflect these enzymes' capability to distinguish tRNAs through the recognition of acceptor stem identity elements via the minor (Class I) and major (Class II) helical grooves, which is how the ancient operational code functioned. The advent of tRNA recognition using the anticodon loop supports the evolution of the optimal map of amino acid chemistry found in the later genetic code, an essentially digital categorization, in which polarity is the major functional property, compensating for the unrefined, haphazard differentiation of amino acids achieved by the operational code.
|
44
|
Schmidt M. A metric space for semantic containment: Towards the implementation of genetic firewalls. Biosystems 2019; 185:104015. [PMID: 31408698 DOI: 10.1016/j.biosystems.2019.104015] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/17/2019] [Revised: 08/06/2019] [Accepted: 08/08/2019] [Indexed: 12/13/2022]
Abstract
Analysing or engineering the genetic code has mainly been considered as an approach to reduce or increase the mutational robustness of the genetic code, i.e. the error tolerance in DNA mutations, or to enable the incorporation of non-canonical amino acids. The approach of "semantic containment", however, is less interested in altering the mutational tolerance of the standard code than in creating synthetic alternative genetic codes that limit or altogether impede horizontal gene transfer between natural and genomically recoded organisms (GROs). A major claim or conjecture of semantic containment is: "the farther, the safer", meaning, the less similarity there is between two codes, the less chance of a horizontal gene transfer, and the stronger the genetic firewall. So far, no metrics have been available to measure and quantify the "genetic distance" between different genetic codes. Such a metric, however, is paramount to allow the experimental testing and evaluation of the validity of semantic biocontainment for the first time. Here, we introduce a metric space to measure exactly the distance (dissimilarity) between different genetic codes, in order to provide a framework to evaluate the relation between distance and strength of a genetic firewall. Results are presented that incorporate bespoke metrics when producing alternative genetic codes according to predefined goals, specifications and limitations. Finally, as an outlook, implications and challenges for genetic firewall(s) are discussed for dual- and multi-code systems.
|
45
|
Wichmann S, Ardern Z. Optimality in the standard genetic code is robust with respect to comparison code sets. Biosystems 2019; 185:104023. [DOI: 10.1016/j.biosystems.2019.104023] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/15/2019] [Revised: 08/22/2019] [Accepted: 08/24/2019] [Indexed: 01/22/2023]
|
46
|
Barbieri M. Evolution of the genetic code: The ambiguity-reduction theory. Biosystems 2019; 185:104024. [DOI: 10.1016/j.biosystems.2019.104024] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 08/06/2019] [Revised: 08/26/2019] [Accepted: 08/26/2019] [Indexed: 10/26/2022]
|
47
|
Genetic codes optimized as a traveling salesman problem. PLoS One 2019; 14:e0224552. [PMID: 31658301 PMCID: PMC6816573 DOI: 10.1371/journal.pone.0224552] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/26/2019] [Accepted: 10/16/2019] [Indexed: 11/19/2022] Open
Abstract
The Standard Genetic Code (SGC) is robust to mutational errors such that frequently occurring mutations minimally alter the physio-chemistry of amino acids. The apparent correlation between the evolutionary distances among codons and the physio-chemical distances among their cognate amino acids suggests an early co-diversification between the codons and amino acids. Here we formulated the co-minimization of evolutionary distances between codons and physio-chemical distances between amino acids as a Traveling Salesman Problem (TSP) and solved it with a Hopfield neural network. In this unsupervised learning algorithm, macromolecules (e.g., tRNAs and aminoacyl-tRNA synthetases) associating codons with amino acids were considered biological analogs of Hopfield neurons associating "tour cities" with "tour positions". The Hopfield network efficiently yielded an abundance of genetic codes that were more error-minimizing than SGC and could thus be used to design artificial genetic codes. We further argue that as a self-optimization algorithm, the Hopfield neural network provides a model of origin of SGC and other adaptive molecular systems through evolutionary learning.
|
48
|
Reflexivity, coding and quantum biology. Biosystems 2019; 185:104027. [PMID: 31494127 DOI: 10.1016/j.biosystems.2019.104027] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/29/2019] [Revised: 08/29/2019] [Accepted: 08/31/2019] [Indexed: 12/31/2022]
Abstract
Biological systems are fundamentally computational in that they process information in an apparently purposeful fashion rather than just transferring bits of it in a purely syntactical manner. Biological information, such as genetic information stored in DNA sequences, has semantic content. It carries meaning that is defined by the molecular context of its cellular environment. Information processing in biological systems displays an inherent reflexivity, a tendency for the computational information-processing to be "about" the behaviour of the molecules that participate in the computational process. This is most evident in the operation of the genetic code, where the specificity of the reactions catalysed by the aminoacyl-tRNA synthetase (aaRS) enzymes is required to be self-sustaining. A cell's suite of aaRS enzymes completes a reflexively autocatalytic set of molecular components capable of making themselves through the operation of the code. This set requires the existence of a body of reflexive information to be stored in an organism's genome. The genetic code is a reflexively self-organised mapping of the chemical properties of amino acid sidechains onto codon "tokens". It is a highly evolved symbolic system of chemical self-description. Although molecular biological coding is generally portrayed in terms of classical bit-transfer events, various biochemical events explicitly require quantum coherence for their occurrence. Whether the implicit transfer of quantum information, qubits, is indicative of wide-ranging quantum computation in living systems is currently the subject of extensive investigation and speculation in the field of Quantum Biology.
|
49
|
A general model on the origin of biological codes. Biosystems 2019; 181:11-19. [DOI: 10.1016/j.biosystems.2019.04.010] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 03/22/2019] [Revised: 04/16/2019] [Accepted: 04/16/2019] [Indexed: 01/09/2023]
|
50
|
Optimization of the standard genetic code in terms of two mutation types: Point mutations and frameshifts. Biosystems 2019; 181:44-50. [DOI: 10.1016/j.biosystems.2019.04.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/18/2019] [Accepted: 04/27/2019] [Indexed: 02/08/2023]
|