1
|
Wong TF. Triphasic Development of the Genetic Code. Chem Rev 2024; 124:9866-9872. [PMID: 39088192 PMCID: PMC11393795 DOI: 10.1021/acs.chemrev.3c00915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/02/2024]
Abstract
The genetic code contains an alphabet of genetically encoded amino acids. The ten Phase 1 amino acids, including Gly, Ala, Ser, Asp, Glu, Val, Leu, Ile, Pro and Thr, were available from the prebiotic environment, whereas the ten Phase 2 amino acids, including Phe, Tyr, Arg, His, Trp, Asn, Gln, Lys, Cys, and Met, became available only later from amino acid biosyntheses. In the archaeon Methanopyrus kandleri, the oldest organism known, the standard alphabet of 20 amino acids was "frozen" and no additional amino acid was encoded in the subsequent 3 Gyrs. Four decades ago, it was discovered that the code was frozen because all the organisms were so well adapted to the standard amino acids that oligogenic barriers, consisting of genes that are thoroughly dependent on the standard code, would cause loss of viability upon the deletion of any one amino acid from the code. Once the reason for the freezing of the code was ascertained, procedures were devised by scientists worldwide to enable the encoding of novel noncanonical amino acids (ncAAs). These encoded Phase 3 ncAAs now surpass the 20 canonical Phase 2 amino acids in the code.
Affiliation(s)
- Tze-Fei Wong
- Division of Life Science and Applied Genomics Center, Hong Kong University of Science & Technology, Hong Kong, China
|
2
|
Marshall LK, Fahrenbach AC, Thordarson P. RNA-Binding Peptides Inspired by the RNA Recognition Motif. ACS Chem Biol 2024; 19:243-248. [PMID: 38314708 DOI: 10.1021/acschembio.3c00694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
β-Hairpin peptides with RNA-binding sequences mimicking the central two β-strands of the RNA recognition motif (RRM) protein domain have been observed to bind in a 2:1 fashion to a series of RNA homooligonucleotides in aqueous solution (PBS buffer, pH 7.40) with binding energies (-27 to -35 kJ mol-1) similar to those of full-size protein RRMs. The peptides display mild selectivities with respect to the binding of the different homooligomers. Binding studies in 500 mM magnesium chloride suggest that the complex formation is not predominantly driven by Coulombic attraction. These peptides represent a starting point for further studies of non-Coulombic binding of RNA by peptides and proteins, which is important in the context of contemporary biology, potential therapeutic applications, and prebiotic peptide-RNA interactions.
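As a rough guide to what the reported binding energies imply, the sketch below converts a binding free energy into a dissociation constant using ΔG = RT ln K_d. The temperature (298.15 K), the 1 M standard state, and the treatment of each value as a simple 1:1 equilibrium are assumptions made here for illustration only; the abstract does not state them, and the reported 2:1 stoichiometry would call for a fuller binding model.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def dissociation_constant(delta_g_kj_per_mol, temperature_k=298.15):
    """Return K_d in mol/L for a binding free energy given in kJ/mol.

    Assumes a simple 1:1 equilibrium with a 1 M standard state, so that
    delta_G(binding) = R * T * ln(K_d). The default temperature is an
    assumption, not a value taken from the abstract.
    """
    return math.exp(delta_g_kj_per_mol * 1000.0 / (R * temperature_k))


# The two ends of the energy range quoted in the abstract.
for dg in (-27.0, -35.0):
    print(f"{dg:.0f} kJ/mol -> K_d ~ {dissociation_constant(dg):.1e} M")
```

Under these assumptions the quoted range corresponds to dissociation constants from tens of micromolar down to sub-micromolar.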
|
3
|
Brown SM, Mayer-Bacon C, Freeland S. Xeno Amino Acids: A Look into Biochemistry as We Do Not Know It. Life (Basel) 2023; 13:2281. [PMID: 38137883 PMCID: PMC10744825 DOI: 10.3390/life13122281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 11/18/2023] [Accepted: 11/20/2023] [Indexed: 12/24/2023] Open
Abstract
Would another origin of life resemble Earth's biochemical use of amino acids? Here, we review current knowledge at three levels: (1) Could other classes of chemical structure serve as building blocks for biopolymer structure and catalysis? Amino acids now seem both readily available to, and a plausible chemical attractor for, life as we do not know it. Amino acids thus remain important and tractable targets for astrobiological research. (2) If amino acids are used, would we expect the same L-alpha-structural subclass used by life? Despite numerous ideas, it is not clear why life favors L-enantiomers. It seems clearer, however, why life on Earth uses the shortest possible (alpha-) amino acid backbone, and why each carries only one side chain. However, assertions that other backbones are physicochemically impossible have relaxed into arguments that they are disadvantageous. (3) Would we expect a similar set of side chains to those within the genetic code? Many plausible alternatives exist. Furthermore, evidence exists for both evolutionary advantage and physicochemical constraint as explanatory factors for those encoded by life. Overall, as focus shifts from amino acids as a chemical class to specific side chains used by post-LUCA biology, the probable role of physicochemical constraint diminishes relative to that of biological evolution. Exciting opportunities now present themselves for laboratory work and computing to explore how changing the amino acid alphabet alters the universe of protein folds. Near-term milestones include: (a) expanding evidence about amino acids as attractors within chemical evolution; (b) extending characterization of other backbones relative to biological proteins; and (c) merging computing and laboratory explorations of structures and functions unlocked by xeno peptides.
|
4
|
Abstract
α-Amino acids are essential molecular constituents of life, twenty of which are privileged because they are encoded by the ribosomal machinery. The question remains open as to why this number and why this 20 in particular, an almost philosophical question that cannot be conclusively resolved. They are closely related to the evolution of the genetic code and whether nucleic acids, amino acids, and peptides appeared simultaneously and were available under prebiotic conditions when the first self-sufficient complex molecular system emerged on Earth. This report focuses on prebiotic and metabolic aspects of amino acids and proteins starting with meteorites, followed by their formation, including peptides, under plausible prebiotic conditions, and the major biosynthetic pathways in the various kingdoms of life. Coenzymes play a key role in the present analysis in that amino acid metabolism is linked to glycolysis and different variants of the tricarboxylic acid cycle (TCA, rTCA, and the incomplete horseshoe version) as well as the biosynthesis of the most important coenzymes. Thus, the report opens additional perspectives and facets on the molecular evolution of primary metabolism.
Affiliation(s)
- Andreas Kirschning
- Institute of Organic Chemistry, Leibniz University Hannover, Schneiderberg 1B, 30167 Hannover, Germany
|
5
|
A Closer Look at Non-random Patterns Within Chemistry Space for a Smaller, Earlier Amino Acid Alphabet. J Mol Evol 2022; 90:307-323. [PMID: 35666290 DOI: 10.1007/s00239-022-10061-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/10/2021] [Accepted: 05/11/2022] [Indexed: 10/18/2022]
Abstract
Recent findings, in vitro and in silico, are strengthening the idea of a simpler, earlier stage of genetically encoded proteins which used amino acids produced by prebiotic chemistry. These findings motivate a re-examination of prior work which has identified unusual properties of the set of twenty amino acids found within the full genetic code, while leaving it unclear whether similar patterns also characterize the subset of prebiotically plausible amino acids. We have suggested previously that this ambiguity may result from the low number of amino acids recognized by the definition of prebiotic plausibility used for the analysis. Here, we test this hypothesis using significantly updated data for organic material detected within meteorites, which contain several coded and non-coded amino acids absent from prior studies. In addition to confirming the well-established idea that "late" arriving amino acids expanded the chemistry space encoded by genetic material, we find that a prebiotically plausible subset of coded amino acids generally emulates the patterns found in the full set of 20, namely an exceptionally broad and even distribution of volumes and an exceptionally even distribution of hydrophobicities (quantified as logP) over a narrow range. However, the strength of this pattern varies depending on both the size and composition of the library used to create a background (null model) for a random alphabet, and the precise definition of exactly which amino acids were present in a simpler, earlier code. Findings support the idea that a small sample size of amino acids caused previous ambiguous results, and further improvements in meteorite analysis and/or prebiotic simulations will further clarify the nature and extent of unusual properties.
We discuss the case of sulfur-containing amino acids as a specific and clear example and conclude by reviewing the potential impact of better understanding the chemical "logic" of a smaller forerunner to the standard amino acid alphabet.
|
6
|
Mayer-Bacon C, Freeland SJ. A broader context for understanding amino acid alphabet optimality. J Theor Biol 2021; 520:110661. [PMID: 33684404 DOI: 10.1016/j.jtbi.2021.110661] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/21/2020] [Revised: 02/23/2021] [Accepted: 02/25/2021] [Indexed: 12/21/2022]
Abstract
A series of prior publications has reported unusual properties of the set of genetically encoded amino acids shared by all known life. This work uses quantitative measures (descriptors) of size, charge and hydrophobicity to compare the distribution of the genetically encoded amino acids with random samples of plausible alternatives. Results show that the standard "alphabet" of amino acids established by the time of LUCA is distributed with unusual evenness over a broad range for the three, key physicochemical properties. However, different publications have used slightly different assumptions, including variations in the precise descriptors used, the set of plausible alternative molecules considered, and the format in which results have been presented. Here we consolidate these findings into a unified framework in order to clarify unusual features. We find that in general, the remarkable features of the full set of 20 genetically encoded amino acids are robust when compared with random samples drawn from a densely populated picture of plausible, alternative L-α-amino acids. In particular, the genetically encoded set is distributed across an exceptionally broad range of volumes, and distributed exceptionally evenly within a modest range of hydrophobicities. Surprisingly, range and evenness of charge (pKa) is exceptional only for the full amino acid structures, not for their sidechains - a result inconsistent with prior interpretations involving the role that amino acid sidechains play within protein sequences. In stark contrast, these remarkable features are far less clear when the prebiotically plausible subset of genetically encoded amino acids is compared with a much smaller pool of prebiotically plausible alternatives. By considering the nature of the "optimality theory" approach taken to derive these and prior insights, we suggest productive avenues for further research.
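The comparison described in this abstract (range and evenness of a candidate alphabet versus random samples from a pool of alternatives) can be sketched as follows. This is a minimal illustration, not the authors' method: the evenness measure (standard deviation of gaps between sorted values), the function names, and the synthetic pool are all hypothetical choices made here.

```python
import random
import statistics


def coverage_range(values):
    """Width of the interval spanned by a set of descriptor values."""
    return max(values) - min(values)


def evenness(values):
    """Standard deviation of gaps between consecutive sorted values.

    Zero means perfectly even spacing; larger means more clumped.
    This particular measure is an assumption chosen for illustration.
    """
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return statistics.pstdev(gaps)


def fraction_at_least_as_good(candidate, pool, n_samples=1000, seed=0):
    """Fraction of random same-size sets that match or beat the candidate
    on both range (at least as broad) and evenness (at least as even)."""
    rng = random.Random(seed)
    target_range = coverage_range(candidate)
    target_even = evenness(candidate)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(pool, len(candidate))
        if coverage_range(sample) >= target_range and evenness(sample) <= target_even:
            hits += 1
    return hits / n_samples


# Hypothetical data: a synthetic pool of descriptor values, not real amino acids.
pool_rng = random.Random(42)
synthetic_pool = [pool_rng.uniform(0.0, 10.0) for _ in range(200)]
candidate_set = [0.0, 2.5, 5.0, 7.5, 10.0]
print(fraction_at_least_as_good(candidate_set, synthetic_pool))
```

A small returned fraction means few random sets cover the descriptor as broadly and evenly as the candidate. In a real analysis the pool would hold computed descriptors (for example volume or logP) for plausible alternative amino acids.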
Affiliation(s)
- Christopher Mayer-Bacon
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 25250, USA.
- Stephen J Freeland
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 25250, USA
|
7
|
Adaptive Properties of the Genetically Encoded Amino Acid Alphabet Are Inherited from Its Subsets. Sci Rep 2019; 9:12468. [PMID: 31462646 PMCID: PMC6713743 DOI: 10.1038/s41598-019-47574-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/08/2019] [Accepted: 07/08/2019] [Indexed: 01/11/2023] Open
Abstract
Life uses a common set of 20 coded amino acids (CAAs) to construct proteins. This set was likely canonicalized during early evolution; before this, smaller amino acid sets were gradually expanded as new synthetic, proofreading and coding mechanisms became biologically available. Many possible subsets of the modern CAAs or other presently uncoded amino acids could have comprised the earlier sets. We explore the hypothesis that the CAAs were selectively fixed due to their unique adaptive chemical properties, which facilitate folding, catalysis, and solubility of proteins, and gave adaptive value to organisms able to encode them. Specifically, we studied in silico hypothetical CAA sets of 3–19 amino acids comprised of 1913 structurally diverse α-amino acids, exploring the adaptive value of their combined physicochemical properties relative to those of the modern CAA set. We find that even hypothetical sets containing modern CAA members are especially adaptive; it is difficult to find sets even among a large choice of alternatives that cover the chemical property space more amply. These results suggest that each time a CAA was discovered and embedded during evolution, it provided an adaptive value unusual among many alternatives, and each selective step may have helped bootstrap the developing set to include still more CAAs.
|
8
|
DNA partitions into triplets under tension in the presence of organic cations, with sequence evolutionary age predicting the stability of the triplet phase. Q Rev Biophys 2018; 50:e15. [PMID: 29233227 DOI: 10.1017/s0033583517000130] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/07/2022]
Abstract
Using atomistic simulations, we show the formation of stable triplet structure when particular GC-rich DNA duplexes are extended in solution over a timescale of hundreds of nanoseconds, in the presence of organic salt. We present planar-stacked triplet disproportionated DNA (Σ DNA) as a possible solution phase of the double helix under tension, subject to sequence and the presence of stabilising co-factors. Considering the partitioning of the duplexes into triplets of base pairs as the first step of operation of recombinase enzymes like RecA, we emphasise the structure-function relationship in Σ DNA. We supplement atomistic calculations with thermodynamic arguments to show that codons for 'phase 1' amino acids (those appearing early in evolution) are more likely than a lower entropy GC-rich sequence to form triplets under tension. We further observe that the four amino acids supposed (in the 'GADV world' hypothesis) to constitute the minimal set to produce functional globular proteins have the strongest triplet-forming propensity within the phase 1 set, showing a series of decreasing triplet propensity with evolutionary newness. The weak form of our observation provides a physical mechanism to minimise read frame and recombination alignment errors in the early evolution of the genetic code.
|
9
|
Meringer M, Cleaves HJ. Exploring astrobiology using in silico molecular structure generation. Philos Trans A Math Phys Eng Sci 2017; 375:rsta.2016.0344. [PMID: 29133444 PMCID: PMC5686402 DOI: 10.1098/rsta.2016.0344] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Accepted: 03/21/2017] [Indexed: 05/27/2023]
Abstract
The origin of life is typically understood as a transition from inanimate or disorganized matter to self-organized, 'animate' matter. This transition probably took place largely in the context of organic compounds, and most approaches, to date, have focused on using the organic chemical composition of modern organisms as the main guide for understanding this process. However, it has gradually come to be appreciated that biochemistry, as we know it, occupies a minute volume of the possible organic 'chemical space'. As the majority of abiotic syntheses appear to make a large set of compounds not found in biochemistry, as well as an incomplete subset of those that are, it is possible that life began with a significantly different set of components. Chemical graph-based structure generation methods allow for exhaustive in silico enumeration of different compound types and different types of 'chemical spaces' beyond those used by biochemistry, which can be explored to help understand the types of compounds biology uses, as well as to understand the nature of abiotic synthesis, and potentially design novel types of living systems. This article is part of the themed issue 'Reconceptualizing the origins of life'.
Affiliation(s)
- Markus Meringer
- Earth Observation Center (EOC), German Aerospace Center (DLR), Münchner Straße 20, 82234 Oberpfaffenhofen-Wessling, Germany
- H James Cleaves
- Earth-Life Science Institute, Tokyo Institute of Technology, 2-12-IE-1 Ookayama, Meguro-ku, Tokyo 152-8551, Japan
- Institute for Advanced Study, Princeton, NJ 08540, USA
- Blue Marble Space Institute of Science, 1515 Gallatin Street NW, Washington, DC 20011, USA
- Center for Chemical Evolution, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
10
|
Future of the Genetic Code. Life (Basel) 2017; 7:life7010010. [PMID: 28264473 PMCID: PMC5370410 DOI: 10.3390/life7010010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/06/2017] [Revised: 02/20/2017] [Accepted: 02/23/2017] [Indexed: 11/17/2022] Open
Abstract
The methods for establishing synthetic lifeforms with rewritten genetic codes comprising non-canonical amino acids (NCAA) in addition to canonical amino acids (CAA) include proteome-wide replacement of CAA, insertion through suppression of nonsense codon, and insertion via the pyrrolysine and selenocysteine pathways. Proteome-wide reassignments of nonsense codons and sense codons are also under development. These methods enable the application of NCAAs to enrich both fundamental and applied aspects of protein chemistry and biology. Sense codon reassignment to NCAA could incur problems arising from the usage of anticodons as identity elements on tRNA, and possible misreading of NNY codons by UNN anticodons. Evidence suggests that the problem of anticodons as identity elements can be diminished or resolved through removal from the tRNA of all identity elements besides the anticodon, and the problem of misreading of NNY codons by UNN anticodon can be resolved by the retirement of both the UNN anticodon and its complementary NNA codon from the proteome in the event that a restrictive post-transcriptional modification of the UNN anticodon by host enzymes to prevent the misreading cannot be obtained.
|
11
|
Coevolution Theory of the Genetic Code at Age Forty: Pathway to Translation and Synthetic Life. Life (Basel) 2016; 6:life6010012. [PMID: 26999216 PMCID: PMC4810243 DOI: 10.3390/life6010012] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 01/08/2016] [Revised: 02/26/2016] [Accepted: 03/04/2016] [Indexed: 11/17/2022] Open
Abstract
The origins of the components of genetic coding are examined in the present study. Genetic information arose from replicator induction by metabolite in accordance with the metabolic expansion law. Messenger RNA and transfer RNA stemmed from a template for binding the aminoacyl-RNA synthetase ribozymes employed to synthesize peptide prosthetic groups on RNAs in the Peptidated RNA World. Coevolution of the genetic code with amino acid biosynthesis generated tRNA paralogs that identify a last universal common ancestor (LUCA) of extant life close to Methanopyrus, which in turn points to archaeal tRNA introns as the most primitive introns and the anticodon usage of Methanopyrus as an ancient mode of wobble. The prediction of the coevolution theory of the genetic code that the code should be a mutable code has led to the isolation of optional and mandatory synthetic life forms with altered protein alphabets.
|
12
|
|
13
|
Extraordinarily adaptive properties of the genetically encoded amino acids. Sci Rep 2015; 5:9414. [PMID: 25802223 PMCID: PMC4371090 DOI: 10.1038/srep09414] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 10/29/2014] [Accepted: 02/12/2015] [Indexed: 02/02/2023] Open
Abstract
Using novel advances in computational chemistry, we demonstrate that the set of 20 genetically encoded amino acids, used nearly universally to construct all coded terrestrial proteins, has been highly influenced by natural selection. We defined an adaptive set of amino acids as one whose members thoroughly cover relevant physico-chemical properties, or “chemistry space.” Using this metric, we compared the encoded amino acid alphabet to random sets of amino acids. These random sets were drawn from a computationally generated compound library containing 1913 alternative amino acids that lie within the molecular weight range of the encoded amino acids. Sets that cover chemistry space better than the genetically encoded alphabet are extremely rare and energetically costly. Further analysis of more adaptive sets reveals common features and anomalies, and we explore their implications for synthetic biology. We present these computations as evidence that the set of 20 amino acids found within the standard genetic code is the result of considerable natural selection. The amino acids used for constructing coded proteins may represent a largely global optimum, such that any aqueous biochemistry would use a very similar set.
|
14
|
How amino acids and peptides shaped the RNA world. Life (Basel) 2015; 5:230-46. [PMID: 25607813 PMCID: PMC4390850 DOI: 10.3390/life5010230] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 11/30/2014] [Revised: 12/16/2014] [Accepted: 01/14/2015] [Indexed: 11/17/2022] Open
Abstract
The “RNA world” hypothesis is seen as one of the main contenders for a viable theory on the origin of life. Relatively small RNAs have catalytic power, RNA is everywhere in present-day life, the ribosome is seen as a ribozyme, and rRNA and tRNA are crucial for modern protein synthesis. However, this view is incomplete at best. The modern protein-RNA ribosome most probably is not a distorted form of a “pure RNA ribosome” evolution started out with. Though the oldest center of the ribosome seems “RNA only”, we cannot conclude from this that it ever functioned in an environment without amino acids and/or peptides. Very small RNAs (versatile and stable due to basepairing) and amino acids, as well as dipeptides, coevolved. Remember, it is the amino group of aminoacylated tRNA that attacks peptidyl-tRNA, destroying the bond between peptide and tRNA. This activity of the amino acid part of aminoacyl-tRNA illustrates the centrality of amino acids in life. With the rise of the “RNA world” view of early life, the pendulum seems to have swung too much towards the ribozymatic part of early biochemistry. The necessary presence and activity of amino acids and peptides is in need of highlighting. In this article, we try to bring the role of the peptide component of early life back into focus. We argue that an RNA world completely independent of amino acids never existed.
|
15
|
Rouch DA. Evolution of the first genetic cells and the universal genetic code: a hypothesis based on macromolecular coevolution of RNA and proteins. J Theor Biol 2014; 357:220-44. [PMID: 24931677 DOI: 10.1016/j.jtbi.2014.06.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/22/2013] [Revised: 06/02/2014] [Accepted: 06/03/2014] [Indexed: 11/19/2022]
Abstract
A qualitative hypothesis based on coevolution of protein and nucleic acid macromolecules was developed to explain the evolution of the first genetic cells, from the likely organic chemical-rich environment of early earth, through to the Last Universal Common Ancestor (LUCA). The evolution of the first genetic cell was divided into three phases, proto-genetic cells I, II and III, and the transition to each milestone is described, based on development of chemical cross-catalysis, bio-cross-catalysis, and the universal genetic code, respectively. Selection of macromolecular properties of both peptides and nucleic acids, in response to environmental factors, was likely to be a key aspect of early evolution. The development of hereditable nucleic acids with various key functions; translation, transcription and replication, is described. These functions are envisaged to have coevolved with protein enzymes, from simple organic precursors. Genetically heritable nucleotides may have developed after the local earth environment had cooled below 63 °C. Around this temperature G-C bases would have been preferentially utilized for nucleotide synthesis. Under these conditions RNA type nucleotides were then likely selected from a range of different types of nucleotide backbones through template-based synthesis. Initial development of the genetic coding system was simplified by the availability of proto-messenger RNA sequences that contained only G and C bases, and the need to encode only four amino acids. The step-wise addition of further amino acids to the code was predicted to parallel the growing metabolic complexity of the proto-genetic cell. On completion of this evolutionary process the proto-genetic cell is envisaged to have become the LUCA, the last common ancestor of bacteria, eukaryote and archaea domains. 
Key issues addressed by the model include: (a) the transition from non-hereditable random sequences of peptides and nucleic acids to specific proteins coded by hereditable nucleotide sequences, (b) the origin of homochiral amino acids and sugars, and (c) the mutation limits on the sizes of early nucleic acid genomes. The first genome was limited to a size of about 200 base pairs.
Affiliation(s)
- Duncan A Rouch
- Biotechnology and Environmental Biology, RMIT University, PO Box 71, Bundoora, Melbourne, Vic 3083, Australia.
|
16
|
Ilardo MA, Freeland SJ. Testing for adaptive signatures of amino acid alphabet evolution using chemistry space. J Syst Chem 2014. [DOI: 10.1186/1759-2208-5-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
|
17
|
Meringer M, Cleaves HJ, Freeland SJ. Beyond terrestrial biology: charting the chemical universe of α-amino acid structures. J Chem Inf Model 2013; 53:2851-62. [PMID: 24152173 DOI: 10.1021/ci400209n] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 01/23/2023]
Abstract
α-Amino acids are fundamental to biochemistry as the monomeric building blocks with which cells construct proteins according to genetic instructions. However, the 20 amino acids of the standard genetic code represent a tiny fraction of the number of α-amino acid chemical structures that could plausibly play such a role, both from the perspective of natural processes by which life emerged and evolved, and from the perspective of human-engineered genetically coded proteins. Until now, efforts to describe the structures comprising this broader set, or even estimate their number, have been hampered by the complex combinatorial properties of organic molecules. Here, we use computer software based on graph theory and constructive combinatorics in order to conduct an efficient and exhaustive search of the chemical structures implied by two careful and precise definitions of the α-amino acids relevant to coded biological proteins. Our results include two virtual libraries of α-amino acid structures corresponding to these different approaches, comprising 121 044 and 3 846 structures, respectively, and suggest a simple approach to exploring much larger, as yet uncomputed, libraries of interest.
Affiliation(s)
- Markus Meringer
- German Aerospace Center (DLR), Earth Observation Center (EOC), Münchner Straße 20, D-82234 Oberpfaffenhofen-Wessling, Germany
|
18
|
Morgens DW, Cavalcanti ARO. An alternative look at code evolution: using non-canonical codes to evaluate adaptive and historic models for the origin of the genetic code. J Mol Evol 2013; 76:71-80. [PMID: 23344715 DOI: 10.1007/s00239-013-9542-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/21/2012] [Accepted: 01/15/2013] [Indexed: 10/27/2022]
Abstract
The canonical code has been shown many times to be highly robust against point mutations; that is, mutations that change a single nucleotide tend to result in similar amino acids more often than expected by chance. There are two major types of models for the origin of the code, which explain how this sophisticated structure evolved. Adaptive models state that the primitive code was specifically selected for error minimization, while historic models hypothesize that the robustness of the code is an artifact or by-product of the mechanism of code evolution. In this paper, we evaluated the levels of robustness in existing non-canonical codes as well as codes that differ in only one codon assignment from the standard code. We found that the level of robustness of many of these codes is comparable or better than that of the standard code. Although these results do not preclude an adaptive origin of the genetic code, they suggest that the code was not selected for minimizing the effects of point mutations.
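The notion of robustness against point mutations can be made concrete with a small sketch. The code below scores a genetic code by the mean squared change in amino acid hydropathy over all single-nucleotide substitutions (lower means more robust) and compares the standard code with one non-canonical code. The choice of the Kyte-Doolittle hydropathy scale, the mean-squared-difference cost, the exclusion of stop codons, and the use of the vertebrate mitochondrial code as the example are assumptions made here; the abstract does not specify the measure the authors used.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy values (the choice of scale is an assumption).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def mutation_cost(code):
    """Mean squared hydropathy change over all single-nucleotide
    substitutions between sense codons (stop codons are skipped)."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = code[mutant]
                if new_aa == "*":
                    continue
                total += (HYDROPATHY[aa] - HYDROPATHY[new_aa]) ** 2
                count += 1
    return total / count


# Vertebrate mitochondrial code: four reassignments relative to the standard code.
VERT_MITO = dict(STANDARD, AGA="*", AGG="*", ATA="M", TGA="W")

print("standard:", round(mutation_cost(STANDARD), 3))
print("vertebrate mitochondrial:", round(mutation_cost(VERT_MITO), 3))
```

The same function can score any single-reassignment variant by overriding one codon in a copy of `STANDARD`, which mirrors the kind of comparison the abstract describes.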
Affiliation(s)
- David W Morgens
- Department of Biology, Pomona College, 175 W 6th Street, Claremont, CA, USA
|
19
|
|
20
|
Abstract
One important question in prebiotic chemistry is the search for simple structures that might have enclosed biological molecules in a cell-like space. Phospholipids, the components of biological membranes, are highly complex. Instead, we looked for molecules that might have been available on prebiotic Earth. Simple peptides with hydrophobic tails and hydrophilic heads that are made up of merely a combination of these robust, abiotically synthesized amino acids and could self-assemble into nanotubes or nanovesicles fulfilled our initial requirements. These molecules could provide a primitive enclosure for the earliest enzymes based on either RNA or peptides and other molecular structures with a variety of functions. We discovered and designed a class of these simple lipid-like peptides, which we describe in this Account. These peptides consist of natural amino acids (glycine, alanine, valine, isoleucine, leucine, aspartic acid, glutamic acid, lysine, and arginine) and exhibit lipid-like dynamic behaviors. These structures further undergo spontaneous assembly to form ordered arrangements including micelles, nanovesicles, and nanotubes with visible openings. Because of their simplicity and stability in water, such assemblies could provide examples of prebiotic molecular evolution that may predate the RNA world. These short and simple peptides have the potential to self-organize to form simple enclosures that stabilize other fragile molecules, to bring low concentration molecules into a local environment, and to enhance higher local concentration. As a result, these structures plausibly could not only accelerate the dehydration process for new chemical bond formation but also facilitate further self-organization and prebiotic evolution in a dynamic manner. We also expect that this class of lipid-like peptides will likely find a wide range of uses in the real world. 
Because of their favorable interactions with lipids, these lipid-like peptides have been used to solubilize and stabilize membrane proteins, both for scientific studies and for the fabrication of nanobiotechnological devices. They can also increase the solubility of other water-insoluble molecules and increase long-term stability of some water-soluble proteins. Likewise, because of their lipophilicity, these structures can deliver molecular cargo, such as small molecules, siRNA, and DNA, in vivo for potential therapeutic applications.
Affiliation(s)
- Shuguang Zhang
- Laboratory of Molecular Design, Center for Bits and Atoms, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139-4307, United States
21
Murtas G. Early self-reproduction, the emergence of division mechanisms in protocells. MOLECULAR BIOSYSTEMS 2012; 9:195-204. [PMID: 23232904 DOI: 10.1039/c2mb25375e] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
Synthetic biology approaches are proposing model systems and providing experimental evidence that life can arise as a spontaneous chemical self-assembly process in which the ability to reproduce itself is an essential feature of the living system. The appearance of early cells required an amphiphilic membrane compartment to confine molecular information against diffusion, and the ability to self-replicate the boundary layer and the genetic information. The initial spontaneous self-replication mechanisms based on thermodynamic instability would have evolved into prebiotic and later biological catalysis. Early studies demonstrate that fatty acids spontaneously assemble into bilayer membranes, building vesicles able to grow by incorporation of free lipid molecules and divide. Early replication mechanisms may have seen inorganic molecules playing a role as the first catalysts. The emergence of a short ribozyme or short catalytic peptide may have initiated the first prebiotic membrane lipid synthesis required for vesicle growth. The evolution of early catalysts towards the simplest translation machine to deliver proteins from RNA sequences was likely to give early birth to one single enzyme controlling protocell membrane division. The cell replication process assisted by complex enzymes for lipid synthesis is the result of evolved pathways in early cells. Evolution from organic molecules to protocells and early cells, thus from chemistry to biology, may have occurred in and out of the boundary layer. Here we review recent experimental work describing membrane and vesicle division mechanisms based on chemico-physical spontaneous processes, inorganic early catalysis and enzyme-based mechanisms controlling early protocell division, and finally the feedback from minimal genome studies.
Affiliation(s)
- Giovanni Murtas
- Istituto di Farmacologia Traslazionale, CNR, via fosso del Cavaliere 100, 00133, Roma, Italy.
22
Cleaves HJ, Michalkova Scott A, Hill FC, Leszczynski J, Sahai N, Hazen R. Mineral-organic interfacial processes: potential roles in the origins of life. Chem Soc Rev 2012; 41:5502-25. [PMID: 22743683 DOI: 10.1039/c2cs35112a] [Citation(s) in RCA: 135] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Life is believed to have originated on Earth ∼4.4-3.5 Ga ago, via processes in which organic compounds supplied by the environment self-organized, in some geochemical environmental niches, into systems capable of replication with hereditary mutation. This process is generally supposed to have occurred in an aqueous environment and, likely, in the presence of minerals. Mineral surfaces present rich opportunities for heterogeneous catalysis and concentration which may have significantly altered and directed the process of prebiotic organic complexification leading to life. We review here general concepts in prebiotic mineral-organic interfacial processes, as well as recent advances in the study of mineral surface-organic interactions of potential relevance to understanding the origin of life.
Affiliation(s)
- H James Cleaves
- Blue Marble Space Institute of Science, Washington, DC 20016, USA
23
McDonald GD, Storrie-Lombardi MC. Biochemical constraints in a protobiotic earth devoid of basic amino acids: the "BAA(-) world". ASTROBIOLOGY 2010; 10:989-1000. [PMID: 21162678 DOI: 10.1089/ast.2010.0484] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
It has been hypothesized in this journal and elsewhere, based on surveys of published data from prebiotic synthesis experiments and carbonaceous meteorite analyses, that basic amino acids such as lysine and arginine were not abundant on prebiotic Earth. If the basic amino acids were incorporated only rarely into the first peptides formed in that environment, it is important to understand what protobiotic chemistry is possible in their absence. As an initial test of the hypothesis that basic amino acid negative [BAA(-)] proteins could have performed at least a subset of protobiotic chemistry, the current work reports on a survey of 13 archaeal and 13 bacterial genomes that has identified 61 modern gene sequences coding for known or putative proteins not containing arginine or lysine. Eleven of the sequences found code for proteins whose functions are well known and important in the biochemistry of modern microbial life: lysine biosynthesis protein LysW, arginine cluster proteins, copper ion binding proteins, bacterial flagellar proteins, and PE or PPE family proteins. These data indicate that the lack of basic amino acids does not prevent peptides or proteins from serving useful structural and biochemical functions. However, as would be predicted from fundamental physicochemical principles, we see no fossil evidence of prebiotic BAA(-) peptide sequences capable of interacting directly with nucleic acids.
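The genome survey described in this abstract amounts to filtering protein sequences for the absence of lysine and arginine. A minimal sketch of that filter follows; the function names and the toy sequences are hypothetical illustrations, not the authors' pipeline or data.

```python
from typing import Dict, List

# One-letter codes for the basic amino acids screened for in the abstract:
# K = lysine, R = arginine.
BASIC_RESIDUES = frozenset("KR")


def is_baa_negative(protein: str) -> bool:
    """Return True if the sequence contains neither lysine nor arginine."""
    return BASIC_RESIDUES.isdisjoint(protein.upper())


def find_baa_negative(proteins: Dict[str, str]) -> List[str]:
    """Return the sorted names of non-empty sequences lacking K and R."""
    return sorted(
        name for name, seq in proteins.items() if seq and is_baa_negative(seq)
    )


# Invented toy sequences, for illustration only (not real gene products).
toy = {
    "toy_a": "MADEEGVLS",
    "toy_b": "MKTAYIAKQR",
    "toy_c": "MGSLDDEAVI",
}
print(find_baa_negative(toy))  # prints: ['toy_a', 'toy_c']
```

A real survey would read translated coding sequences from annotated genomes and would probably also apply a minimum-length cutoff, which this sketch leaves out.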
Affiliation(s)
- Gene D McDonald
- Department of Chemistry and Biochemistry, University of Texas at Austin, Austin, Texas 78712, USA.
24
Seaborg DM. Was Wright right? The canonical genetic code is an empirical example of an adaptive peak in nature; deviant genetic codes evolved using adaptive bridges. J Mol Evol 2010; 71:87-99. [PMID: 20711776 PMCID: PMC2924497 DOI: 10.1007/s00239-010-9373-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2010] [Accepted: 07/02/2010] [Indexed: 11/30/2022]
Abstract
The canonical genetic code is on a sub-optimal adaptive peak with respect to its ability to minimize errors, and is close to, but not quite, optimal. This is demonstrated by the near-total adjacency of synonymous codons, the similarity of adjacent codons, and comparisons of frequency of amino acid usage with number of codons in the code for each amino acid. As a rare empirical example of an adaptive peak in nature, it shows adaptive peaks are real, not merely theoretical. The evolution of deviant genetic codes illustrates how populations move from a lower to a higher adaptive peak. This is done by the use of "adaptive bridges," neutral pathways that cross over maladaptive valleys by virtue of masking of the phenotypic expression of some maladaptive aspects in the genotype. This appears to be the general mechanism by which populations travel from one adaptive peak to another. There are multiple routes a population can follow to cross from one adaptive peak to another. These routes vary in the probability that they will be used, and this probability is determined by the number and nature of the mutations that happen along each of the routes. A modification of the depiction of adaptive landscapes showing genetic distances and probabilities of travel along their multiple possible routes would throw light on this important concept.
Affiliation(s)
- David M Seaborg
- Foundation for Biological Conservation and Research, 1888 Pomar Way, Walnut Creek, CA 94598-1424, USA.
25
Higgs PG. A four-column theory for the origin of the genetic code: tracing the evolutionary pathways that gave rise to an optimized code. Biol Direct 2009; 4:16. [PMID: 19393096 PMCID: PMC2689856 DOI: 10.1186/1745-6150-4-16] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2009] [Accepted: 04/24/2009] [Indexed: 11/18/2022] Open
Abstract
Background: The arrangement of the amino acids in the genetic code is such that neighbouring codons are assigned to amino acids with similar physical properties. Hence, the effects of translational error are minimized with respect to randomly reshuffled codes. Further inspection reveals that it is amino acids in the same column of the code (i.e. same second base) that are similar, whereas those in the same row show no particular similarity. We propose a 'four-column' theory for the origin of the code that explains how the action of selection during the build-up of the code leads to a final code that has the observed properties.
Results: The theory makes the following propositions. (i) The earliest amino acids in the code were those that are easiest to synthesize non-biologically, namely Gly, Ala, Asp, Glu and Val. (ii) These amino acids are assigned to codons with G at first position. Therefore the first code may have used only these codons. (iii) The code rapidly developed into a four-column code where all codons in the same column coded for the same amino acid: NUN = Val, NCN = Ala, NAN = Asp and/or Glu, and NGN = Gly. (iv) Later amino acids were added sequentially to the code by a process of subdivision of codon blocks in which a subset of the codons assigned to an early amino acid were reassigned to a later amino acid. (v) Later amino acids were added into positions formerly occupied by amino acids with similar properties because this can occur with minimal disruption to the proteins already encoded by the earlier code. As a result, the properties of the amino acids in the final code retain a four-column pattern that is a relic of the earliest stages of code evolution.
Conclusion: The driving force during this process is not the minimization of translational error, but positive selection for the increased diversity and functionality of the proteins that can be made with a larger amino acid alphabet. Nevertheless, the code that results is one in which translational error is minimized. We define a cost function with which we can compare the fitness of codes with varying numbers of amino acids, and a barrier function, which measures the change in cost immediately after addition of a new amino acid. We show that the barrier is positive if an amino acid is added into a column with dissimilar properties, but negative if an amino acid is added into a column with similar physical properties. Thus, natural selection favours the assignment of amino acids to the positions that they occupy in the final code.
Reviewers: This article was reviewed by David Ardell, Eugene Koonin and Stephen Freeland (nominated by Laurence Hurst).
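Propositions (ii) and (iii) of this abstract can be checked against the standard code in a few lines. The sketch below builds the hypothesized four-column code (NUN = Val, NCN = Ala, NAN = Asp and/or Glu, NGN = Gly) and lists the codons whose modern assignment still matches it. The function names are illustrative assumptions, and the paper's cost and barrier functions are not reproduced here.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated with bases in UCAG order
# (first base varies slowest); '*' marks a stop codon.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Proposition (iii): the second base alone decides the meaning of a codon.
# The NAN column is allowed to mean Asp and/or Glu, hence a two-element set.
COLUMN_AA = {"U": {"V"}, "C": {"A"}, "A": {"D", "E"}, "G": {"G"}}


def standard_code():
    """Map each RNA codon to its one-letter amino acid in the standard code."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, STANDARD_AA))


def four_column_code():
    """Map each codon to the set of amino acids allowed by its column."""
    return {"".join(c): COLUMN_AA[c[1]] for c in product(BASES, repeat=3)}


def retained_codons():
    """Codons whose standard assignment still matches the four-column code."""
    std, early = standard_code(), four_column_code()
    return sorted(codon for codon, aa in std.items() if aa in early[codon])


kept = retained_codons()
print(len(kept), {codon[0] for codon in kept})  # prints: 16 {'G'}
```

All 16 retained codons begin with G, which matches proposition (ii). Under the theory, the other 48 codons are the ones later reassigned by subdivision of codon blocks.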
Affiliation(s)
- Paul G Higgs
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1, Canada.
26
Koonin EV, Novozhilov AS. Origin and evolution of the genetic code: the universal enigma. IUBMB Life 2009; 61:99-111. [PMID: 19117371 DOI: 10.1002/iub.146] [Citation(s) in RCA: 223] [Impact Index Per Article: 13.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]
Abstract
The genetic code is nearly universal, and the arrangement of the codons in the standard codon table is highly nonrandom. The three main concepts on the origin and evolution of the code are the stereochemical theory, according to which codon assignments are dictated by physicochemical affinity between amino acids and the cognate codons (anticodons); the coevolution theory, which posits that the code structure coevolved with amino acid biosynthesis pathways; and the error minimization theory under which selection to minimize the adverse effect of point mutations and translation errors was the principal factor of the code's evolution. These theories are not mutually exclusive and are also compatible with the frozen accident hypothesis, that is, the notion that the standard code might have no special properties but was fixed simply because all extant life forms share a common ancestor, with subsequent changes to the code, mostly, precluded by the deleterious effect of codon reassignment. Mathematical analysis of the structure and possible evolutionary trajectories of the code shows that it is highly robust to translational misreading but there are numerous more robust codes, so the standard code potentially could evolve from a random code via a short sequence of codon series reassignments. Thus, much of the evolution that led to the standard code could be a combination of frozen accident with selection for error minimization although contributions from coevolution of the code with metabolic pathways and weak affinities between amino acids and nucleotide triplets cannot be ruled out. However, such scenarios for the code evolution are based on formal schemes whose relevance to the actual primordial evolution is uncertain. A real understanding of the code origin and evolution is likely to be attainable only in conjunction with a credible scenario for the evolution of the coding principle itself and the translation system.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
27
Rodin AS, Szathmáry E, Rodin SN. One ancestor for two codes viewed from the perspective of two complementary modes of tRNA aminoacylation. Biol Direct 2009; 4:4. [PMID: 19173731 PMCID: PMC2669802 DOI: 10.1186/1745-6150-4-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2009] [Accepted: 01/27/2009] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND: The genetic code is brought into action by 20 aminoacyl-tRNA synthetases. These enzymes are evenly divided into two classes (I and II) that recognize tRNAs from the minor and major groove sides of the acceptor stem, respectively. We have reported recently that: (1) ribozymic precursors of the synthetases seem to have used the same two sterically mirror modes of tRNA recognition, (2) having these two modes might have helped in preventing erroneous aminoacylation of ancestral tRNAs with complementary anticodons, yet (3) the risk of confusion for the presumably earliest pairs of complementarily encoded amino acids had little to do with anticodons. Accordingly, in this communication we focus on the acceptor stem.
RESULTS: Our main result is the emergence of a palindrome structure for the acceptor stem's common ancestor, reconstructed from the phylogenetic trees of Bacteria, Archaea and Eukarya. In parallel, for pairs of ancestral tRNAs with complementary anticodons, we present updated evidence of concerted complementarity of the second bases in the acceptor stems. These two results suggest that the first pairs of "complementary" amino acids that were engaged in primordial coding, such as Gly and Ala, could have avoided erroneous aminoacylation if and only if the acceptor stems of their adaptors were recognized from the same, major groove, side. The class II protein synthetases then inherited this "primary preference" from isofunctional ribozymes.
CONCLUSION: Taken together, our results support the hypothesis that the genetic code per se (the one associated with the anticodons) and the operational code of aminoacylation (associated with the acceptor) diverged from a common ancestor that probably began developing before translation. The primordial advantage of linking some amino acids (most likely glycine and alanine) to the ancestral acceptor stem may have been selective retention in a protocell surrounded by a leaky membrane for use in nucleotide and coenzyme synthesis. Such acceptor stems (as cofactors) thus transferred amino acids as groups for biosynthesis. Later, with the advent of an anticodon loop, some amino acids (such as aspartic acid, histidine, arginine) assumed a catalytic role while bound to such extended adaptors, in line with the original coding coenzyme handle (CCH) hypothesis.
Affiliation(s)
- Andrei S Rodin
- Human Genetics Center, School of Public Health, University of Texas, Houston, TX 77225, USA
- Eörs Szathmáry
- Collegium Budapest (Institute for Advanced Study), Szentháromság u. 2, H-1014 Budapest, Hungary
- Parmenides Center for the Study of Thinking, 14a Kardinal Faulhaber Str., D-80333 München, Germany
- Institute of Biology, Eötvös University, 1c Pázmány Péter sétány, H-1117 Budapest, Hungary
- Sergei N Rodin
- Collegium Budapest (Institute for Advanced Study), Szentháromság u. 2, H-1014 Budapest, Hungary
- Theoretical Biology, Department of Molecular Biology, Beckman Research Institute of the City of Hope, Duarte, CA 91010, USA
28
Lu Y, Freeland SJ. A quantitative investigation of the chemical space surrounding amino acid alphabet formation. J Theor Biol 2007; 250:349-61. [PMID: 18005995 DOI: 10.1016/j.jtbi.2007.10.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2007] [Revised: 09/21/2007] [Accepted: 10/08/2007] [Indexed: 11/29/2022]
Abstract
To date, explanations for the origin and emergence of the alphabet of amino acids encoded by the standard genetic code have been largely qualitative and speculative. Here, with the help of computational chemistry, we present the first quantitative exploration of nature's "choices" set against various models for plausible alternatives. Specifically, we consider the chemical space defined by three fundamental biophysical properties (size, charge, and hydrophobicity) to ask whether the amino acids that entered the genetic code exhibit a higher diversity than random samples of similar size drawn from several different definitions of amino acid possibility space. We found that in terms of the properties studied, the full, standard set of 20 biologically encoded amino acids is indeed significantly more diverse than an equivalently sized group drawn at random from the set of plausible, prebiotic alternatives (using the Murchison meteorite as a model for pre-biotic plausibility). However, when the set of possible amino acids is enlarged to include those that are produced by standard biosynthetic pathways (reflecting the widespread idea that many members of the standard alphabet were recruited in this way), then the genetically encoded amino acids can no longer be distinguished as more diverse than a random sample. Finally, if we turn to consider the overlap between biologically encoded amino acids and those that are prebiotically plausible, then we find that the biologically encoded subset are no more diverse as a group than would be expected from a random sample, unless the definition of "random sample" is adjusted to reflect possible prebiotic abundance (again, using the contents of the Murchison meteorite as our estimator). 
This final result is contingent on the accuracy of our computational estimates for amino acid properties, and prebiotic abundances, and an exploration of the likely effect of errors in our estimation reveals that our results should be treated with caution. We thus present this work as a first step in quantifying and thus testing various origin-of-life hypotheses regarding the origin and evolution of life's amino acid alphabet, and advocate the progress that would add valuable information in the future.
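The comparison at the heart of this abstract (is a chosen set of amino acids more diverse than equally sized random samples from a larger pool?) can be sketched as a simple resampling test. Everything in the block is an assumption made for illustration: the diversity measure (mean pairwise Euclidean distance in a three-property space), the function names, and the invented toy coordinates. It is not the authors' metric or data.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of property vectors."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def fraction_as_diverse(pool, chosen, n_samples=2000, seed=0):
    """Fraction of random equal-sized samples at least as diverse as `chosen`.

    `pool` maps a name to a (size, charge, hydrophobicity) tuple;
    `chosen` is a list of at least two names drawn from `pool`.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance([pool[name] for name in chosen])
    names = sorted(pool)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(names, len(chosen))
        diversity = mean_pairwise_distance([pool[name] for name in sample])
        # Small tolerance so that identical sets compare equal despite rounding.
        if diversity >= observed - 1e-12:
            hits += 1
    return hits / n_samples


# Invented, unitless toy coordinates (size, charge, hydrophobicity).
toy_pool = {
    "a": (0.0, 0.0, 0.0), "b": (1.0, 0.0, 0.0), "c": (0.0, 1.0, 0.0),
    "d": (0.0, 0.0, 1.0), "e": (1.0, 1.0, 1.0), "f": (0.5, 0.5, 0.5),
}
print(fraction_as_diverse(toy_pool, ["a", "e", "b"]))
```

A small returned fraction would indicate that the chosen set is unusually diverse relative to the pool. In practice the three properties should be put on comparable scales before distances are taken, and the sampling could be weighted by estimated prebiotic abundance, as the abstract's final comparison does.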
Affiliation(s)
- Yi Lu
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 25250, USA
29
Wong JTF. Question 6: Coevolution Theory of the Genetic Code: A Proven Theory. ORIGINS LIFE EVOL B 2007; 37:403-8. [PMID: 17611816 DOI: 10.1007/s11084-007-9094-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2006] [Accepted: 02/12/2007] [Indexed: 10/23/2022]
Abstract
The coevolution theory proposes that primordial proteins consisted only of those amino acids readily obtainable from the prebiotic environment, representing about half the twenty encoded amino acids of today, and the missing amino acids entered the system as the code expanded along with pathways of amino acid biosynthesis. The isolation of genetic code mutants, and the antiquity of pretran synthesis revealed by the comparative genomics of tRNAs and aminoacyl-tRNA synthetases, have combined to provide a rigorous proof of the four fundamental tenets of the theory, thus solving the riddle of the structure of the universal genetic code.
Affiliation(s)
- Jeffrey Tze-Fei Wong
- Department of Biochemistry and Applied Genomics Laboratory, Hong Kong University of Science & Technology, Clear Water Bay, Hong Kong, China.
30
Goodarzi H, Najafabadi HS, Hassani K, Nejad HA, Torabi N. On the optimality of the genetic code, with the consideration of coevolution theory by comparison of prominent cost measure matrices. J Theor Biol 2005; 235:318-25. [PMID: 15882694 DOI: 10.1016/j.jtbi.2005.01.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2004] [Revised: 01/20/2005] [Accepted: 01/24/2005] [Indexed: 11/22/2022]
Abstract
Statistical and biochemical studies have revealed non-random patterns in codon assignments. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations: when an error converts one amino acid to another, the biochemical properties of the resulting amino acid are usually very similar to those of the original. In this study, using altered forms of the fitness functions used in prior studies, we have optimized the parameters involved in the calculation of the error-minimizing property of the genetic code so that the genetic code outscores the random codes as much as possible. This work also compares two prominent matrices, the Mutation Matrix and Point Accepted Mutations 74-100 (PAM(74-100)). The results show that the hypothetical properties of the coevolution theory of the genetic code are already considered in PAM(74-100), giving more evidence of a bias towards the genetic code in this matrix. Furthermore, our results indicate that PAM(74-100) is biased towards single-base mistranslation occurrences in the second codon position as well as the frequency of amino acids. Thus PAM(74-100) is not a suitable substitution matrix for studies of the evolution of the genetic code.
Affiliation(s)
- Hani Goodarzi
- Department of Biotechnology, Faculty of Science, University of Tehran, Enghelab Ave., Tehran, Iran.
31
Yang CM. On the structural regularity in nucleobases and amino acids and relationship to the origin and evolution of the genetic code. ORIGINS LIFE EVOL B 2005; 35:275-95. [PMID: 16228642 DOI: 10.1007/s11084-005-1078-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2003] [Revised: 02/19/2004] [Accepted: 02/19/2004] [Indexed: 10/25/2022]
Abstract
To explore how chemical structures of both nucleobases and amino acids may have played a role in shaping the genetic code, numbers of sp2 hybrid nitrogen atoms in nucleobases were taken as a determinative measure for empirical stereo-electronic property to analyze the genetic code. Results revealed that amino acid hydropathy correlates strongly with the sp2 nitrogen atom numbers in nucleobases rather than with the overall electronic property such as redox potentials of the bases, reflecting that stereo-electronic property of bases may play a role. In the rearranged code, five simple but stereo-structurally distinctive amino acids (Gly, Pro, Val, Thr and Ala) and their codon quartets form a crossed intersection "core". Secondly, a re-categorization of the amino acids according to their beta-carbon stereochemistry, verified by charge density (at beta-carbon) calculation, results in five groups of stereo-structurally distinctive amino acids, the group leaders of which are Gly, Pro, Val, Thr and Ala, remarkably overlapping the above "core". These two lines of independent observations provide empirical arguments for a contention that a seemingly "frozen" "core" could have formed at a certain evolutionary stage. The possible existence of this codon "core" is in conformity with a previous evolutionary model whereby stereochemical interactions may have shaped the code. Moreover, the genetic code listed in UCGA succession together with this codon "core" has recently facilitated an identification of the unprecedented icosikaioctagon symmetry and bi-pyramidal nature of the genetic code.
Affiliation(s)
- Chi Ming Yang
- Neurochemistry and System Chemical Biology, Nankai University, Tian Jin, 300071, China.
32
Abstract
The coevolution theory of the genetic code, which postulates that prebiotic synthesis was an inadequate source of all twenty protein amino acids, and therefore some of them had to be derived from the coevolving pathways of amino acid biosynthesis, has been assessed in the light of the discoveries of the past three decades. Its four fundamental tenets regarding the essentiality of amino acid biosynthesis, role of pretran synthesis, biosynthetic imprint on codon allocations and mutability of the encoded amino acids are proven by the new knowledge. Of the factors that guided the evolutionary selection of the universal code, the relative contributions of Amino Acid Biosynthesis: Error Minimization: Stereochemical Interaction are estimated to first approximation as 40,000,000:400:1, which suggests that amino acid biosynthesis represents the dominant factor shaping the code. The utility of the coevolution theory is demonstrated by its opening up experimental expansions of the code and providing a basis for locating the root of life.
Affiliation(s)
- J Tze-Fei Wong
- Applied Genomics Laboratory and Department of Biochemistry, Hong Kong University of Science & Technology, Hong Kong, China.
33
Goodarzi H, Nejad HA, Torabi N. On the optimality of the genetic code, with the consideration of termination codons. Biosystems 2004; 77:163-73. [PMID: 15527955 DOI: 10.1016/j.biosystems.2004.05.031] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2004] [Revised: 05/09/2004] [Accepted: 05/25/2004] [Indexed: 11/18/2022]
Abstract
The existence of nonrandom patterns in codon assignments is supported by many statistical and biochemical studies. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations. For example, it is known that when an error induces the conversion of an amino acid to another, the biochemical properties of the resulting amino acid are usually very similar to those of the original. Prior studies include many attempts at quantitative estimation of the fraction of randomly generated codes which, based upon load minimization, score higher than the canonical genetic code. In this study, we took into consideration both the relative frequencies of amino acids and nonsense mistranslations, factors which had been previously ignored. Incorporation of these parameters resulted in a fitness function (phi) under which the canonical genetic code is highly optimized with respect to load minimization. Taking termination codons into account, we also applied a biosynthetic version of the coevolution theory, although with low significance. We employed a revised cost for the precursor-product pairs of amino acids and showed that the significance of this approach depends on the cost measure matrix used by the researcher. Thus, we have compared the two prominent matrices, point accepted mutations 74-100 (PAM(74-100)) and mutation matrix, in our study.
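The "fraction of randomly generated codes that score higher than the canonical code" mentioned in this abstract can be estimated with a short simulation. The sketch below is a simplified stand-in, not the paper's fitness function phi. It uses the Kyte-Doolittle hydropathy scale as the cost measure (a substitution for the PAM(74-100) and mutation matrices), gives every single-base change equal weight, and skips changes to or from stop codons instead of penalizing them. All function names are assumptions.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (first base slowest); '*' = stop.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Kyte-Doolittle hydropathy index, used here only as an illustrative measure.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def standard_code():
    """Map each RNA codon to its amino acid in the standard code."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, STANDARD_AA))


def code_cost(code):
    """Mean squared hydropathy change over all single-base substitutions."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue
            other = code[codon[:pos] + base + codon[pos + 1:]]
            if other == "*":
                continue  # nonsense changes are ignored in this sketch
            total += (HYDROPATHY[aa] - HYDROPATHY[other]) ** 2
            count += 1
    return total / count


def shuffled_code(code, rng):
    """Random code: permute amino acids among the existing codon blocks."""
    aas = sorted(set(code.values()) - {"*"})
    permuted = aas[:]
    rng.shuffle(permuted)
    relabel = dict(zip(aas, permuted))
    relabel["*"] = "*"  # stop codons stay where they are
    return {codon: relabel[aa] for codon, aa in code.items()}


def fraction_better(n_codes=1000, seed=0):
    """Fraction of random codes with a lower cost than the standard code."""
    rng = random.Random(seed)
    code = standard_code()
    baseline = code_cost(code)
    better = sum(
        code_cost(shuffled_code(code, rng)) < baseline for _ in range(n_codes)
    )
    return better / n_codes


print(fraction_better())
```

Weighting each substitution by amino acid frequency and assigning a cost to nonsense changes, the two factors the abstract adds, would both be changes inside `code_cost`.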
Affiliation(s)
- Hani Goodarzi
- Department of Biotechnology, Faculty of Science, University of Tehran, Enghelab Avenue, Tehran, Iran.
34
Abstract
Since discovering the pattern by which amino acids are assigned to codons within the standard genetic code, investigators have explored the idea that natural selection placed biochemically similar amino acids near to one another in coding space so as to minimize the impact of mutations and/or mistranslations. The analytical evidence to support this theory has grown in sophistication and strength over the years, and counterclaims questioning its plausibility and quantitative support have yet to transcend some significant weaknesses in their approach. These weaknesses are illustrated here by means of a simple simulation model for adaptive genetic code evolution. There remain ill explored facets of the 'error minimizing' code hypothesis, however, including the mechanism and pathway by which an adaptive pattern of codon assignments emerged, the extent to which natural selection created synonym redundancy, its role in shaping the amino acid and nucleotide languages, and even the correct interpretation of the adaptive codon assignment pattern: these represent fertile areas for future research.
Affiliation(s)
- Stephen J Freeland
- Department of Biology, University of Maryland, Baltimore County, Catonsville, MD, USA.
35
Abstract
The coevolution theory of genetic code origin (Wong, J.T. 1975, Proc. Natl Acad. Sci. U.S.A. 72, 1909-1912) is assumed here to be substantially correct. This theory is based on the strict parallelism of the biosynthetic relationships between amino acids and the organization of the genetic code and postulates that these relationships were mediated by tRNA-like molecules on which the biosynthetic transformations between precursor and product amino acids took place. These transformations underlay the mechanism that gave rise to genetic code organization. One of the pathways which represents these transformations found in current organisms, and which are thus probably molecular fossils, is the Met-tRNA(fMet) --> fMet-tRNA(fMet) pathway. This pathway is present only in the Bacteria domain. This along with other observations and arguments leads us to believe that this pathway is a clear violation of the universality of the genetic code. Furthermore, the presence of this pathway only in the Bacteria domain seems to imply that the translation apparatus was still rapidly evolving when this pathway was fixed. This, in turn, appears to imply that the last universal common ancestor was a progenote. Finally, the implications that the finding of this pathway has for the stereochemical theory of genetic code origin are discussed.
Affiliation(s)
- M Di Giulio
- International Institute of Genetics and Biophysics, CNR, Via G. Marconi 10, 80125 Naples, Italy.
|
36
|
Ronneberg TA, Landweber LF, Freeland SJ. Testing a biosynthetic theory of the genetic code: fact or artifact? Proc Natl Acad Sci U S A 2000; 97:13690-5. [PMID: 11087835 PMCID: PMC17637 DOI: 10.1073/pnas.250403097] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2000] [Indexed: 11/18/2022] Open
Abstract
It has long been conjectured that the canonical genetic code evolved from a simpler primordial form that encoded fewer amino acids [e.g., Crick, F. H. C. (1968) J. Mol. Biol. 38, 367-379]. The most influential form of this idea, "code coevolution" [Wong, J. T.-F. (1975) Proc. Natl. Acad. Sci. USA 72, 1909-1912], proposes that the genetic code coevolved with the invention of biosynthetic pathways for new amino acids. It further proposes that a comparison of modern codon assignments with the conserved metabolic pathways of amino acid biosynthesis can inform us about this history of code expansion. Here we re-examine the biochemical basis of this theory to test the validity of its statistical support. We show that the theory's definition of "precursor-product" amino acid pairs is unjustified biochemically because it requires the energetically unfavorable reversal of steps in extant metabolic pathways to achieve desired relationships. In addition, the theory neglects important biochemical constraints when calculating the probability that chance could assign precursor-product amino acids to contiguous codons. A conservative correction for these errors reveals a surprisingly high 23% probability that apparent patterns within the code are caused purely by chance. Finally, even this figure rests on post hoc assumptions about primordial codon assignments, without which the probability rises to 62% that chance alone could explain the precursor-product pairings found within the code. Thus we conclude that coevolution theory cannot adequately explain the structure of the genetic code.
Affiliation(s)
- T A Ronneberg
- Departments of Ecology and Evolutionary Biology, and Chemistry, Princeton University, Princeton, NJ 08544, USA
|
37
|
Abstract
The evolutionary forces that produced the canonical genetic code before the last universal ancestor remain obscure. One hypothesis is that the arrangement of amino acid/codon assignments results from selection to minimize the effects of errors (e.g., mistranslation and mutation) on resulting proteins. If amino acid similarity is measured as polarity, the canonical code does indeed outperform most theoretical alternatives. However, this finding does not hold for other amino acid properties, ignores plausible restrictions on possible code structure, and does not address the naturally occurring nonstandard genetic codes. Finally, other analyses have shown that significantly better code structures are possible. Here, we show that if theoretically possible code structures are limited to reflect plausible biological constraints, and amino acid similarity is quantified using empirical data of substitution frequencies, the canonical code is at or very close to a global optimum for error minimization across plausible parameter space. This result is robust to variation in the methods and assumptions of the analysis. Although significantly better codes do exist under some assumptions, they are extremely rare and thus consistent with reports of an adaptive code: previous analyses which suggest otherwise derive from a misleading metric. However, all extant, naturally occurring, secondarily derived, nonstandard genetic codes do appear less adaptive. The arrangement of amino acid assignments to the codons of the standard genetic code appears to be a direct product of natural selection for a system that minimizes the phenotypic impact of genetic error. Potential criticisms of previous analyses appear to be without substance. That known variants of the standard genetic code appear less adaptive suggests that different evolutionary factors predominated before and after fixation of the canonical code. 
While the evidence for an adaptive code is clear, the process by which the code achieved this optimization requires further attention.
Affiliation(s)
- S J Freeland
- Department of Ecology, Princeton University, University of Bath, Bath, England
|
38
|
Affiliation(s)
- S J Freeland
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
|
39
|
Load minimization of the genetic code: history does not explain the pattern. Proc Biol Sci 1998; 265:2111-2119. [PMCID: PMC1689495 DOI: 10.1098/rspb.1998.0547] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2023] Open
Abstract
The average effect of errors acting on a genetic code (the change in amino-acid meaning resulting from point mutation and mistranslation) may be quantified as its 'load'. The natural genetic code shows a clear property of minimizing this load when compared against randomly generated variant codes. Two hypotheses may be considered to explain this property. First, it is possible that the natural code is the result of selection to minimize this load. Second, it is possible that the property is an historical artefact. It has previously been reported that amino acids that have been assigned to codons starting with the same base come from the same biosynthetic pathway. This probably reflects the manner in which the code evolved from a simpler code, and says more about the physicochemical mechanisms of code assembly than about selection. The apparent load minimization of the code may therefore follow as a consequence of the fact that the code could not have evolved any other way than to allow biochemically related amino acids to have related codons. Here then, we ask whether this 'historical' force alone can explain the efficiency of the natural code in minimizing the effects of error. We therefore compare the error-minimizing ability of the natural code with that of alternative codes which, rather than being a random selection, are restricted such that amino acids from the same biochemical pathway all share the same first base. We find that although on average the restricted set of codes show a slightly higher efficiency than random ones, the real code remains extremely efficient relative to this subset (P = 0.0003). This indicates that for the most part historical features do not explain the load-minimization property of the natural code. The importance of selection is further supported by the finding that the natural code's efficiency improves relative to that of historically related codes after allowance is made for realistic mutational and mistranslational biases. 
Once mistranslational biases have been considered, fewer than four per 100,000 alternative codes are better than the natural code.
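The load comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: it uses the simpler unrestricted randomization (amino acids permuted among the standard code's synonymous codon sets, stop codons fixed) rather than the pathway-restricted subset, and it applies no mutational or mistranslational weighting. The names `load`, `shuffled_code`, and `fraction_better` are hypothetical, and the polar requirement values in `POLAR` are approximate figures as commonly quoted; they are an assumption here and should be checked against a primary table before any real use.

```python
import random

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AMINO))

# ASSUMPTION: approximate polar requirement values, for illustration only.
POLAR = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8,
    "Q": 8.6, "E": 12.5, "G": 7.9, "H": 8.4, "I": 4.9,
    "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0, "P": 6.6,
    "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}


def load(code):
    """Mean squared change in polar requirement over all single-base
    substitutions that connect two sense codons (stops are skipped)."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbour = codon[:pos] + base + codon[pos + 1:]
                other = code[neighbour]
                if other == "*":
                    continue
                total += (POLAR[aa] - POLAR[other]) ** 2
                count += 1
    return total / count


def shuffled_code(rng):
    """Permute the 20 amino acids among the synonymous codon sets of the
    standard code; stop codons keep their positions."""
    aas = sorted(POLAR)
    perm = aas[:]
    rng.shuffle(perm)
    swap = dict(zip(aas, perm))
    swap["*"] = "*"
    return {codon: swap[aa] for codon, aa in STANDARD.items()}


def fraction_better(n_codes=2000, seed=0):
    """Fraction of shuffled codes whose load is lower than the natural code's."""
    rng = random.Random(seed)
    natural = load(STANDARD)
    better = sum(load(shuffled_code(rng)) < natural for _ in range(n_codes))
    return better / n_codes


if __name__ == "__main__":
    print("natural load:", round(load(STANDARD), 3))
    print("fraction of shuffled codes with lower load:", fraction_better())
```

Restricting `shuffled_code` so that amino acids from one biosynthetic pathway keep a shared first base, and weighting substitutions by position or transition/transversion bias, would be the natural extensions toward the analysis the abstract reports.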
|
40
|
Di Giulio M. The beta-sheets of proteins, the biosynthetic relationships between amino acids, and the origin of the genetic code. ORIGINS LIFE EVOL B 1996; 26:589-609. [PMID: 9008882 DOI: 10.1007/bf01808222] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]
Abstract
Two forces are generally hypothesised as being responsible for conditioning the origin of the organization of the genetic code: the physicochemical properties of amino acids and their biosynthetic relationships (relationships between precursor and product amino acids). If we assume that the biosynthetic relationships between amino acids were fundamental in defining the genetic code, then it is reasonable to expect that the distribution of physicochemical properties among the amino acids in precursor-product relationships cannot be random but must, rather, be affected by some selective constraints imposed by the structure of primitive proteins. Analysis shows that measurements representing the 'size' of amino acids, e.g. bulkiness, are specifically associated with the pairs of amino acids in precursor-product relationships. However, the size of amino acids cannot have been selected per se but, rather, because it reflects the beta-sheets of proteins which are, therefore, identified as the main adaptive theme promoting the origin of genetic code organization, whereas there are no traces of the alpha-helix in the genetic code table. The above considerations make it necessary to re-examine the relationship linking the hydrophilicity of the dinucleoside monophosphates of anticodons and the polarity and bulkiness of amino acids. It can be concluded that this relationship seems to be meaningful only between the hydrophilicity of anticodons and the polarity of amino acids. The latter relationship is supposed to have been operative on hairpin structures, ancestors of the tRNA molecule. Moreover, it is on these very structures that the biosynthetic links between precursor and product amino acids might have been achieved, and the interaction between the hydrophilicity of anticodons and the polarity of amino acids might have had a role in the concession of codons (anticodons) from precursors to products.
Affiliation(s)
- M Di Giulio
- International Institute of Genetics and Biophysics, CNR, Napoli, Italy
|
41
|
Abstract
A series of stages in the evolution of the genetic code is postulated, representing a chain of logical steps that leads to the present-day code. The stages described are based on translation machinery between the RNA world and that of amino acids, a model that consists of an RNA assembler strand along which RNA hairpin molecules are lined up, forming a picket-fence-like aggregate. Each hairpin carries an amino acid at the bottom of one of its legs, and the mutual proximity of amino acids achieved in this way facilitates their linkage into oligopeptides, in a sequence governed by the nucleotide sequence along the assembler strand, the code. The order in which amino acids are introduced into the code is in the approximate order of their availability, tempered by polarity and structural considerations.
Affiliation(s)
- H Kuhn
- Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
|
42
|
Abstract
Sequence data and evolutionary arguments suggest that a similarity may exist between the C-terminal end of glutaminyl-tRNA synthetase (GlnRS) and the catalytic domain of glutamine amidotransferases (GATs). If true, this would seem to imply that the amidation reaction of the Glu-tRNA(Gln) complex was the evolutionary precursor of the direct tRNA(Gln) aminoacylation pathway. Since the C-terminal end of GlnRS does not now have an important functional role, it can be concluded that this sequence contains vestiges that lead us to believe that it represents a palimpsest. This sequence still conserves the remains of the evolutionary transition: amidation reaction --> aminoacylation reaction. This may be important in deciding which mechanism gave origin to the genetic code organization. These observations, together with results obtained by Gatti and Tzagoloff [J. Mol. Biol. (1991) 218:557-568], lead to the hypothesis that the class I aminoacyl-tRNA synthetases (ARSs) may be homologous to the GATs of the trpG subfamily, while the class II ARSs may be homologous to the GATs of the purF subfamily. Overall, this seems to point to the existence of an intimate evolutionary link between the proteins involved in the primitive metabolism and aminoacyl-tRNA synthetases.
Affiliation(s)
- M Di Giulio
- International Institute of Genetics and Biophysics, CNR, Naples, Italy
|
43
|
Abstract
A diversification of the genetic code based on the number of codons available for the proteinous amino acids is established. Three groups of amino acids during evolution of the code are distinguished. On the basis of their chemical complexity those amino acids emerging later in a translation process are derived. Codon number and chemical complexity indicate that His, Phe, Tyr, Cys and either Lys or Asn were introduced in the second stage, whereas the number of codons alone gives evidence that Trp and Met were introduced in the third stage. The amino acids of stage 1 use purine-rich codons, while all the amino acids introduced in the second stage, in contrast, use pyrimidines in the third position of their codons. A low abundance of pyrimidines during early translation is derived. This assumption is supported by experiments on non-enzymatic replication and interactions of hairpin loops with a complementary strand. A back extrapolation concludes a high purine content of the first nucleic acids, which gradually decreased during their evolution. Amino acids independently available from prebiotic synthesis were thus correlated to purine-rich codons. Implications on the prebiotic replication are discussed also in the light of recent codon usage data.
Affiliation(s)
- U Baumann
- Department of Biochemistry, University of Houston, Texas
|
44
|
|
45
|
Wong JT. Membership mutation of the genetic code: loss of fitness by tryptophan. Proc Natl Acad Sci U S A 1983; 80:6303-6. [PMID: 6413975 PMCID: PMC394285 DOI: 10.1073/pnas.80.20.6303] [Citation(s) in RCA: 70] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open
Abstract
Bacillus subtilis strain QB928, a tryptophan-auxotroph, was serially mutated to yield strain HR15. For QB928, tryptophan functioned as a competent amino acid and 4-fluorotryptophan as merely an inferior analogue. For HR15, these roles were reversed. The tryptophan/4-fluorotryptophan growth ratio decreased by a factor of 2 × 10^4 in the transition from QB928 to HR15.
|
46
|
Abstract
The selective Darwinian theory of chemical evolution is critically reviewed and the tentative conclusion is reached that neither the theoretical analyses nor the experiments with phages can really prove it. An alternative proposal is put forth which considers the possibility that the biogenetic process has been driven by stochastic forces, i.e. it took place in the absence of Darwinian selection which, in turn, started only when the first protocells came into existence. The dynamics of the early self-organization of living structures should be understood in terms of self-assembly. The complexification of living matter is thus not represented as a gradual phenomenon but as a series of abrupt and relatively fast transitions consisting in the aggregation of pre-systems which had evolved on their own. The shifts towards new and variegated states proposed by the bifurcation theory are not considered particularly relevant for reasons reported in the text, nor is it believed that dissipation can entirely account for the order observed in living cells.
|
47
|
Abstract
Factors involved in the selection of the 20 protein L-alpha-amino acids during chemical evolution and the early stages of Darwinian evolution are discussed. The selection is considered on the basis of the availability in the primitive ocean, function in proteins, the stability of the amino acid and its peptides, stability to racemization, and stability on the transfer RNA. We conclude that aspartic acid, glutamic acid, arginine, lysine, serine and possibly threonine are the best choices for acidic, basic and hydroxy amino acids. The hydrophobic amino acids are reasonable choices, except for the puzzling absences of alpha-amino-n-butyric acid, norvaline and norleucine. The choices of the sulfur and aromatic amino acids seem reasonable, but are not compelling. Asparagine and glutamine are apparently not primitive. If life were to arise on another planet, we would expect that the catalysts would be poly-alpha-amino acids and that about 75% of the amino acids would be the same as on the earth.
|
48
|
|
49
|
Wong JT. Role of minimization of chemical distances between amino acids in the evolution of the genetic code. Proc Natl Acad Sci U S A 1980; 77:1083-6. [PMID: 6928661 PMCID: PMC348428 DOI: 10.1073/pnas.77.2.1083] [Citation(s) in RCA: 72] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open
Abstract
The allocation of codons in the genetic code makes possible a moderate minimization of the chemical distances between pairs of neighboring amino acids in the code. However, the code is neither a global nor a local optimum with respect to distance minimization. These findings do not support the physicochemical postulate that distance minimization was a major factor shaping the evolution of the genetic code. They agree with the coevolution theory, which proposes that genetic code evolution was predominantly determined by the concession of codons from precursor to product amino acids in an expansion of the code to accommodate new varieties of amino acids, with distance minimization playing a subsidiary role in deciding the choice of codons to be acquired by the product amino acids from the codon domains of the precursor amino acids.
|