1
Di Giulio M. Theories of the origin of the genetic code: Strong corroboration for the coevolution theory. Biosystems 2024; 239:105217. [PMID: 38663520 DOI: 10.1016/j.biosystems.2024.105217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 04/16/2024] [Accepted: 04/18/2024] [Indexed: 04/29/2024]
Abstract
I analyzed all the theories and models of the origin of the genetic code, and over the years, I have considered the main suggestions that could explain this origin. The conclusion of this analysis is that the coevolution theory of the origin of the genetic code is the theory that best captures the majority of observations concerning the organization of the genetic code. In other words, the biosynthetic relationships between amino acids would have heavily influenced the origin of the organization of the genetic code, as supported by the coevolution theory. Instead, the presence in the genetic code of physicochemical properties of amino acids, which have also been linked to the physicochemical properties of anticodons or codons or bases by stereochemical and physicochemical theories, would simply be the result of natural selection. More explicitly, I maintain that these correlations between codons, anticodons or bases and amino acids are not the result of a real correlation between, for example, amino acids and codons, but only the effect of the intervention of natural selection. Specifically, in the genetic code table we expect, for example, that the most similar codons, that is, those that differ by only one base, will have more similar physicochemical properties. Therefore, the 64 codons of the genetic code table ordered in a certain way would also represent an ordering of some of their physicochemical properties. Now, a study aimed at clarifying which physicochemical property of amino acids has influenced the allocation of amino acids in the genetic code has established that the partition energy of amino acids has played a decisive role in this. Indeed, under some conditions, the genetic code was found to be approximately 98% optimized on its columns. In this same work, it was shown that this was most likely the result of the action of natural selection.
If natural selection had truly allocated the amino acids in the genetic code in such a way that similar amino acids also have similar codons, and had done so without any mechanism of physicochemical interaction between, for example, codons and amino acids, then it might turn out that even different physicochemical properties of codons (or anticodons or bases) show some correlation with the physicochemical properties of amino acids, simply because the partition energy of amino acids is correlated with other physicochemical properties of amino acids. It is very likely that this would inevitably lead to a correlation between codons (or anticodons or bases) and amino acids. In other words, since the codons (anticodons or bases) are ordered in the genetic code, so that some of their physicochemical properties should also follow a similar order, and given that the amino acids would also appear to have been ordered in the genetic code by natural selection, it should inevitably turn out that there is a correlation between, for example, the hydrophobicity of anticodons and that of amino acids. Instead, the intervention of natural selection in organizing the genetic code would appear to be highly compatible with the main mechanism of structuring the genetic code as supported by the coevolution theory. This would make the coevolution theory the only plausible explanation for the origin of the genetic code.
Affiliation(s)
- Massimo Di Giulio
- The Ionian School, Early Evolution of Life Department, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena, L'Aquila, Italy.
2
Caldararo F, Di Giulio M. The genetic code is very close to a global optimum in a model of its origin taking into account both the partition energy of amino acids and their biosynthetic relationships. Biosystems 2022; 214:104613. [DOI: 10.1016/j.biosystems.2022.104613] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/19/2021] [Revised: 01/16/2022] [Accepted: 01/17/2022] [Indexed: 01/23/2023]
3
Catalano C, AL Mughram MH, Guo Y, Kellogg GE. 3D interaction homology: Hydropathic interaction environments of serine and cysteine are strikingly different and their roles adapt in membrane proteins. Curr Res Struct Biol 2021; 3:239-256. [PMID: 34693344 PMCID: PMC8517007 DOI: 10.1016/j.crstbi.2021.09.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/02/2021] [Revised: 08/23/2021] [Accepted: 09/23/2021] [Indexed: 12/12/2022]
Abstract
Atomic-resolution protein structural models are prerequisites for many downstream activities like structure-function studies or structure-based drug discovery. Unfortunately, this data is often unavailable for some of the most interesting and therapeutically important proteins. Thus, computational tools for building native-like structural models from less-than-ideal experimental data are needed. To this end, interaction homology exploits the character, strength and loci of the sets of interactions that define a structure. Each residue type has its own limited set of backbone angle-dependent interaction motifs, as defined by their environments. In this work, we characterize the interactions of serine, cysteine and S-bridged cysteine in terms of 3D hydropathic environment maps. As a result, we explore several intriguing questions. Are the environments different between the isosteric serine and cysteine residues? Do some environments promote the formation of cystine S-S bonds? With the increasing availability of structural data for water-insoluble membrane proteins, are there environmental differences for these residues between soluble and membrane proteins? The environments surrounding serine and cysteine residues are dramatically different: serine residues are about 50% solvent exposed, while cysteines are only 10% exposed; the latter are more involved in hydrophobic interactions although there are backbone angle-dependent differences. Our analysis suggests that one driving force for -S-S- bond formation is a rather substantial increase in burial and hydrophobic interactions in cystines. Serine and cysteine become less and more, respectively, solvent-exposed in membrane proteins. 3D hydropathic environment maps are an evolving structure analysis tool showing promise as elements in a new protein structure prediction paradigm.
Affiliation(s)
- Claudio Catalano
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA, USA
- Institute for Structural Biology, Drug Discovery and Development, Virginia Commonwealth University, Richmond, VA, USA
- Mohammed H. AL Mughram
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA, USA
- Institute for Structural Biology, Drug Discovery and Development, Virginia Commonwealth University, Richmond, VA, USA
- Youzhong Guo
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA, USA
- Institute for Structural Biology, Drug Discovery and Development, Virginia Commonwealth University, Richmond, VA, USA
- Glen E. Kellogg
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA, USA
- Institute for Structural Biology, Drug Discovery and Development, Virginia Commonwealth University, Richmond, VA, USA
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA, USA
4
Zhou P, Liu Q, Wu T, Miao Q, Shang S, Wang H, Chen Z, Wang S, Wang H. Systematic Comparison and Comprehensive Evaluation of 80 Amino Acid Descriptors in Peptide QSAR Modeling. J Chem Inf Model 2021; 61:1718-1731. [DOI: 10.1021/acs.jcim.0c01370] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Indexed: 02/07/2023]
Affiliation(s)
- Peng Zhou
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Qian Liu
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Ting Wu
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Qingqing Miao
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Shuyong Shang
- College of Chemistry and Life Science, Chengdu Normal University, Chengdu 611130, China
- Heyi Wang
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Zheng Chen
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Shaozhou Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Heyan Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
5
Barbieri M. Evolution of the genetic code: The ambiguity-reduction theory. Biosystems 2019; 185:104024. [DOI: 10.1016/j.biosystems.2019.104024] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 08/06/2019] [Revised: 08/26/2019] [Accepted: 08/26/2019] [Indexed: 10/26/2022]
6
Di Giulio M. The key role of the elongation factors in the origin of the organization of the genetic code. Biosystems 2019; 181:20-26. [DOI: 10.1016/j.biosystems.2019.04.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/04/2019] [Revised: 04/13/2019] [Accepted: 04/13/2019] [Indexed: 11/29/2022]
7
Seligmann H. Localized Context-Dependent Effects of the "Ambush" Hypothesis: More Off-Frame Stop Codons Downstream of Shifty Codons. DNA Cell Biol 2019; 38:786-795. [PMID: 31157984 DOI: 10.1089/dna.2019.4725] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
The ambush hypothesis speculates that off-frame stop codons increase translational efficiency after ribosomal frameshifts by stopping frameshifted translation early. Several lines of evidence fit this hypothesis: (1) synonymous codon usages increase with their potential contribution to off-frame stops; (2) the genetic code assigns frequent amino acids to codon families contributing to off-frame stops; (3) positive biases for off-frame stops (AT rich) occur despite adverse nucleotide (GC) biases; and (4) mitochondrial off-frame stop codon densities increase with ribosomal structural instability, a potential proxy of frameshift frequencies. In this study, analyses of vertebrate mitogenes and tRNA synthetase genes from all superkingdoms and viruses test a new prediction of the ambush hypothesis: sequences immediately downstream of frameshift-inducing homopolymer codons (AAA, CCC, GGG, and TTT) are off-frame stop rich. Codons immediately downstream of homopolymer codons form more off-frame stops than average; biases are stronger than for corresponding upstream distances and for any other group of synonymous codons. Sequences downstream of that high-density region are off-frame stop depleted. This decrease suggests that off-frame stops, combined with suppressor tRNAs, regulate translation of overlapping coding sequences. Results show the predictive power of the ambush hypothesis, from macroevolutionary (genetic code structure) to detailed gene sequence anatomy.
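To make the notion of an "off-frame stop" concrete, the following is a minimal Python sketch that counts stop triplets in the +1 and +2 frames of a coding sequence. It is not the method used in the study: the function name `off_frame_stops` is hypothetical, and the stop set assumes the standard genetic code (the vertebrate mitochondrial code, for instance, uses a different stop set).

```python
# Assumption: standard-code stop codons; other genetic codes use other sets.
STOPS = {"TAA", "TAG", "TGA"}


def off_frame_stops(seq: str) -> dict:
    """Count stop triplets in the +1 and +2 frames of an in-frame coding sequence.

    The sequence is assumed to start at codon position 1 (frame 0).
    RNA input is accepted by mapping U to T.
    """
    seq = seq.upper().replace("U", "T")
    counts = {}
    for frame in (1, 2):
        # Step through non-overlapping triplets starting at the shifted offset.
        counts[frame] = sum(
            1 for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOPS
        )
    return counts


# In-frame reading: ATG AAA TAG CCC; the +1 frame begins with TGA.
example = off_frame_stops("ATGAAATAGCCC")
print(example)
```

Dividing these counts by the number of triplets in each shifted frame would give a density that can be compared between regions, for example upstream and downstream of a homopolymer codon.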
Affiliation(s)
- Hervé Seligmann
- The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel
8
Facchiano A, Di Giulio M. The genetic code is not an optimal code in a model taking into account both the biosynthetic relationships between amino acids and their physicochemical properties. J Theor Biol 2018; 459:45-51. [DOI: 10.1016/j.jtbi.2018.09.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/28/2018] [Revised: 09/04/2018] [Accepted: 09/19/2018] [Indexed: 01/22/2023]
9
Di Giulio M. A Non-neutral Origin for Error Minimization in the Origin of the Genetic Code. J Mol Evol 2018; 86:593-597. [PMID: 30361751 DOI: 10.1007/s00239-018-9871-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/19/2018] [Accepted: 10/17/2018] [Indexed: 11/29/2022]
Abstract
Massey (J Mol Evol 67:510-516, 2008; J Theor Biol 408:237-242, 2016; Nat Comput. https://doi.org/10.1007/s11047-017-9669-3, 2018) claims that the error minimization of the genetic code is derived by means of a neutral process and was not due to the action of natural selection. Here, I argue that this neutralist hypothesis of the origin of error minimization is not based directly on any neutral process and could be based on one only indirectly. On the contrary, it was natural selection that acted during the origin of the genetic code, determining the property that similar amino acids are coded by similar codons within the genetic code table.
Affiliation(s)
- Massimo Di Giulio
- Early Evolution of Life Laboratory, Institute of Biosciences and Bioresources, CNR, Via P. Castellino, 111, 80131, Naples, Italy.
10
Bywater RP. Why twenty amino acid residue types suffice(d) to support all living systems. PLoS One 2018; 13:e0204883. [PMID: 30321190 PMCID: PMC6188899 DOI: 10.1371/journal.pone.0204883] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/04/2017] [Accepted: 09/17/2018] [Indexed: 11/21/2022]
Abstract
It is well known that proteins are built up from an alphabet of 20 different amino acid types. These suffice to enable the protein to fold into its operative form relevant to its required functional roles. For carrying out these allotted functions, there may in some cases be a need for post-translational modifications and it has been established that an additional three types of amino acid have at some point been recruited into this process. But it still remains the case that the 20 residue types referred to are the major building blocks in all terrestrial proteins, and probably "universally". Given this fact, it is surprising that no satisfactory answer has been given to the two questions: "why 20?" and "why just these 20?". Furthermore, a suggestion is made as to how these 20 map to the codon repertoire which in principle has the capacity to cater for 64 different residue types. Attempts are made in this paper to answer these questions by employing a combination of quantum chemical and chemoinformatic tools which are applied to the standard 20 amino acid types as well as 3 “non-standard” types found in nature, a set of fictitious but feasible analog structures designed to test the need for greater coverage of function space and the collection of candidate alternative structures found either on meteorites or in experiments designed to reconstruct pre-life scenarios.
11
Di Giulio M. A discriminative test among the different theories proposed to explain the origin of the genetic code: The coevolution theory finds additional support. Biosystems 2018; 169-170:1-4. [DOI: 10.1016/j.biosystems.2018.05.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/23/2018] [Revised: 04/26/2018] [Accepted: 05/07/2018] [Indexed: 11/29/2022]
12
Di Giulio M. The aminoacyl-tRNA synthetases had only a marginal role in the origin of the organization of the genetic code: Evidence in favor of the coevolution theory. J Theor Biol 2017; 432:14-24. [DOI: 10.1016/j.jtbi.2017.08.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/20/2017] [Revised: 08/01/2017] [Accepted: 08/03/2017] [Indexed: 10/19/2022]
13
Nemzer LR. A binary representation of the genetic code. Biosystems 2017; 155:10-19. [PMID: 28300609 DOI: 10.1016/j.biosystems.2017.03.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/16/2016] [Revised: 03/03/2017] [Accepted: 03/06/2017] [Indexed: 12/23/2022]
Abstract
This article introduces a novel binary representation of the canonical genetic code based on both the structural similarities of the nucleotides, as well as the physicochemical properties of the encoded amino acids. Each of the four mRNA bases is assigned a unique 2-bit identifier, so that the 64 triplet codons are each indexed by a 6-bit label. The ordering of the bits reflects the hierarchical organization manifested by the DNA replication/repair and tRNA translation systems. In this system, transition and transversion mutations are naturally expressed as binary operations, and the severities of the different point mutations can be analyzed. Using a principal component analysis, it is shown that the physicochemical properties of amino acids related to protein folding also correlate with certain bit positions of their respective labels. Thus, the likelihood for a point mutation to be conservative, and less likely to cause a change in protein functionality, can be estimated.
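The idea of indexing codons by 6-bit labels, with point mutations expressed as binary operations, can be illustrated with a short Python sketch. The specific 2-bit assignment below is an assumption chosen for illustration (high bit distinguishes purine from pyrimidine) and is not necessarily the assignment or bit ordering used in the paper; all function names are hypothetical.

```python
# Assumed 2-bit base codes: high bit 0 = purine (A, G), 1 = pyrimidine (C, U).
BASE_BITS = {"A": 0b00, "G": 0b01, "C": 0b10, "U": 0b11}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}


def codon_label(codon: str) -> int:
    """Pack a three-base codon into a 6-bit integer (first base in the high bits)."""
    label = 0
    for base in codon:
        label = (label << 2) | BASE_BITS[base]
    return label


def label_codon(label: int) -> str:
    """Unpack a 6-bit label back into its codon string."""
    return "".join(BITS_BASE[(label >> shift) & 0b11] for shift in (4, 2, 0))


def mutate(codon: str, position: int, mask: int) -> str:
    """Apply a point mutation as an XOR on one base (position 0, 1 or 2).

    Under the assumed encoding, mask 0b01 is a transition (A<->G, C<->U);
    masks 0b10 and 0b11 are the two kinds of transversion.
    """
    shift = 2 * (2 - position)
    return label_codon(codon_label(codon) ^ (mask << shift))


def classify(a: str, b: str) -> str:
    """Classify the difference between two codons from the XOR of their labels."""
    diff = codon_label(a) ^ codon_label(b)
    changed = [(diff >> s) & 0b11 for s in (4, 2, 0) if (diff >> s) & 0b11]
    if len(changed) != 1:
        return "not a point mutation"
    return "transition" if changed[0] == 0b01 else "transversion"


print(format(codon_label("AUG"), "06b"))
print(mutate("AUG", 2, 0b01), classify("AUG", "CUG"))
```

With this encoding a transition always flips only the low bit of one base, so the class of a point mutation can be read directly from the XOR of two labels.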
Affiliation(s)
- Louis R Nemzer
- Department of Chemistry and Physics, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Davie, FL, USA.
14
Some pungent arguments against the physico-chemical theories of the origin of the genetic code and corroborating the coevolution theory. J Theor Biol 2017; 414:1-4. [DOI: 10.1016/j.jtbi.2016.11.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 07/20/2016] [Revised: 10/26/2016] [Accepted: 11/16/2016] [Indexed: 10/20/2022]
15
Nemzer LR. Shannon information entropy in the canonical genetic code. J Theor Biol 2017; 415:158-170. [DOI: 10.1016/j.jtbi.2016.12.010] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 04/25/2016] [Revised: 11/30/2016] [Accepted: 12/12/2016] [Indexed: 11/15/2022]
16
Di Giulio M. The lack of foundation in the mechanism on which are based the physico-chemical theories for the origin of the genetic code is counterposed to the credible and natural mechanism suggested by the coevolution theory. J Theor Biol 2016; 399:134-40. [PMID: 27067244 DOI: 10.1016/j.jtbi.2016.04.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 02/04/2016] [Revised: 03/29/2016] [Accepted: 04/01/2016] [Indexed: 11/25/2022]
Abstract
I analyze the mechanism on which the majority of theories that place the physico-chemical properties of amino acids at the center of the origin of the genetic code are based. As this mechanism relies on excessive mutational steps, I conclude that it could not have been operative or, if operative, it would not have allowed a full realization of the predictions of these theories, because this mechanism evidently contained a high indeterminacy. On this basis, I reject the four-column theory of the origin of the genetic code (Higgs, 2009) and reply to the criticism that was directed towards the coevolution theory of the origin of the genetic code. In this context, I suggest a new hypothesis that clarifies the mechanism by which the domains of codons of the precursor amino acids would have evolved, as predicted by the coevolution theory. This mechanism would have used particular elongation factors that would have constrained the evolution of all amino acids belonging to a given biosynthetic family to the progenitor pre-tRNA that first recognized the first codons that evolved in a certain codon domain of a given precursor amino acid. This happened because the elongation factors recognized two characteristics of the progenitor pre-tRNAs of precursor amino acids, which prevented the elongation factors from recognizing the pre-tRNAs belonging to biosynthetic families of different precursor amino acids. Finally, I analyze, by means of Fisher's exact test, the distribution within the genetic code of the biosynthetic classes of amino acids and of the classes of polarity values of amino acids. This analysis would seem to support the biosynthetic classes of amino acids over those of polarity values as the main factor that led to the structuring of the genetic code, with the physico-chemical properties of amino acids playing only a subsidiary role in this evolution.
As a whole, the full analysis leads to the conclusion that the coevolution theory of the origin of the genetic code is a highly corroborated theory.
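For readers unfamiliar with Fisher's exact test, here is a minimal Python sketch of the two-sided test on a 2x2 contingency table, computed directly from the hypergeometric distribution. The function name is hypothetical and the counts in the usage line are invented purely for illustration; they are not data from the paper, and the paper's actual contingency tables are not reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Invented counts, e.g. rows = two regions of the code table,
# columns = membership in two amino acid classes.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

A small p-value would indicate that class membership is unevenly distributed across the two regions; running the test once per classification scheme allows the schemes to be compared.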
Affiliation(s)
- Massimo Di Giulio
- Early Evolution of Life Laboratory, Institute of Biosciences and Bioresources, CNR, Via P. Castellino, 111, 80131 Naples, Italy.
17
Frappat L, Sciarrino A, Sorba P. Prediction of physical-chemical properties of amino acids from genetic code. J Biol Phys 2013; 28:17-26. [PMID: 23345754 DOI: 10.1023/a:1016274329603] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
Using the crystal basis model of the genetic code, a set of relations between the physical-chemical properties of the amino acids is derived and compared with the experimental data. A prediction is made for the not yet measured thermodynamic parameters of three amino acids.
18
Abriata LA, Salverda MLM, Tomatis PE. Sequence-function-stability relationships in proteins from datasets of functionally annotated variants: the case of TEM β-lactamases. FEBS Lett 2012; 586:3330-5. [PMID: 22850115 DOI: 10.1016/j.febslet.2012.07.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 04/02/2012] [Revised: 07/02/2012] [Accepted: 07/04/2012] [Indexed: 11/19/2022]
Abstract
A dataset of TEM lactamase variants with different substrate and inhibition profiles was compiled and analyzed. Trends show that loops are the main evolvable regions in these enzymes, gradually accumulating mutations to generate increasingly complex functions. Notably, many mutations present in evolved enzymes are also found in simpler variants, probably originating functional promiscuity. Following a function-stability tradeoff, the increase in functional complexity driven by accumulation of mutations fosters the incorporation of other stability-restoring substitutions, although our analysis suggests they might not be as "global" as generally accepted and seem instead specific to different networks of protein sites. Finally, we show how this dataset can be used to model functional changes in TEMs based on the physicochemical properties of the amino acids.
Affiliation(s)
- Luciano A Abriata
- Instituto de Biología Molecular y Celular de Rosario, Rosario, Argentina.
19
Zhang Z, Yu J. On the organizational dynamics of the genetic code. Genomics Proteomics Bioinformatics 2011; 9:21-9. [PMID: 21641559 PMCID: PMC5054158 DOI: 10.1016/s1672-0229(11)60004-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 08/30/2010] [Accepted: 10/26/2010] [Indexed: 11/23/2022]
Abstract
The organization of the canonical genetic code needs to be thoroughly illuminated. Here we reorder the four nucleotides—adenine, thymine, guanine and cytosine—according to their emergence in evolution, and apply the organizational rules to devising an algebraic representation for the canonical genetic code. Under a framework of the devised code, we quantify codon and amino acid usages from a large collection of 917 prokaryotic genome sequences, and associate the usages with its intrinsic structure and classification schemes as well as amino acid physicochemical properties. Our results show that the algebraic representation of the code is structurally equivalent to a content-centric organization of the code and that codon and amino acid usages under different classification schemes were correlated closely with GC content, implying a set of rules governing composition dynamics across a wide variety of prokaryotic genome sequences. These results also indicate that codons and amino acids are not randomly allocated in the code, where the six-fold degenerate codons and their amino acids have important balancing roles for error minimization. Therefore, the content-centric code is of great usefulness in deciphering its hitherto unknown regularities as well as the dynamics of nucleotide, codon, and amino acid compositions.
Affiliation(s)
- Zhang Zhang
- Plant Stress Genomics Research Center, Division of Chemical and Life Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
20
Tlusty T. A colorful origin for the genetic code: Information theory, statistical mechanics and the emergence of molecular codes. Phys Life Rev 2010; 7:362-76. [DOI: 10.1016/j.plrev.2010.06.002] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 09/25/2009] [Revised: 01/25/2010] [Accepted: 02/06/2010] [Indexed: 10/19/2022]
21
2-Adic clustering of the PAM matrix. J Theor Biol 2009; 261:396-406. [DOI: 10.1016/j.jtbi.2009.08.014] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 12/23/2008] [Revised: 06/25/2009] [Accepted: 08/04/2009] [Indexed: 11/21/2022]
22
Sun FJ, Caetano-Anollés G. Evolutionary patterns in the sequence and structure of transfer RNA: a window into early translation and the genetic code. PLoS One 2008; 3:e2799. [PMID: 18665254 PMCID: PMC2474678 DOI: 10.1371/journal.pone.0002799] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Received: 02/11/2008] [Accepted: 07/02/2008] [Indexed: 01/06/2023]
Abstract
Transfer RNA (tRNA) molecules play vital roles during protein synthesis. Their acceptor arms are aminoacylated with specific amino acid residues while their anticodons delimit codon specificity. The history of these two functions has been generally linked in evolutionary studies of the genetic code. However, these functions could have been differentially recruited as evolutionary signatures were left embedded in tRNA molecules. Here we built phylogenies derived from the sequence and structure of tRNA, we forced taxa into monophyletic groups using constraint analyses, tested competing evolutionary hypotheses, and generated timelines of amino acid charging and codon discovery. Charging of Sec, Tyr, Ser and Leu appeared ancient, while specificities related to Asn, Met, and Arg were derived. The timelines also uncovered an early role of the second and then first codon bases, identified codons for Ala and Pro as the most ancient, and revealed important evolutionary take-overs related to the loss of the long variable arm in tRNA. The lack of correlation between ancestries of amino acid charging and encoding indicated that the separate discoveries of these functions reflected independent histories of recruitment. These histories were probably curbed by co-options and important take-overs during early diversification of the living world.
Affiliation(s)
- Feng-Jie Sun
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Gustavo Caetano-Anollés
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
23
24
Franke R, Gruska A, Devillers J, Chessel D, Dunn WJ, Wold S, Lewi PJ, Ford MG, Salt DW, van de Waterbeemd H, McFarland JW, Gans DJ. Multivariate Data Analysis of Chemical and Biological Data. 2008. [DOI: 10.1002/9783527615452.ch4] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022]
25
Scott JH, O'Brien DM, Emerson D, Sun H, McDonald GD, Salgado A, Fogel ML. An examination of the carbon isotope effects associated with amino acid biosynthesis. Astrobiology 2006; 6:867-80. [PMID: 17155886 DOI: 10.1089/ast.2006.6.867] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Indexed: 05/12/2023]
Abstract
Stable carbon isotope ratios (δ¹³C) were determined for alanine, proline, phenylalanine, valine, leucine, isoleucine, aspartate (aspartic acid and asparagine), glutamate (glutamic acid and glutamine), lysine, serine, glycine, and threonine from metabolically diverse microorganisms. The microorganisms examined included fermenting bacteria, organotrophic, chemolithotrophic, phototrophic, methylotrophic, methanogenic, acetogenic, acetotrophic, and naturally occurring cryptoendolithic communities from the Dry Valleys of Antarctica. Here we demonstrated that reactions involved in amino acid biosynthesis can be used to distinguish amino acids formed by life from those formed by nonbiological processes. The unique patterns of δ¹³C imprinted by life on amino acids produced a biological bias. We also showed that, by applying discriminant function analysis to the δ¹³C value of a pool of amino acids formed by biological activity, it was possible to identify key aspects of intermediary carbon metabolism in the microbial world. In fact, microorganisms examined in this study could be placed within one of three metabolic groups: (1) heterotrophs that grow by oxidizing compounds containing three or more carbon-to-carbon bonds (fermenters and organotrophs), (2) autotrophs that grow by taking up carbon dioxide (chemolithotrophs and phototrophs), and (3) acetoclastic microbes that grow by assimilation of formaldehyde or acetate (methylotrophs, methanogens, acetogens, and acetotrophs). Furthermore, we demonstrated that cryptoendolithic communities from Antarctica grouped most closely with the autotrophs, which indicates that the dominant metabolic pathways in these communities are likely those utilized for CO₂ fixation. We propose that this technique can be used to determine the dominant metabolic types in a community and reveal the overall flow of carbon in a complex ecosystem.
Affiliation(s)
- James H Scott
- Department of Earth Sciences, Dartmouth College, Hanover, New Hampshire 03755, USA.
|
26
|
Saftalov L, Smith PA, Friedman AM, Bailey-Kellogg C. Site-directed combinatorial construction of chimaeric genes: general method for optimizing assembly of gene fragments. Proteins 2006; 64:629-42. [PMID: 16783818 DOI: 10.1002/prot.20984] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
Site-directed construction of chimaeric genes by in vitro recombination "mixes-and-matches" precise building blocks from multiple parent proteins, generating libraries of hybrids to be tested for structure-function relationships and/or screened for favorable properties and novel enzymatic activities. A direct annealing and ligation method can construct chimaeric genes without requiring sequence identity between parents, except for the short (approximately 3 nt) sequences of the fragment overhangs used for specific ligation. Careful planning of the assembly process is necessary, though, in order to ensure effective construction of desired fragment assemblies and to avoid undesired assemblies (e.g., repetition of fragments, fragments out of order). We develop algorithms for specific planned ligation of short overhangs (SPLISO) that efficiently explore possible assembly plans, varying the fragment overhangs and the order of ligation steps in the assembly pathway. While there is a combinatorial explosion in the number of possible assembly plans as the number of breakpoints and parent genes increases, we employ a dynamic programming approach to find globally optimal ones in low-order polynomial time (in practice, taking only seconds for basic assembly plans). We demonstrate the effectiveness of our algorithms in planning the assembly of hybrid libraries, under a variety of experimental options and restrictions, including flexibility in the position and amino acid sequence of breakpoints. Our method promises to enable more effective application of site-directed recombination to protein investigation and engineering.
Affiliation(s)
- Liz Saftalov
- Department of Computer Science, Purdue University, West Lafayette, Indiana 47907, USA
|
27
|
Wrabl JO, Grishin NV. Grouping of amino acid types and extraction of amino acid properties from multiple sequence alignments using variance maximization. Proteins 2006; 61:523-34. [PMID: 16184599 DOI: 10.1002/prot.20648] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
Abstract
Understanding of amino acid type co-occurrence in trusted multiple sequence alignments is a prerequisite for improved sequence alignment and remote homology detection algorithms. Two objective approaches were used to investigate co-occurrence, both based on variance maximization of the weighted residue frequencies in columns taken from a large alignment database. The first approach discretely grouped amino acid types, and the second approach extracted orthogonal properties of amino acids using principal components analysis. The grouping results corresponded to amino acid physical properties such as side chain hydrophobicity, size, or backbone flexibility, and an optimal arrangement of approximately eight groups was observed. However, interpretation of the orthogonal properties was more complex. Although the principal components accounting for the largest variances exhibited modest correlations with hydrophobicity and conservation of glycine, in general principal components did not correspond to physical properties of amino acids. Although not intuitive, these amino acid mathematical properties were demonstrated to be robust and to improve local pairwise alignment accuracy, relative to 20 amino acid frequencies alone, for a simple test case.
Affiliation(s)
- James O Wrabl
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas 75390-9050, USA
|
28
|
Yang CM. On the structural regularity in nucleobases and amino acids and relationship to the origin and evolution of the genetic code. Orig Life Evol Biosph 2005; 35:275-95. [PMID: 16228642 DOI: 10.1007/s11084-005-1078-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 10/30/2003] [Revised: 02/19/2004] [Accepted: 02/19/2004] [Indexed: 10/25/2022]
Abstract
To explore how chemical structures of both nucleobases and amino acids may have played a role in shaping the genetic code, numbers of sp2 hybrid nitrogen atoms in nucleobases were taken as a determinative measure for empirical stereo-electronic property to analyze the genetic code. Results revealed that amino acid hydropathy correlates strongly with the sp2 nitrogen atom numbers in nucleobases rather than with the overall electronic property such as redox potentials of the bases, reflecting that stereo-electronic property of bases may play a role. In the rearranged code, five simple but stereo-structurally distinctive amino acids (Gly, Pro, Val, Thr and Ala) and their codon quartets form a crossed intersection "core". Secondly, a re-categorization of the amino acids according to their beta-carbon stereochemistry, verified by charge density (at beta-carbon) calculation, results in five groups of stereo-structurally distinctive amino acids, the group leaders of which are Gly, Pro, Val, Thr and Ala, remarkably overlapping the above "core". These two lines of independent observations provide empirical arguments for a contention that a seemingly "frozen" "core" could have formed at a certain evolutionary stage. The possible existence of this codon "core" is in conformity with a previous evolutionary model whereby stereochemical interactions may have shaped the code. Moreover, the genetic code listed in UCGA succession together with this codon "core" has recently facilitated an identification of the unprecedented icosikaioctagon symmetry and bi-pyramidal nature of the genetic code.
Affiliation(s)
- Chi Ming Yang
- Neurochemistry and System Chemical Biology, Nankai University, Tian Jin, 300071, China.
|
29
|
Di Giulio M. The origin of the genetic code: theories and their relationships, a review. Biosystems 2004; 80:175-84. [PMID: 15823416 DOI: 10.1016/j.biosystems.2004.11.005] [Citation(s) in RCA: 97] [Impact Index Per Article: 4.6] [Received: 08/03/2004] [Revised: 11/12/2004] [Accepted: 11/18/2004] [Indexed: 10/26/2022]
Abstract
A review of the main theories proposed to explain the origin of the genetic code is presented. I analyze arguments and data in favour of different theories proposed to explain the origin of the organization of the genetic code. It is possible to suggest a mechanism that makes compatible the different theories of the origin of the code, even if these are based on a historical or physicochemical determinism and thus appear incompatible by definition. Finally, I discuss the question of why a given number of synonymous codons was attributed to the amino acids in the genetic code.
Affiliation(s)
- Massimo Di Giulio
- Institute of Genetics and Biophysics Adriano Buzzati-Traverso, CNR, Naples, Italy
|
30
|
|
31
|
Abstract
The current theory of the origin of life by random polymerisation and selection of nucleic acids is challenged by the hypothesis that the primitive enzymatic sites would have been formed by abiotic polymerisation of aminoacids, specifically gathered (by saline, hydrogen, or hydrophobic interactions), around the different substrates. The information contained in these proteinoids would have been transferred to messenger-like RNAs by a mechanism reverse of that of the present protein synthesis, and then to DNA. The interactions between aminoacids and nucleotidic sequences would have been at the origin of the genetic code, as hypothesized by several authors. We propose that the specificity of the bindings would have been enhanced and 'frozen' by ternary associations with specific proteinoids (future aminoacyl tRNA synthetases). The role of chance would have been limited to the supply of the products and to the determination of the conditions of reaction. Thermodynamic considerations (dissipation of the free enthalpy through enzymatic activities) may explain the emergence of the biological systems.
Affiliation(s)
- G Berger
- 14 Impasse des Carpeaux, Perigny Sur Yerres, France
|
32
|
Abstract
The first information system emerged on the earth as a primordial version of the genetic code and genetic texts. The natural appearance of arithmetic power in such a linguistic milieu is theoretically possible and practical for producing information systems of extremely high efficiency. In this case, the arithmetic symbols should be incorporated into an alphabet, i.e. the genetic code. A number is the fundamental arithmetic symbol produced by the system of numeration. If the system of numeration were detected inside the genetic code, it would be natural to expect that its purpose is arithmetic calculation, e.g., for the sake of control, safety, and precise alteration of the genetic texts. The nucleons of amino acids and the bases of nucleic acids seem most suitable for embodiments of digits. These assumptions were used for analyzing the genetic code. The compressed, life-size, and split representation of the Escherichia coli and Euplotes octocarinatus code versions were considered simultaneously. An exact equilibration of the nucleon sums of the amino acid standard blocks and/or side chains was found repeatedly within specified sets of the genetic code. Moreover, the digital notations of the balanced sums acquired, in decimal representation, the unique form 111, 222, ..., 999. This form is a consequence of the criterion of divisibility by 037. The criterion could simplify some computing mechanism of a cell, if any, and facilitate its computational procedure. The cooperative symmetry of the genetic code demonstrates that possibly a zero was invented and used by this mechanism. Such organization of the genetic code could be explained by activities of some hypothetical molecular organelles working as natural biocomputers of digital genetic texts. It is well known that if mutation replaces an amino acid, the change of hydrophobicity is generally weak, while that of size is strong. The antisymmetrical correlation between the amino acid size and the degeneracy number is known as well. It is shown that these and some other familiar properties may be a physicochemical effect of arithmetic inside the genetic code. The "frozen accident" model, giving unlimited freedom to the mapping function, could optimally support the appearance of both arithmetic symbols and physicochemical protection inside the genetic code.
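The link between the forms 111, 222, ..., 999 and divisibility by 037 can be checked with a few lines of arithmetic. The sketch below is only an illustration of that number-theoretic fact; it does not compute any nucleon sums from the paper, and the helper name `rotate_digits` is hypothetical.

```python
# Illustration only: checks the arithmetic behind the "111, 222, ..., 999" pattern.
# No nucleon sums from the paper are computed here.

def rotate_digits(n: int) -> int:
    """Cyclically rotate the three decimal digits of n (0 <= n <= 999) left by one."""
    s = f"{n:03d}"
    return int(s[1:] + s[0])

# Every repdigit ddd equals d * 111 = d * 3 * 37.
repdigits = [d * 111 for d in range(1, 10)]
repdigit_quotients = [n // 37 for n in repdigits]

# Because 999 = 27 * 37, rotating the digits of a three-digit
# multiple of 37 (leading zeros allowed) gives another multiple of 37.
multiples_of_37 = [n for n in range(0, 1000) if n % 37 == 0]
rotation_closed = all(rotate_digits(n) % 37 == 0 for n in multiples_of_37)

print(repdigits)
print(repdigit_quotients)
print(rotation_closed)
```

The rotation property follows from the identity rotated = 10n - 999a, where a is the leading digit of n.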
Affiliation(s)
- Vladimir I shCherbak
- Department of Applied Mathematics, al-Faraby Kazakh National University, 71 al-Faraby Avenue, Almaty 480078, Kazakhstan CIS.
|
33
|
Biro JC, Benyó B, Sansom C, Szlávecz A, Fördös G, Micsik T, Benyó Z. A common periodic table of codons and amino acids. Biochem Biophys Res Commun 2003; 306:408-415. [PMID: 12804578 DOI: 10.1016/s0006-291x(03)00974-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Indexed: 11/17/2022]
Abstract
A periodic table of codons has been designed where the codons are in regular locations. The table has four fields (16 places in each), one with each of the four nucleotides (A, U, G, C) in the central codon position. Thus, AAA (lysine), UUU (phenylalanine), GGG (glycine), and CCC (proline) were placed into the corners of the fields as the main codons (and amino acids) of the fields. They were connected to each other by six axes. The resulting nucleic acid periodic table showed perfect axial symmetry for codons. The corresponding amino acid table also displayed periodicity regarding the biochemical properties (charge and hydropathy) of the 20 amino acids and the position of the stop signals. The table emphasizes the importance of the central nucleotide in the codons and predicts that purines control the charge while pyrimidines determine the polarity of the amino acids. This prediction was experimentally tested.
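The partition into four fields of 16 codons can be reproduced by grouping codons on their central base. This is a minimal sketch under that one assumption; it does not reproduce the positions of codons within each field or the six axes, which the abstract does not specify, and the variable names are hypothetical.

```python
from itertools import product

BASES = "AUGC"

# Group the 64 codons into four fields by their central (second) base.
fields = {base: [] for base in BASES}
for first, second, third in product(BASES, repeat=3):
    fields[second].append(first + second + third)

# Homotriplet "main codons" named in the abstract, with their amino acids.
main_codons = {"AAA": "Lys", "UUU": "Phe", "GGG": "Gly", "CCC": "Pro"}

for base in BASES:
    print(base, len(fields[base]), fields[base][:4], "...")
```

Each main codon falls in the field named after its own base, which is why the four homotriplets can serve as anchors for the four fields.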
Affiliation(s)
- J C Biro
- Karolinska Institute, Stockholm, Sweden.
|
34
|
Chiusano ML, Alvarez-Valin F, Di Giulio M, D'Onofrio G, Ammirato G, Colonna G, Bernardi G. Second codon positions of genes and the secondary structures of proteins. Relationships and implications for the origin of the genetic code. Gene 2000; 261:63-9. [PMID: 11164038 DOI: 10.1016/s0378-1119(00)00521-7] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.6] [Indexed: 11/15/2022]
Abstract
The nucleotide frequencies in the second codon positions of genes are remarkably different for the coding regions that correspond to different secondary structures in the encoded proteins, namely, helix, beta-strand and aperiodic structures. Indeed, hydrophobic and hydrophilic amino acids are encoded by codons having U or A, respectively, in their second position. Moreover, the beta-strand structure is strongly hydrophobic, while aperiodic structures contain more hydrophilic amino acids. The relationship between nucleotide frequencies and protein secondary structures is associated not only with the physico-chemical properties of these structures but also with the organisation of the genetic code. In fact, this organisation seems to have evolved so as to preserve the secondary structures of proteins by preventing deleterious amino acid substitutions that could modify the physico-chemical properties required for an optimal structure.
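The statement about U and A in the second codon position can be inspected directly in the standard genetic code. The sketch below builds the standard code table in the conventional UCAG ordering and lists the amino acids found under each second-position base; it is an illustration of the code's layout, not the frequency analysis performed in the paper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the conventional UCAG ordering ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Collect the amino acids (one-letter codes) found under each second-position base.
by_second_base = {base: set() for base in BASES}
for codon, aa in CODE.items():
    by_second_base[codon[1]].add(aa)

for base in BASES:
    print(base, "".join(sorted(by_second_base[base])))
```

Codons with U in the second position encode Phe, Leu, Ile, Met and Val, while those with A encode Tyr, His, Gln, Asn, Lys, Asp and Glu plus two stop codons.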
Affiliation(s)
- M L Chiusano
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Villa Comunale, I-80121, Naples, Italy
|
35
|
Abstract
The systematics of indices of physico-chemical properties of codons and amino acids across the genetic code are examined. Using a simple numerical labelling scheme for nucleic acid bases, A=(-1,0), C=(0,-1), G=(0,1), U=(1,0), data can be fitted as low order polynomials of the six coordinates in the 64-dimensional codon weight space. The work confirms and extends the recent studies by Siemion et al. (1995. BioSystems 36, 231-238) of the conformational parameters. Fundamental patterns in the data such as codon periodicities, and related harmonics and reflection symmetries, are here associated with the structure of the set of basis monomials chosen for fitting. Results are plotted using the Siemion one-step mutation ring scheme, and variants thereof. The connections between the present work, and recent studies of the genetic code structure using dynamical symmetry algebras, are pointed out.
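The labelling scheme assigns each base a pair of numbers, so a codon maps to six coordinates. A minimal sketch of that encoding follows; the function name `codon_coordinates` is hypothetical, and the polynomial fitting described in the abstract is not shown.

```python
from itertools import product

# Base labels as given in the abstract.
BASE_COORDS = {"A": (-1, 0), "C": (0, -1), "G": (0, 1), "U": (1, 0)}

def codon_coordinates(codon: str) -> tuple:
    """Concatenate the 2-D labels of the three bases into one 6-tuple."""
    if len(codon) != 3:
        raise ValueError("a codon has exactly three bases")
    return tuple(x for base in codon.upper() for x in BASE_COORDS[base])

all_codons = ["".join(p) for p in product("ACGU", repeat=3)]
coords = {codon: codon_coordinates(codon) for codon in all_codons}

print(codon_coordinates("AUG"))
print(len(set(coords.values())))
```

Under these labels, A and U are negatives of each other, as are C and G, so Watson-Crick complementation of a base corresponds to a sign flip of its label.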
Affiliation(s)
- J D Bashford
- Centre for the Structure of Subatomic Matter, University of Adelaide, Adelaide, SA 5005, Australia
|
36
|
Abstract
The evolutionary forces that produced the canonical genetic code before the last universal ancestor remain obscure. One hypothesis is that the arrangement of amino acid/codon assignments results from selection to minimize the effects of errors (e.g., mistranslation and mutation) on resulting proteins. If amino acid similarity is measured as polarity, the canonical code does indeed outperform most theoretical alternatives. However, this finding does not hold for other amino acid properties, ignores plausible restrictions on possible code structure, and does not address the naturally occurring nonstandard genetic codes. Finally, other analyses have shown that significantly better code structures are possible. Here, we show that if theoretically possible code structures are limited to reflect plausible biological constraints, and amino acid similarity is quantified using empirical data of substitution frequencies, the canonical code is at or very close to a global optimum for error minimization across plausible parameter space. This result is robust to variation in the methods and assumptions of the analysis. Although significantly better codes do exist under some assumptions, they are extremely rare and thus consistent with reports of an adaptive code: previous analyses which suggest otherwise derive from a misleading metric. However, all extant, naturally occurring, secondarily derived, nonstandard genetic codes do appear less adaptive. The arrangement of amino acid assignments to the codons of the standard genetic code appears to be a direct product of natural selection for a system that minimizes the phenotypic impact of genetic error. Potential criticisms of previous analyses appear to be without substance. That known variants of the standard genetic code appear less adaptive suggests that different evolutionary factors predominated before and after fixation of the canonical code. While the evidence for an adaptive code is clear, the process by which the code achieved this optimization requires further attention.
Affiliation(s)
- S J Freeland
- Department of Ecology, Princeton University, University of Bath, Bath, England
|
37
|
Sowerby SJ, Stockwell PA, Heckl WM, Petersen GB. Self-programmable, self-assembling two-dimensional genetic matter. Orig Life Evol Biosph 2000; 30:81-99. [PMID: 10836266 DOI: 10.1023/a:1006616725062] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
Putative two-dimensional coding systems can be constructed from aqueous solutions of purine and pyrimidine nucleic acid bases evaporated at moderate temperatures on the surfaces of inorganic solids. The resultant structures are monolayers which are formed spontaneously by molecular self-assembly and they have been observed with molecular resolution by scanning tunnelling microscopy (STM). When formed from solutions of a single base, the monolayers of adenine and uracil have crystalline characteristics and the STM images can be interpreted in terms of the geometrical placement of planar arranged molecules that interact laterally by intermolecular hydrogen bonding. When formed from solutions containing a mixture of adenine and uracil, the monolayers have aperiodic structures. Small crystalline domains within these monolayers can be interpreted in terms of the single phase configurations of the molecules and the remaining aperiodic structures can presumably be interpreted, geometrically, in terms of the 21 theoretically possible adenine-adenine, uracil-uracil and adenine-uracil hydrogen bonding interactions. We propose that combinatorial arrangements of planar arranged purine and pyrimidine bases could provide the necessary complexity to act as a primitive genetic mechanism and may have relevance to the origin of life.
Affiliation(s)
- S J Sowerby
- Department of Biochemistry and Centre for Gene Research, University of Otago, Dunedin, New Zealand.
|
38
|
Abstract
Comparative path lengths in amino acid biosynthesis and other molecular indicators of the timing of codon assignment were examined to reconstruct the main stages of code evolution. The codon tree obtained was rooted in the 4 N-fixing amino acids (Asp, Glu, Asn, Gln) and 16 triplets of the NAN set. This small, locally phased (commaless) code evidently arose from ambiguous translation on a poly(A) collector strand, in a surface reaction network. Copolymerisation of these amino acids yields polyanionic peptide chains, which could anchor uncharged amide residues to a positively charged mineral surface. From RNA virus structure and replication in vitro, the first genes seemed to be RNA segments spliced into tRNA. Expansion of the code reduced the risk of mutation to an unreadable codon. This step was conditional on initiation at the 5'-codon of a translated sequence. Incorporation of increasingly hydrophobic amino acids accompanied expansion. As codons of the NUN set were assigned most slowly, they received the most nonpolar amino acids. The origin of ferredoxin and Gln synthetase was traced to mid-expansion phase. Surface metabolism ceased by the end of code expansion, as cells bounded by a proteo-phospholipid membrane, with a protoATPase, had emerged. Incorporation of positively charged and aromatic amino acids followed. They entered the post-expansion code by codon capture. Synthesis of efficient enzymes with acid-base catalysis was then possible. Both types of aminoacyl-tRNA synthetases were attributed to this stage. tRNA sequence diversity and error rates in RNA replication indicate the code evolved within 20 million yr in the preIsuan era. These findings on the genetic code provide empirical evidence, from a contemporaneous source, that a surface reaction network, centred on C-fixing autocatalytic cycles, rapidly led to cellular life on Earth.
Affiliation(s)
- B K Davis
- Research Foundation of Southern California Inc., La Jolla 92037, USA
|
39
|
Abstract
We propose the existence of a relationship of stereochemical complementarity between gene sequences that code for interacting components: nucleic acid-nucleic acid, protein-protein and protein-nucleic acid. Such a relationship would impose evolutionary constraints on the DNA sequences themselves, thus retaining these sequences and governing the direction of the evolutionary process. Therefore, we propose that prebiotic, template-directed autocatalytic synthesis of mutually cognate peptides and polynucleotides resulted in their amplification and evolutionary conservation in contemporary prokaryotic and eukaryotic organisms as a genetic regulatory apparatus. If this proposal is correct, then the relationships between the sequences in DNA coding for these interactions constitute a life code of which the genetic code is only one aspect of the many related interactions encoded in DNA.
Affiliation(s)
- L F Harris
- David F. Hickok Memorial Cancer Research Laboratory, Abbott Northwestern Hospital, Minneapolis, MN 55407, USA.
|
40
|
Matter H. A validation study of molecular descriptors for the rational design of peptide libraries. J Pept Res 1998; 52:305-14. [PMID: 9832309 DOI: 10.1111/j.1399-3011.1998.tb01245.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Important molecular descriptors used for establishing quantitative structure-activity relationships are investigated to classify similar versus dissimilar peptides. When searching new lead structures, synthesizing and testing compounds which are too similar wastes time and resources. In contrast, any lead optimization program requires the investigation of similar compounds to that lead. Thus, it is important to maximize or minimize the structural diversity of peptides to design useful compound libraries for lead finding or lead refinement projects. If a molecular descriptor is a useful measure of similarity for the design of peptide libraries, small differences in this descriptor for a pair of molecules should only translate into small biological differences. Using this paradigm as a basis for descriptor validation, it was possible to rank different molecular descriptors. Those physicochemical descriptors are 2D fingerprints and five experimentally or theoretically derived principal property scales. Some theoretically derived metrics are obtained by computing interaction energies or similarity indices on predefined 3D grid points using canonical conformations for individual amino acids. The resulting 3D data matrices are analyzed using a principal component analysis leading to three principal properties for CoMFA (Comparative Molecular Field Analysis) or CoMSIA (Comparative Molecular Similarity Index Analysis) derived molecular fields. The descriptor validation results reveal the applicability of design tools on peptide data sets. Experimentally derived descriptors, in general, are more acceptable than computationally derived metrics, while the latter provide a statistically valid alternative to characterize novel building blocks. The CoMSIA metrics perform slightly better than the CoMFA-based principal properties, while GRID-based descriptors are always less acceptable.
Affiliation(s)
- H Matter
- Hoechst Marion Roussel AG, Computational Chemistry, Core Research Functions, Frankfurt am Main, Germany.
|
41
|
Di Giulio M. The beta-sheets of proteins, the biosynthetic relationships between amino acids, and the origin of the genetic code. Orig Life Evol Biosph 1996; 26:589-609. [PMID: 9008882 DOI: 10.1007/bf01808222] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.1] [Indexed: 02/03/2023]
Abstract
Two forces are generally hypothesised as being responsible for conditioning the origin of the organization of the genetic code: the physicochemical properties of amino acids and their biosynthetic relationships (relationships between precursor and product amino acids). If we assume that the biosynthetic relationships between amino acids were fundamental in defining the genetic code, then it is reasonable to expect that the distribution of physicochemical properties among the amino acids in precursor-product relationships cannot be random but must, rather, be affected by some selective constraints imposed by the structure of primitive proteins. Analysis shows that measurements representing the 'size' of amino acids, e.g. bulkiness, are specifically associated with the pairs of amino acids in precursor-product relationships. However, the size of amino acids cannot have been selected per se but, rather, because it reflects the beta-sheets of proteins which are, therefore, identified as the main adaptive theme promoting the origin of genetic code organization. By contrast, there are no traces of the alpha-helix in the genetic code table. The above considerations make it necessary to re-examine the relationship linking the hydrophilicity of the dinucleoside monophosphates of anticodons and the polarity and bulkiness of amino acids. It can be concluded that this relationship seems to be meaningful only between the hydrophilicity of anticodons and the polarity of amino acids. The latter relationship is supposed to have been operative on hairpin structures, ancestors of the tRNA molecule. Moreover, it is on these very structures that the biosynthetic links between precursor and product amino acids might have been achieved, and the interaction between the hydrophilicity of anticodons and the polarity of amino acids might have had a role in the concession of codons (anticodons) from precursors to products.
Affiliation(s)
- M Di Giulio
- International Institute of Genetics and Biophysics, CNR, Napoli, Italy
|
42
|
Di Giulio M. The phylogeny of tRNAs seems to confirm the predictions of the coevolution theory of the origin of the genetic code. Orig Life Evol Biosph 1995; 25:549-64. [PMID: 7494635 DOI: 10.1007/bf01582024] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.0] [Indexed: 01/25/2023]
Abstract
An extensive analysis of the evolutionary relationships existing between transfer RNAs, performed using parsimony algorithms, is presented. After building up an estimate of the tRNA ancestral sequences, these sequences are then compared using certain methods. The results seem to suggest that the coevolution hypothesis (Wong, J.T., 1975, Proc. Natl. Acad. Sci. USA 72, 1909-1912), which sees the genetic code as a map of the biosynthetic relationships between amino acids, is better supported than the hypotheses that see the physicochemical properties of amino acids as the main adaptive theme that led to the structuring of the genetic code.
Affiliation(s)
- M Di Giulio
- International Institute of Genetics and Biophysics, CNR, Napoli, Italy
|
43
|
The regularities of the changes of amino acid physico-chemical properties within the genetic code. Amino Acids 1995; 8:1-13. [DOI: 10.1007/bf00806539] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/10/1993] [Accepted: 01/25/1994] [Indexed: 10/26/2022]
|
44
|
Abstract
The evolutionary relationships between transfer RNA (tRNA) molecules are analyzed by parsimony algorithms. The position of the topologies expected on the basis of the hypotheses made to explain the origin of the genetic code, on the frequency distribution of all the possible tree topologies of the evolutionary relationships between tRNAs seems to lead to the following conclusion: The hypothesis (Wong, J. T., Proc. Natl. Acad. Sci. USA, 1975, 72: 1909-1912) that sees the genetic code as a map of the biosynthetic relationships between amino acids seems to occupy a statistically significant position on these frequency distributions, thus reflecting a significant part of the tRNA phylogeny.
Affiliation(s)
- M Di Giulio
- International Institute of Genetics and Biophysics, CNR, Naples, Italy
|
45
|
Szathmáry E. Coding coenzyme handles: a hypothesis for the origin of the genetic code. Proc Natl Acad Sci U S A 1993; 90:9916-20. [PMID: 8234335 PMCID: PMC47683 DOI: 10.1073/pnas.90.21.9916] [Citation(s) in RCA: 87] [Impact Index Per Article: 2.7] [Indexed: 01/29/2023]
Abstract
The coding coenzyme handle hypothesis suggests that useful coding preceded translation. Early adapters, the ancestors of present-day anticodons, were charged with amino acids acting as coenzymes of ribozymes in a metabolically complex RNA world. The ancestral aminoacyl-adapter synthetases could have been similar to present-day self-splicing tRNA introns. A codon-anticodon-discriminator base complex embedded in these synthetases could have played an important role in amino acid recognition. Extension of the genetic code proceeded through the take-over of nonsense codons by novel amino acids, related to already coded ones either through precursor-product relationship or physicochemical similarity. The hypothesis is open for experimental tests.
Affiliation(s)
- E Szathmáry
- Institute for Advanced Study Berlin, Germany
|
46
|
LaBean TH, Kauffman SA. Design of synthetic gene libraries encoding random sequence proteins with desired ensemble characteristics. Protein Sci 1993; 2:1249-54. [PMID: 8401210 PMCID: PMC2142438 DOI: 10.1002/pro.5560020807] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.2] [Indexed: 01/30/2023]
Abstract
Libraries of random sequence polypeptides are useful as sources of unevolved proteins, novel ligands, and potential lead compounds for the development of vaccines and therapeutics. The expression of small random peptides has been achieved previously using DNA synthesized with equimolar mixtures of nucleotides. For many potential uses of random polypeptide libraries, concerns such as avoiding termination codons and matching target amino acid compositions make more complex designs necessary. In this study, three mixtures of nucleotides, corresponding to the three positions in the codon, were designed such that semirandom DNA synthesized by repeated cycles of the three mixtures created an open reading frame encoding random sequence polypeptides with desired ensemble characteristics. Two methods were used to design the nucleotide mixtures: the manual use of a spreadsheet and a refining grid search algorithm. Using design targets of less than or equal to 1% stop codons and an amino acid composition based on the average ratios observed in natural, globular proteins, the search methods yielded similar nucleotide ratios. Semirandom DNA, synthesized with a designed, three-residue repeat pattern, can encode libraries of very high diversity and represents an important tool for the construction of random polypeptide libraries.
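The quantity such a design has to control can be illustrated with a short calculation. The following is a minimal sketch, not the authors' spreadsheet or grid search: it assumes the standard genetic code and independent nucleotide choice at each codon position, and computes the expected stop-codon fraction and amino acid composition for a given set of three mixtures. The function name `ensemble_composition` and the example mixtures are illustrative assumptions, not values from the paper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def ensemble_composition(mixtures):
    """Expected amino acid frequencies for three per-position nucleotide mixtures.

    mixtures: sequence of three dicts mapping base -> fraction (each summing to 1).
    Assumes the base at each codon position is drawn independently.
    """
    freqs = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for pos, base in enumerate(codon):
            p *= mixtures[pos].get(base, 0.0)
        freqs[aa] = freqs.get(aa, 0.0) + p
    return freqs


# Equimolar mixture at every position: 3 of 64 codons are stops.
equimolar = [{b: 0.25 for b in BASES}] * 3
stop_equimolar = ensemble_composition(equimolar)["*"]

# Illustrative alternative (assumption): restrict position 3 to G or T,
# which leaves TAG as the only reachable stop codon.
gt_third = [{b: 0.25 for b in BASES}] * 2 + [{"G": 0.5, "T": 0.5}]
stop_gt_third = ensemble_composition(gt_third)["*"]

print(stop_equimolar, stop_gt_third)
```

A search over mixture fractions, such as the grid search mentioned in the abstract, would call a function of this kind repeatedly and score each candidate against the stop-codon limit and the target composition.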
Affiliation(s)
- T H LaBean
- Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia 19104
|
47
|
Harris LF, Sullivan MR, Hickok DF. Conservation of genetic information: a code for site-specific DNA recognition. Proc Natl Acad Sci U S A 1993; 90:5534-8. [PMID: 8516297 PMCID: PMC46755 DOI: 10.1073/pnas.90.12.5534] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.3] [Indexed: 01/31/2023] Open
Abstract
We present findings of genetic information conservation between the glucocorticoid response element (GRE) DNA and the cDNA encoding the glucocorticoid receptor (GR) DNA-binding domain (DBD). The regions of nucleotide sub-sequence similarity to the GRE in the GR DBD occur specifically at nucleotide sequences on the ends of exons 3, 4, and 5 at their splice junction sites. These sequences encode the DNA recognition helix on exon 3, a beta-strand on exon 4, and a putative alpha-helix on exon 5, respectively. The nucleotide sequence of exon 5 that encodes the putative alpha-helix located on the carboxyl terminus of the GR DBD shares sequence similarity with the flanking nucleotide regions of the GRE. We generated a computer model of the GR DBD using atomic coordinates derived from nuclear magnetic resonance spectroscopy to which we attached the exon 5-encoded putative alpha-helix. We docked this GR DBD structure at the 39-base-pair nucleotide sequence containing the GRE binding site and flanking nucleotides, which contained conserved genetic information. We observed that amino acids of the DNA recognition helix, the beta-strand, and the putative alpha-helix are spatially aligned with trinucleotides identical to their cognate codons within the GRE and its flanking nucleotides.
Affiliation(s)
- L F Harris
- Abbott Northwestern Hospital Cancer Research Laboratory, Minneapolis, MN 55407-3799
|
48
|
Lacey JC, Wickramasinghe NS, Cook GW. Experimental studies on the origin of the genetic code and the process of protein synthesis: a review update. ORIGINS LIFE EVOL B 1992; 22:243-75. [PMID: 1454353 DOI: 10.1007/bf01810856] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
This article is an update of our earlier review (Lacey and Mullins, 1983) in this journal on the origin of the genetic code and the process of protein synthesis. It is our intent to discuss only experimental evidence published since then, although it is necessary to mention enough of the older work to place the new in context. We do not include theoretical or hypothetical treatments of the code or protein synthesis. Relevant data regarding the evolution of tRNAs and the recognition of tRNAs by aminoacyl-tRNA-synthetases are discussed. Our present belief is that the code arose based on a core of early assignments which were made on a physico-chemical and anticodonic basis and this was expanded with new assignments later. These late assignments do not necessarily show an amino acid-anticodon relatedness. In spite of the fact that most data suggest a code origin based on amino acid-anticodon relationships, some new data suggesting preferential binding of Arg to its codons are discussed. While information regarding coding is not increasing very rapidly, information regarding the basic chemistry of the process of protein synthesis has increased significantly, principally relating to aminoacylation of mono- and polyribonucleotides. Included in those studies are several which show stereoselective reactions of L-amino acids with nucleotides having D-sugars. Hydrophobic interactions definitely play a role in the preferences which have been observed.
Affiliation(s)
- J C Lacey
- Department of Biochemistry, University of Alabama, Birmingham 35294
|
49
|
Di Giulio M. The evolution of aminoacyl-tRNA synthetases, the biosynthetic pathways of amino acids and the genetic code. ORIGINS LIFE EVOL B 1992; 22:309-19. [PMID: 1454354 DOI: 10.1007/bf01810859] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.4] [Indexed: 12/27/2022]
Abstract
In this paper the partition metric is used to compare binary trees deriving from (i) the study of the evolutionary relationships between aminoacyl-tRNA synthetases, (ii) the physicochemical properties of amino acids and (iii) the biosynthetic relationships between amino acids. If the tree defining the evolutionary relationships between aminoacyl-tRNA synthetases is assumed to be a manifestation of the mechanism that originated the organization of the genetic code, then the results appear to indicate the following: the hypothesis that regards the genetic code as a map of the biosynthetic relationships between amino acids seems to explain the organization of the genetic code, at least as plausibly as the hypotheses that consider the physicochemical properties of amino acids as the main adaptive theme that led to the structuring of the code.
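For readers unfamiliar with the partition metric, the sketch below shows one common form of it (the Robinson-Foulds distance on unrooted trees): each internal edge of a tree splits the leaves into two groups, and the distance is the number of such splits found in one tree but not the other. This is an assumption about the general technique, not a reproduction of the paper's computation. The nested-tuple tree encoding, the function names, and the toy topologies are hypothetical and carry no data from the study.

```python
def leaf_sets(tree, out):
    """Add the leaf set of every internal node to `out`; return this node's leaf set."""
    if not isinstance(tree, tuple):
        return frozenset([tree])  # a leaf label
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.add(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions of the leaves, each stored as the side lacking a reference leaf."""
    clades = set()
    all_leaves = leaf_sets(tree, clades)
    ref = min(all_leaves)
    result = set()
    for clade in clades:
        side = all_leaves - clade if ref in clade else clade
        # Skip trivial splits (a single leaf, or everything but one leaf).
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def partition_metric(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


# Toy topologies on five labels (illustrative only).
tree_a = ((("Ala", "Gly"), "Ser"), ("Asp", "Glu"))
tree_b = ((("Ala", "Ser"), "Gly"), ("Asp", "Glu"))

print(partition_metric(tree_a, tree_a), partition_metric(tree_a, tree_b))
```

A distance of zero means the two trees imply the same set of groupings; larger values mean fewer shared groupings, which is the sense in which trees derived from different kinds of evidence can be compared.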
Affiliation(s)
- M Di Giulio
- International Institute of Genetics and Biophysics, CNR, Naples, Italy
|
50
|
Arkin AP, Youvan DC. An algorithm for protein engineering: simulations of recursive ensemble mutagenesis. Proc Natl Acad Sci U S A 1992; 89:7811-5. [PMID: 1502200 PMCID: PMC49801 DOI: 10.1073/pnas.89.16.7811] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022] Open
Abstract
An algorithm for protein engineering, termed recursive ensemble mutagenesis, has been developed to produce diverse populations of phenotypically related mutants whose members differ in amino acid sequence. This method uses a feedback mechanism to control successive rounds of combinatorial cassette mutagenesis. Starting from partially randomized "wild-type" DNA sequences, a highly parallel search of sequence space for peptides fitting an experimenter's criteria is performed. Each iteration uses information gained from the previous rounds to search the space more efficiently. Simulations of the technique indicate that, under a variety of conditions, the algorithm can rapidly produce a diverse population of proteins fitting specific criteria. In the experimental analog, genetic selection or screening applied during recursive ensemble mutagenesis should force the evolution of an ensemble of mutants to a targeted cluster of related phenotypes.
Affiliation(s)
- A P Arkin
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge 02139
|