1
Štambuk N, Fimmel E, Konjevoda P, Brčić-Kostić K, Gračanin A, Saleh H. Novel amino acid distance matrices based on conductance measure. Biosystems 2024; 246:105355. [PMID: 39424124 DOI: 10.1016/j.biosystems.2024.105355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 09/20/2024] [Accepted: 10/15/2024] [Indexed: 10/21/2024]
Abstract
Ancestral relationships among biological species are often represented and analyzed by means of phylogenetic trees. Substitution and distance matrices are the two main types of matrices used in phylogenetic analyses. Substitution matrices describe how frequently residues in a nucleotide or protein sequence change over time, while distance matrices estimate phylogeny using a matrix of pairwise distances based on a particular code or analytical concept. A recent investigation by Elena Fimmel and coworkers (Life 11:1338, 2021) showed that (1) the robustness of a genetic code against point mutations can be described using the conductance measure, and (2) all possible point mutations of the genetic code can be represented as a weighted graph whose weights correspond to the probabilities of these mutations. In this article, we constructed and tested three novel distance matrices based on the conductance measure that take into account the point mutation robustness of the Standard Genetic Code (SGC). These distance matrices are based on maximum (CMAX), average (CAVG), and minimum (CMIN) conductance-optimized distances between codons coding for individual amino acids. The performance of these distance matrices was tested on a dataset of RecA proteins in Bacteria, Archaea (RadA homolog) and Eukarya (Rad51 homolog). RecA and its functional homologs were selected for this investigation because they are essential for the repair and maintenance of DNA, and are consequently well conserved and present in all domains of life. PAM250 and BLOSUM62 matrices are commonly used as a standard for distance matrix testing. The PAM250 and BLOSUM62 substitution matrices accurately separated the three biological domains of life defined by Carl Woese and George Fox (Proc Natl Acad Sci U S A 74:5088, 1977). An identical result was obtained using the three novel distance matrices (CMIN, CMAX, CAVG). This result supports the applicability of novel distance matrices based on the conductance method and suggests that further investigations based on this approach are justified.
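To make the conductance idea in this abstract concrete, here is a minimal Python sketch. It assumes the standard graph-theoretic definition (weight of edges leaving a set divided by the total weight of edges incident to the set) and, by default, a weight of 1 for every single-nucleotide substitution. The abstract states that the real weights correspond to mutation probabilities, so the uniform weights, the function names, and the glycine example are illustrative assumptions, not the authors' procedure for building CMAX, CAVG, or CMIN.

```python
from itertools import product

BASES = "UCAG"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def neighbours(codon):
    """Yield every codon reachable from `codon` by one point mutation."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def conductance(codon_set, weight=lambda a, b: 1.0):
    """Cut weight divided by volume for a set of codons.

    `weight(a, b)` is a hypothetical hook for mutation probabilities;
    the default treats all point mutations as equally likely.
    """
    members = set(codon_set)
    cut = 0.0      # weight of mutations that leave the set
    volume = 0.0   # weight of all mutations starting inside the set
    for codon in members:
        for other in neighbours(codon):
            w = weight(codon, other)
            volume += w
            if other not in members:
                cut += w
    return cut / volume


# The four glycine codons: third-position changes stay inside the set.
glycine = {"GGU", "GGC", "GGA", "GGG"}
print(conductance(glycine))
```

A lower value means that a larger share of point mutations stays inside the codon set, which is the sense in which conductance measures robustness against point mutations.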
Affiliation(s)
- Nikola Štambuk: Centre for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
- Elena Fimmel: Center for Algorithmic and Mathematical Methods in Medicine, Biology, and Biotechnology, Mannheim University of Applied Sciences, Paul Wittsack Str. 10, 68163 Mannheim, Germany.
- Paško Konjevoda: Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
- Krunoslav Brčić-Kostić: Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
- Antonija Gračanin: School of Medicine, University of Zagreb, Šalata 3, HR-10000 Zagreb, Croatia.
- Hadi Saleh: Mannheim University of Applied Sciences, Paul Wittsack Str. 10, 68163 Mannheim, Germany.
2
Di Giulio M. Theories of the origin of the genetic code: Strong corroboration for the coevolution theory. Biosystems 2024; 239:105217. [PMID: 38663520 DOI: 10.1016/j.biosystems.2024.105217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 04/16/2024] [Accepted: 04/18/2024] [Indexed: 04/29/2024]
Abstract
I analyzed all the theories and models of the origin of the genetic code, and over the years, I have considered the main suggestions that could explain this origin. The conclusion of this analysis is that the coevolution theory of the origin of the genetic code is the theory that best captures the majority of observations concerning the organization of the genetic code. In other words, the biosynthetic relationships between amino acids would have heavily influenced the origin of the organization of the genetic code, as supported by the coevolution theory. Instead, the presence in the genetic code of physicochemical properties of amino acids, which have also been linked to the physicochemical properties of anticodons or codons or bases by stereochemical and physicochemical theories, would simply be the result of natural selection. More explicitly, I maintain that these correlations between codons, anticodons or bases and amino acids do not reflect a real correlation between, for example, amino acids and codons, but are only the effect of the intervention of natural selection. Specifically, in the genetic code table we expect, for example, that the most similar codons (that is, those that differ by only one base) will have more similar physicochemical properties. Therefore, the 64 codons of the genetic code table ordered in a certain way would also represent an ordering of some of their physicochemical properties. Now, a study aimed at clarifying which physicochemical property of amino acids has influenced the allocation of amino acids in the genetic code has established that the partition energy of amino acids played a decisive role in this. Indeed, under some conditions, the genetic code was found to be approximately 98% optimized on its columns. In this same work, it was shown that this was most likely the result of the action of natural selection. 
If natural selection had truly allocated the amino acids in the genetic code in such a way that similar amino acids also have similar codons, and not through a mechanism of physicochemical interaction between, for example, codons and amino acids, then it might turn out that even different physicochemical properties of codons (or anticodons or bases) show some correlation with the physicochemical properties of amino acids, simply because the partition energy of amino acids is correlated with other physicochemical properties of amino acids. It is very likely that this would inevitably lead to a correlation between codons (or anticodons or bases) and amino acids. In other words, since the codons (anticodons or bases) are ordered in the genetic code, so that some of their physicochemical properties should also follow a similar order, and given that the amino acids would also appear to have been ordered in the genetic code by natural selection, it should inevitably turn out that there is a correlation between, for example, the hydrophobicity of anticodons and that of amino acids. Instead, the intervention of natural selection in organizing the genetic code would appear to be highly compatible with the main mechanism of structuring the genetic code as supported by the coevolution theory. This would make the coevolution theory the only plausible explanation for the origin of the genetic code.
Affiliation(s)
- Massimo Di Giulio: The Ionian School, Early Evolution of Life Department, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena, L'Aquila, Italy.
3
Štambuk N, Konjevoda P, Štambuk A. How ambiguity codes specify molecular descriptors and information flow in Code Biology. Biosystems 2023; 233:105034. [PMID: 37739308 DOI: 10.1016/j.biosystems.2023.105034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 09/12/2023] [Accepted: 09/12/2023] [Indexed: 09/24/2023]
Abstract
The article presents IUPAC ambiguity codes for incomplete nucleic acid specification, and their use in Code Biology. It is shown how to use this nomenclature in order to extract accurate information on different properties of biological systems. We investigated the use of ambiguity codes, as mathematical and logical operators and truth table elements, for the encoding of amino acids by means of the Standard Genetic Code. It is explained how to use ambiguity codes and truth functions in order to obtain accurate information on different properties of biological systems. Nucleotide ambiguity codes could be applied to: 1. encoding descriptive information of nucleotides, amino acids and proteins (e.g., polarity, relative solvent accessibility, atom depth, etc.), and 2. system modelling ranging from standard bioinformatics tools to classic evolutionary models (i.e., from the Miyazawa-Jernigan statistical potential to the Kimura three-substitution-type model, respectively). It is shown that algorithms based on IUPAC ambiguity codes, Boolean functions and truth tables, the Probabilistic Square of Opposition/Semiotic Square and Klein 4-groups could be used for bioinformatics analyses and relational data modelling in natural science. Underlying mathematical, logical and semiotic concepts of interest are presented and addressed.
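As an illustration of how ambiguity codes specify codon sets, the following Python sketch expands an ambiguous RNA pattern into the concrete codons it stands for and maps a set of bases back to its symbol. The symbol table is the standard IUPAC nucleotide nomenclature (written with U instead of T); the function names `expand` and `collapse` are hypothetical and do not come from the article.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes, RNA alphabet.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}
# Reverse lookup: set of bases -> ambiguity symbol.
SYMBOL_FOR = {frozenset(bases): symbol for symbol, bases in IUPAC.items()}


def expand(pattern):
    """Return the sorted list of concrete sequences matched by `pattern`."""
    choices = (IUPAC[symbol] for symbol in pattern)
    return sorted("".join(p) for p in product(*choices))


def collapse(bases):
    """Return the single IUPAC symbol that denotes exactly these bases."""
    return SYMBOL_FOR[frozenset(bases)]


print(expand("AUH"))   # the three isoleucine codons of the Standard Genetic Code
print(collapse("AG"))  # purines
```

In the Standard Genetic Code, isoleucine can therefore be written as the single pattern AUH and glycine as GGN, which is the kind of compact description the abstract builds on.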
Affiliation(s)
- Nikola Štambuk: Centre for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
- Paško Konjevoda: Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
- Albert Štambuk: Faculty of Kinesiology, University of Zagreb, Horvaćanski zavoj 15, HR-10000 Zagreb, Croatia.
4
Štambuk N, Konjevoda P, Brčić-Kostić K, Baković J, Štambuk A. New algorithm for the analysis of nucleotide and amino acid evolutionary relationships based on Klein four-group. Biosystems 2023; 233:105030. [PMID: 37717902 DOI: 10.1016/j.biosystems.2023.105030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 09/10/2023] [Accepted: 09/10/2023] [Indexed: 09/19/2023]
Abstract
Phylogenetics is the study of ancestral relationships among biological species. Such sequence analyses are often represented as phylogenetic trees. The branching pattern of each tree and its topology reflect the evolutionary relatedness between analyzed sequences. We present a Klein four-group algorithm (K4A) for the evolutionary analysis of nucleotide and amino acid sequences. The Klein four-group set of operators consists of the identity e (U) and three elements: a = transition (C), b = transversion (G) and c = transition-transversion or complementarity (A). We generated Klein four-group based distance matrices of: 1. Cayley table (CK4), 2. Table rows (K4R), 3. Table columns (K4C), and 4. Euclidean 2D distance (K4E). The performance of the matrices was tested on a dataset of RecA proteins in bacteria, eukaryotes (Rad51 homolog) and archaea (RadA homolog). RecA and its functional homologs are found in all species, and are essential for the repair and maintenance of DNA. Consequently, they represent a good model for the study of evolutionary relationships of protein and nucleotide sequences. The ancestral relationship between the sequences was correctly classified by all K4A matrices with respect to general topology. All distance matrices exhibited small variations among species, and overall results of tree classification were in agreement with the general patterns obtained by standard BLOSUM and PAM substitution matrices. During the evolution of a code there is a phase of optimization of system rules in which the ambiguity of the code is eliminated and the system starts producing specific components. The Klein four-group algorithm is consistent with the concept of ambiguity reduction. It also enables the use of different genetic code table variants optimized for particular transitions in evolution based on biological specificity.
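The operator set described in this abstract can be sketched in a few lines of Python. The labelling e = U, a = C, b = G, c = A is taken from the abstract; representing the elements as 2-bit integers combined by XOR is an assumption made here, chosen because the Klein four-group is isomorphic to Z2 x Z2. How the authors derive the CK4, K4R, K4C and K4E distance matrices from the table is not stated in the abstract and is not attempted.

```python
# Hypothetical 2-bit encoding of the four group elements named in the abstract:
# e (identity) = U, a (transition) = C, b (transversion) = G, c (both) = A.
ELEMENTS = {"U": 0b00, "C": 0b01, "G": 0b10, "A": 0b11}
LABELS = {bits: base for base, bits in ELEMENTS.items()}


def compose(x, y):
    """Group product of two elements, computed as bitwise XOR."""
    return LABELS[ELEMENTS[x] ^ ELEMENTS[y]]


def cayley_table(order="UCGA"):
    """Return the Cayley table as a list of rows in the given element order."""
    return [[compose(row, col) for col in order] for row in order]


for row in cayley_table():
    print(" ".join(row))
```

Every element is its own inverse, and composing any two distinct non-identity elements gives the third, which are the defining properties of the Klein four-group.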
Affiliation(s)
- Nikola Štambuk: Centre for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
- Paško Konjevoda: Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
- Krunoslav Brčić-Kostić: Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
- Josip Baković: University Hospital Dubrava, Department of Surgery, Avenija Gojka Šuška 6, HR-10000 Zagreb, Croatia.
- Albert Štambuk: Faculty of Kinesiology, University of Zagreb, Horvaćanski zavoj 15, HR-10000 Zagreb, Croatia.
5
Caldararo F, Di Giulio M. The genetic code is very close to a global optimum in a model of its origin taking into account both the partition energy of amino acids and their biosynthetic relationships. Biosystems 2022; 214:104613. [DOI: 10.1016/j.biosystems.2022.104613] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/19/2021] [Revised: 01/16/2022] [Accepted: 01/17/2022] [Indexed: 01/23/2023]
6
The Mutational Robustness of the Genetic Code and Codon Usage in Environmental Context: A Non-Extremophilic Preference? Life (Basel) 2021; 11:773. [PMID: 34440517 PMCID: PMC8398314 DOI: 10.3390/life11080773] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/16/2021] [Revised: 07/23/2021] [Accepted: 07/28/2021] [Indexed: 12/12/2022]
Abstract
The genetic code was evolved, to some extent, to minimize the effects of mutations. The effects of mutations depend on the amino acid repertoire, the structure of the genetic code and frequencies of amino acids in proteomes. The amino acid compositions of proteins and corresponding codon usages are still under selection, which allows us to ask what kind of environment the standard genetic code is adapted to. Using simple computational models and comprehensive datasets comprising genomic and environmental data from all three domains of Life, we estimate the expected severity of non-synonymous genomic mutations in proteins, measured by the change in amino acid physicochemical properties. We show that the fidelity in these physicochemical properties is expected to deteriorate with extremophilic codon usages, especially in thermophiles. These findings suggest that the genetic code performs better under non-extremophilic conditions, which not only explains the low substitution rates encountered in halophiles and thermophiles but the revealed relationship between the genetic code and habitat allows us to ponder on earlier phases in the history of Life.
7
The Ancient Operational Code is Embedded in the Amino Acid Substitution Matrix and aaRS Phylogenies. J Mol Evol 2019; 88:136-150. [PMID: 31781936 DOI: 10.1007/s00239-019-09918-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/28/2019] [Accepted: 11/14/2019] [Indexed: 10/25/2022]
Abstract
The underlying structure of the canonical amino acid substitution matrix (aaSM) is examined by considering stepwise improvements in the differential recognition of amino acids according to their chemical properties during the branching history of the two aminoacyl-tRNA synthetase (aaRS) superfamilies. The evolutionary expansion of the genetic code is described by a simple parameterization of the aaSM, in which (i) the number of distinguishable amino acid types, (ii) the matrix dimension and (iii) the number of parameters, each increases by one for each bifurcation in an aaRS phylogeny. Parameterized matrices corresponding to trees in which the size of an amino acid sidechain is the only discernible property behind its categorization as a substrate, exclusively for a Class I or II aaRS, provide a significantly better fit to empirically determined aaSM than trees with random bifurcation patterns. A second split between polar and nonpolar amino acids in each Class effects a vastly greater further improvement. The earliest Class-separated epochs in the phylogenies of the aaRS reflect these enzymes' capability to distinguish tRNAs through the recognition of acceptor stem identity elements via the minor (Class I) and major (Class II) helical grooves, which is how the ancient operational code functioned. The advent of tRNA recognition using the anticodon loop supports the evolution of the optimal map of amino acid chemistry found in the later genetic code, an essentially digital categorization, in which polarity is the major functional property, compensating for the unrefined, haphazard differentiation of amino acids achieved by the operational code.
8
Many alternative and theoretical genetic codes are more robust to amino acid replacements than the standard genetic code. J Theor Biol 2019; 464:21-32. [DOI: 10.1016/j.jtbi.2018.12.030] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 10/01/2018] [Revised: 12/17/2018] [Accepted: 12/19/2018] [Indexed: 02/07/2023]
9
Wnętrzak M, Błażej P, Mackiewicz D, Mackiewicz P. The optimality of the standard genetic code assessed by an eight-objective evolutionary algorithm. BMC Evol Biol 2018; 18:192. [PMID: 30545289 PMCID: PMC6293558 DOI: 10.1186/s12862-018-1304-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 12/15/2017] [Accepted: 11/22/2018] [Indexed: 02/07/2023]
Abstract
BACKGROUND: The standard genetic code (SGC) is a unique set of rules which assign amino acids to codons. Similar amino acids tend to have similar codons, indicating that the code evolved to minimize the costs of amino acid replacements in proteins caused by mutations or translational errors. However, if such optimization in fact occurred, many different properties of amino acids must have been taken into account during the code evolution. Therefore, this problem can be reformulated as a multi-objective optimization task, in which the selection constraints are represented by measures based on various amino acid properties.
RESULTS: To study the optimality of the SGC we applied a multi-objective evolutionary algorithm and used the representatives of eight clusters, which grouped over 500 indices describing various physicochemical properties of amino acids. In this way we avoided an arbitrary choice of amino acid features as optimization criteria. As a consequence, we were able to conduct a more general study on the properties of the SGC than the ones presented so far in other papers on this topic. We considered two models of the genetic code, one preserving the characteristic codon block structure of the SGC and the other without this restriction. The results revealed that the SGC could be significantly improved in terms of error minimization; hence it is not fully optimized. Its structure differs significantly from the structure of the codes optimized to minimize the costs of amino acid replacements. On the other hand, using newly defined quality measures that placed the SGC in the global space of theoretical genetic codes, we showed that the SGC is definitely closer to the codes that minimize the costs of amino acid replacements than to those maximizing them.
CONCLUSIONS: The standard genetic code most likely represents only a partially optimized system, which emerged under the influence of many different factors. Our findings can be useful to researchers involved in modifying the genetic code of living organisms and designing artificial ones.
Affiliation(s)
- Małgorzata Wnętrzak: Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383 Wrocław, Poland.
- Paweł Błażej: Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383 Wrocław, Poland.
- Dorota Mackiewicz: Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383 Wrocław, Poland.
- Paweł Mackiewicz: Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383 Wrocław, Poland.
10
Di Giulio M. A discriminative test among the different theories proposed to explain the origin of the genetic code: The coevolution theory finds additional support. Biosystems 2018; 169-170:1-4. [DOI: 10.1016/j.biosystems.2018.05.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/23/2018] [Revised: 04/26/2018] [Accepted: 05/07/2018] [Indexed: 11/29/2022]
11
Di Giulio M. The aminoacyl-tRNA synthetases had only a marginal role in the origin of the organization of the genetic code: Evidence in favor of the coevolution theory. J Theor Biol 2017; 432:14-24. [DOI: 10.1016/j.jtbi.2017.08.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/20/2017] [Revised: 08/01/2017] [Accepted: 08/03/2017] [Indexed: 10/19/2022]
12
Kuruoglu EE, Arndt PF. The information capacity of the genetic code: Is the natural code optimal? J Theor Biol 2017; 419:227-237. [PMID: 28163008 DOI: 10.1016/j.jtbi.2017.01.046] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/30/2016] [Revised: 01/25/2017] [Accepted: 01/31/2017] [Indexed: 10/20/2022]
Abstract
We envision the molecular evolution process as an information transfer process and provide a quantitative measure for information preservation in terms of the channel capacity according to the channel coding theorem of Shannon. We calculate Information capacities of DNA on the nucleotide (for non-coding DNA) and the amino acid (for coding DNA) level using various substitution models. We extend our results on coding DNA to a discussion about the optimality of the natural codon-amino acid code. We provide the results of an adaptive search algorithm in the code domain and demonstrate the existence of a large number of genetic codes with higher information capacity. Our results support the hypothesis of an ancient extension from a 2-nucleotide codon to the current 3-nucleotide codon code to encode the various amino acids.
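A worked example may help to show what "channel capacity" means at the nucleotide level. The sketch below assumes the simplest possible substitution model, a symmetric channel in which a base is replaced with total probability `p_error`, spread equally over the other three bases (a Jukes-Cantor-like assumption made here for illustration). For such a channel the capacity has the closed form log2(4) minus the entropy of one row of the transition matrix. The abstract mentions "various substitution models" without listing them, so this is not a reproduction of the authors' calculation.

```python
from math import log2


def symmetric_channel_capacity(p_error, alphabet_size=4):
    """Capacity in bits per symbol of a symmetric substitution channel.

    Each symbol is kept with probability 1 - p_error and replaced by each
    of the other symbols with probability p_error / (alphabet_size - 1).
    """
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must lie in [0, 1]")
    p_keep = 1.0 - p_error
    p_each = p_error / (alphabet_size - 1)
    # Entropy of one row of the transition matrix.
    row_entropy = 0.0
    for prob, multiplicity in ((p_keep, 1), (p_each, alphabet_size - 1)):
        if prob > 0.0:
            row_entropy -= multiplicity * prob * log2(prob)
    return log2(alphabet_size) - row_entropy


for p in (0.0, 0.01, 0.1, 0.75):
    print(p, round(symmetric_channel_capacity(p), 4))
```

With no substitutions the channel carries the full 2 bits per nucleotide, and at `p_error = 0.75` the output is independent of the input, so the capacity falls to zero.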
Affiliation(s)
- Ercan E Kuruoglu: Institute of Information Science and Technologies, "A. Faedo", CNR, via G Moruzzi 1, 56124 Pisa, Italy.
- Peter F Arndt: Max Planck Institute for Molecular Genetics, Department of Computational Molecular Biology, Ihnestr. 63/73, 14195 Berlin, Germany.
13
Nemzer LR. A binary representation of the genetic code. Biosystems 2017; 155:10-19. [PMID: 28300609 DOI: 10.1016/j.biosystems.2017.03.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/16/2016] [Revised: 03/03/2017] [Accepted: 03/06/2017] [Indexed: 12/23/2022]
Abstract
This article introduces a novel binary representation of the canonical genetic code based on both the structural similarities of the nucleotides, as well as the physicochemical properties of the encoded amino acids. Each of the four mRNA bases is assigned a unique 2-bit identifier, so that the 64 triplet codons are each indexed by a 6-bit label. The ordering of the bits reflects the hierarchical organization manifested by the DNA replication/repair and tRNA translation systems. In this system, transition and transversion mutations are naturally expressed as binary operations, and the severities of the different point mutations can be analyzed. Using a principal component analysis, it is shown that the physicochemical properties of amino acids related to protein folding also correlate with certain bit positions of their respective labels. Thus, the likelihood for a point mutation to be conservative, and less likely to cause a change in protein functionality, can be estimated.
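The general idea of a 6-bit codon label can be sketched as follows. The particular 2-bit assignment used here (high bit separates pyrimidines from purines, low bit selects the base within each class) is a hypothetical choice made for illustration and is not necessarily the assignment defined in the article. It is chosen so that a transition flips only the low bit and any change to the high bit is a transversion.

```python
# Hypothetical 2-bit identifiers: high bit 0 = pyrimidine, 1 = purine.
BITS = {"U": 0b00, "C": 0b01, "A": 0b10, "G": 0b11}


def codon_label(codon):
    """Concatenate the three 2-bit identifiers into one 6-bit integer."""
    label = 0
    for base in codon:
        label = (label << 2) | BITS[base]
    return label


def classify(base_from, base_to):
    """Classify a single-base change from the XOR of the two identifiers."""
    mask = BITS[base_from] ^ BITS[base_to]
    if mask == 0b00:
        return "identity"
    if mask == 0b01:
        return "transition"      # purine <-> purine or pyrimidine <-> pyrimidine
    return "transversion"        # the purine/pyrimidine bit changed


print(format(codon_label("AUG"), "06b"))
print(classify("A", "G"), classify("A", "C"))
```

Under this encoding a point mutation corresponds to XOR-ing the codon label with a mask that is non-zero in exactly one 2-bit field, which is the sense in which mutations become binary operations.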
Affiliation(s)
- Louis R Nemzer: Department of Chemistry and Physics, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Davie, FL, USA.
14
Some pungent arguments against the physico-chemical theories of the origin of the genetic code and corroborating the coevolution theory. J Theor Biol 2017; 414:1-4. [DOI: 10.1016/j.jtbi.2016.11.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 07/20/2016] [Revised: 10/26/2016] [Accepted: 11/16/2016] [Indexed: 10/20/2022]
15
Nemzer LR. Shannon information entropy in the canonical genetic code. J Theor Biol 2017; 415:158-170. [DOI: 10.1016/j.jtbi.2016.12.010] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 04/25/2016] [Revised: 11/30/2016] [Accepted: 12/12/2016] [Indexed: 11/15/2022]
16
Massey SE. The neutral emergence of error minimized genetic codes superior to the standard genetic code. J Theor Biol 2016; 408:237-242. [PMID: 27544417 DOI: 10.1016/j.jtbi.2016.08.022] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 01/30/2016] [Revised: 08/01/2016] [Accepted: 08/15/2016] [Indexed: 10/21/2022]
Abstract
The standard genetic code (SGC) assigns amino acids to codons in such a way that the impact of point mutations is reduced, this is termed 'error minimization' (EM). The occurrence of EM has been attributed to the direct action of selection, however it is difficult to explain how the searching of alternative codes for an error minimized code can occur via codon reassignments, given that these are likely to be disruptive to the proteome. An alternative scenario is that EM has arisen via the process of genetic code expansion, facilitated by the duplication of genes encoding charging enzymes and adaptor molecules. This is likely to have led to similar amino acids being assigned to similar codons. Strikingly, we show that if during code expansion the most similar amino acid to the parent amino acid, out of the set of unassigned amino acids, is assigned to codons related to those of the parent amino acid, then genetic codes with EM superior to the SGC easily arise. This scheme mimics code expansion via the gene duplication of charging enzymes and adaptors. The result is obtained for a variety of different schemes of genetic code expansion and provides a mechanistically realistic manner in which EM has arisen in the SGC. These observations might be taken as evidence for self-organization in the earliest stages of life.
Affiliation(s)
- Steven E Massey: Department of Biology, University of Puerto Rico - Rio Piedras, San Juan, PR 00931, USA.
17
Di Giulio M. The lack of foundation in the mechanism on which are based the physico-chemical theories for the origin of the genetic code is counterposed to the credible and natural mechanism suggested by the coevolution theory. J Theor Biol 2016; 399:134-40. [PMID: 27067244 DOI: 10.1016/j.jtbi.2016.04.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 02/04/2016] [Revised: 03/29/2016] [Accepted: 04/01/2016] [Indexed: 11/25/2022]
Abstract
I analyze the mechanism underlying the majority of theories that place the physico-chemical properties of amino acids at the center of the origin of the genetic code. As this mechanism is based on excessive mutational steps, I conclude that it could not have been operative or, if operative, it would not have allowed a full realization of the predictions of these theories, because this mechanism evidently contained a high indeterminacy. I use this argument to criticize the four-column theory of the origin of the genetic code (Higgs, 2009) and reply to the criticism that was directed towards the coevolution theory of the origin of the genetic code. In this context, I suggest a new hypothesis that clarifies the mechanism by which the codon domains of the precursor amino acids would have evolved, as predicted by the coevolution theory. This mechanism would have used particular elongation factors that would have constrained the evolution of all amino acids belonging to a given biosynthetic family to the progenitor pre-tRNA that first recognized the first codons that evolved in a certain codon domain of a given precursor amino acid. This happened because the elongation factors recognized two characteristics of the progenitor pre-tRNAs of precursor amino acids, which prevented the elongation factors from recognizing the pre-tRNAs belonging to biosynthetic families of different precursor amino acids. Finally, I analyze, by means of Fisher's exact test, the distribution within the genetic code of the biosynthetic classes of amino acids and of the polarity values of amino acids. This analysis would seem to support the biosynthetic classes of amino acids over the polarity values as the main factor that led to the structuring of the genetic code, with the physico-chemical properties of amino acids playing only a subsidiary role in this evolution. As a whole, the full analysis leads to the conclusion that the coevolution theory of the origin of the genetic code would be a highly corroborated theory.
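For readers unfamiliar with Fisher's exact test, which the abstract applies to the distribution of amino acid classes within the genetic code, here is a minimal self-contained Python sketch for a 2x2 contingency table. The counts in the example are invented for illustration and are not the data analyzed in the paper; the function name and the two-sided rule (summing all tables at most as probable as the observed one) are choices made here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows = two codon groups, columns = two amino acid classes.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```

A small p-value would indicate that the class membership of amino acids is associated with their position in the code more strongly than expected by chance, which is the kind of comparison the abstract draws between biosynthetic classes and polarity values.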
Affiliation(s)
- Massimo Di Giulio: Early Evolution of Life Laboratory, Institute of Biosciences and Bioresources, CNR, Via P. Castellino, 111, 80131 Naples, Italy.
18
Kumar B, Saini S. Analysis of the optimality of the standard genetic code. Mol Biosyst 2016; 12:2642-51. [DOI: 10.1039/c6mb00262e] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022]
Abstract
Many theories have been proposed attempting to explain the origin of the genetic code. In this work, we compare the performance of the standard genetic code against millions of randomly generated codes. The comparison covers the ability of genetic codes to encode additional information and their robustness to frameshift mutations.
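Comparing the standard code with randomly generated codes can be sketched as below. This is a small illustration under several assumptions that are not taken from the paper: the cost function is the mean squared change in Kyte-Doolittle hydropathy over all single-nucleotide substitutions between sense codons, random codes are produced by shuffling the 20 amino acids among the existing synonymous codon blocks while keeping stop codons fixed, and only a few hundred codes are sampled rather than millions. The paper's own criteria (additional information and frameshift robustness) are not implemented.

```python
import random
from itertools import product

BASES = "UCAG"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]
# Standard genetic code in UCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = dict(zip(CODONS, AMINO))

# Kyte-Doolittle hydropathy index (property chosen here for illustration).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def cost(code):
    """Mean squared hydropathy change over point mutations between sense codons."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other == "*":
                    continue
                total += (HYDROPATHY[aa] - HYDROPATHY[other]) ** 2
                count += 1
    return total / count


def random_code(rng):
    """Shuffle amino acids among the SGC codon blocks; stops stay in place."""
    acids = sorted(HYDROPATHY)
    shuffled = acids[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(acids, shuffled))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in SGC.items()}


def fraction_better(n_codes=500, seed=0):
    """Fraction of sampled random codes with a lower cost than the SGC."""
    rng = random.Random(seed)
    sgc_cost = cost(SGC)
    better = sum(cost(random_code(rng)) < sgc_cost for _ in range(n_codes))
    return better / n_codes


print(cost(SGC), fraction_better())
```

The smaller the returned fraction, the more unusual the standard code is with respect to the chosen property; a different property or a different way of generating random codes can change the picture.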
Affiliation(s)
- Balaji Kumar: Department of Chemical Engineering, Indian Institute of Technology Bombay, Mumbai 400 076, India.
- Supreet Saini: Department of Chemical Engineering, Indian Institute of Technology Bombay, Mumbai 400 076, India.
19
Abstract
A pattern in which nucleotide transitions are favored several fold over transversions is common in molecular evolution. When this pattern occurs among amino acid replacements, explanations often invoke an effect of selection, on the grounds that transitions are more conservative in their effects on proteins. However, the underlying hypothesis of conservative transitions has never been tested directly. Here we assess support for this hypothesis using direct evidence: the fitness effects of mutations in actual proteins measured via individual or paired growth experiments. We assembled data from 8 published studies, ranging in size from 24 to 757 single-nucleotide mutations that change an amino acid. Every study has the statistical power to reveal significant effects of amino acid exchangeability, and most studies have the power to discern a binary conservative-vs-radical distinction. However, only one study suggests that transitions are significantly more conservative than transversions. In the combined set of 1,239 replacements (544 transitions, 695 transversions), the chance that a transition is more conservative than a transversion is 53 % (95 % confidence interval 50 to 56) compared with the null expectation of 50 %. We show that this effect is not large compared with that of most biochemical factors, and is not large enough to explain the several-fold bias observed in evolution. In short, the available data have the power to verify the “conservative transitions” hypothesis if true, but suggest instead that selection on proteins plays at best a minor role in the observed bias.
Affiliation(s)
- Arlin Stoltzfus
- Institute for Bioscience and Biotechnology Research, Rockville, MD Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD
- Ryan W Norris
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University
|
20
|
Massey SE. Genetic code evolution reveals the neutral emergence of mutational robustness, and information as an evolutionary constraint. Life (Basel) 2015; 5:1301-32. [PMID: 25919033 PMCID: PMC4500140 DOI: 10.3390/life5021301] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 03/02/2015] [Revised: 04/02/2015] [Accepted: 04/03/2015] [Indexed: 01/09/2023]
Abstract
The standard genetic code (SGC) is central to molecular biology and its origin and evolution is a fundamental problem in evolutionary biology, the elucidation of which promises to reveal much about the origins of life. In addition, we propose that study of its origin can also reveal some fundamental and generalizable insights into mechanisms of molecular evolution, utilizing concepts from complexity theory. The first is that beneficial traits may arise by non-adaptive processes, via a process of "neutral emergence". The structure of the SGC is optimized for the property of error minimization, which reduces the deleterious impact of point mutations. Via simulation, it can be shown that genetic codes with error minimization superior to the SGC can emerge in a neutral fashion simply by a process of genetic code expansion via tRNA and aminoacyl-tRNA synthetase duplication, whereby similar amino acids are added to codons related to that of the parent amino acid. This process of neutral emergence has implications beyond that of the genetic code, as it suggests that not all beneficial traits have arisen by the direct action of natural selection; we term these "pseudaptations", and discuss a range of potential examples. Secondly, consideration of genetic code deviations (codon reassignments) reveals that these are mostly associated with a reduction in proteome size. This code malleability implies the existence of a proteomic constraint on the genetic code, proportional to the size of the proteome (P), and that its reduction in size leads to an "unfreezing" of the codon - amino acid mapping that defines the genetic code, consistent with Crick's Frozen Accident theory. 
The concept of a proteomic constraint may be extended to propose a general informational constraint on genetic fidelity, which may be used to explain variously, differences in mutation rates in genomes with differing proteome sizes, differences in DNA repair capacity and genome GC content between organisms, a selective pressure in the evolution of sexual reproduction, and differences in translational fidelity. Lastly, the utility of the concept of an informational constraint to other diverse fields of research is explored.
Affiliation(s)
- Steven E Massey
- Biology Department, PO Box 23360, University of Puerto Rico-Rio Piedras, San Juan, PR 00931, USA.
|
21
|
On How Many Fundamental Kinds of Cells are Present on Earth: Looking for Phylogenetic Traits that Would Allow the Identification of the Primary Lines of Descent. J Mol Evol 2014; 78:313-20. [DOI: 10.1007/s00239-014-9626-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/03/2014] [Accepted: 05/21/2014] [Indexed: 11/26/2022]
|
22
|
Stephenson JD, Freeland SJ. Unearthing the root of amino acid similarity. J Mol Evol 2013; 77:159-69. [PMID: 23743923 PMCID: PMC6763418 DOI: 10.1007/s00239-013-9565-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 03/22/2013] [Accepted: 05/08/2013] [Indexed: 12/31/2022]
Abstract
Similarities and differences between amino acids define the rates at which they substitute for one another within protein sequences and the patterns by which these sequences form protein structures. However, there exist many ways to measure similarity, whether one considers the molecular attributes of individual amino acids, the roles that they play within proteins, or some nuanced contribution of each. One popular approach to representing these relationships is to divide the 20 amino acids of the standard genetic code into groups, thereby forming a simplified amino acid alphabet. Here, we develop a method to compare or combine different simplified alphabets, and apply it to 34 simplified alphabets from the scientific literature. We use this method to show that while different suggestions vary and agree in non-intuitive ways, they combine to reveal a consensus view of amino acid similarity that is clearly rooted in physico-chemistry.
Affiliation(s)
- James D Stephenson
- NASA Astrobiology Institute, University of Hawaii, Honolulu, HI, 96822, USA,
|
23
|
Morgens DW, Cavalcanti ARO. An alternative look at code evolution: using non-canonical codes to evaluate adaptive and historic models for the origin of the genetic code. J Mol Evol 2013; 76:71-80. [PMID: 23344715 DOI: 10.1007/s00239-013-9542-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/21/2012] [Accepted: 01/15/2013] [Indexed: 10/27/2022]
Abstract
The canonical code has been shown many times to be highly robust against point mutations; that is, mutations that change a single nucleotide tend to result in similar amino acids more often than expected by chance. There are two major types of models for the origin of the code, which explain how this sophisticated structure evolved. Adaptive models state that the primitive code was specifically selected for error minimization, while historic models hypothesize that the robustness of the code is an artifact or by-product of the mechanism of code evolution. In this paper, we evaluated the levels of robustness in existing non-canonical codes as well as codes that differ in only one codon assignment from the standard code. We found that the level of robustness of many of these codes is comparable to or better than that of the standard code. Although these results do not preclude an adaptive origin of the genetic code, they suggest that the code was not selected for minimizing the effects of point mutations.
Affiliation(s)
- David W Morgens
- Department of Biology, Pomona College, 175 W 6th Street, Claremont, CA, USA
|
24
|
Caporaso JG, Knight R. New insight into the diversity of life's building blocks: evenness, not variance. Astrobiology 2011; 11:197-198. [PMID: 21417743 DOI: 10.1089/ast.2011.2280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/30/2023]
|
25
|
Santos J, Monteagudo A. Simulated evolution applied to study the genetic code optimality using a model of codon reassignments. BMC Bioinformatics 2011; 12:56. [PMID: 21338505 PMCID: PMC3053255 DOI: 10.1186/1471-2105-12-56] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 09/20/2010] [Accepted: 02/21/2011] [Indexed: 11/29/2022]
Abstract
Background As the canonical code is not universal, different theories about its origin and organization have appeared. The optimization or level of adaptation of the canonical genetic code was measured taking into account the harmful consequences resulting from point mutations leading to the replacement of one amino acid for another. There are two basic approaches to measuring the level of optimization: the statistical approach, which compares the canonical genetic code with many randomly generated alternative ones, and the engineering approach, which compares the canonical code with the best possible alternative. Results Here we used a genetic algorithm to search for better adapted hypothetical codes and as a method to gauge the difficulty in finding such alternative codes, allowing us to clearly situate the canonical code in the fitness landscape. This novel proposal of the use of evolutionary computing provides a new perspective in the open debate between the use of the statistical approach, which postulates that the genetic code conserves amino acid properties far better than expected from a random code, and the engineering approach, which tends to indicate that the canonical genetic code is still far from optimal. We used two models of hypothetical codes: one that reflects the known examples of codon reassignment and the model most used in the two approaches which reflects the current genetic code translation table. Although the standard code is far from a possible optimum considering both models, when the more realistic model of the codon reassignments was used, the evolutionary algorithm had more difficulty overcoming the efficiency of the canonical genetic code. Conclusions Simulated evolution clearly reveals that the canonical genetic code is far from optimal. Nevertheless, the efficiency of the canonical code increases when mistranslations are taken into account with the two models, as indicated by the fact that the best possible codes show the patterns of the standard genetic code. Our results are in accordance with the postulates of the engineering approach and indicate that the main arguments of the statistical approach are not enough to support its assertion of the extreme efficiency of the canonical genetic code.
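The statistical approach mentioned in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of any study listed here: the cost function (mean squared change in Kyte-Doolittle hydropathy over all single-nucleotide substitutions between sense codons), the shuffling scheme (amino acids permuted among the synonymous codon blocks of the standard code, stop codons fixed), and the names `code_cost`, `shuffled_code` and `fraction_better` are all choices made for this example.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; '*' = stop.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}

# Kyte-Doolittle hydropathy index, used here only as a stand-in
# amino acid property (an assumption of this sketch).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def neighbours(codon):
    """Yield the nine codons reachable from `codon` by one point mutation."""
    for i, base in enumerate(codon):
        for b in BASES:
            if b != base:
                yield codon[:i] + b + codon[i + 1:]


def code_cost(code):
    """Mean squared hydropathy change over point mutations between sense codons."""
    diffs = []
    for codon, aa in code.items():
        if aa == "*":
            continue
        for other in neighbours(codon):
            aa2 = code[other]
            if aa2 != "*":
                diffs.append((HYDROPATHY[aa] - HYDROPATHY[aa2]) ** 2)
    return sum(diffs) / len(diffs)


def shuffled_code(rng):
    """Random code: permute the 20 amino acids among the existing codon blocks."""
    aas = sorted(HYDROPATHY)
    perm = aas[:]
    rng.shuffle(perm)
    mapping = dict(zip(aas, perm))
    mapping["*"] = "*"  # stop codons stay where they are
    return {codon: mapping[aa] for codon, aa in SGC.items()}


def fraction_better(n_codes=1000, seed=0):
    """Return (cost of the standard code, fraction of random codes with lower cost)."""
    rng = random.Random(seed)
    sgc_cost = code_cost(SGC)
    better = sum(code_cost(shuffled_code(rng)) < sgc_cost for _ in range(n_codes))
    return sgc_cost, better / n_codes


if __name__ == "__main__":
    cost, frac = fraction_better()
    print(f"standard code cost: {cost:.3f}; fraction of random codes better: {frac:.4f}")
```

The engineering approach would instead search this space for the lowest-cost code and compare the standard code against that optimum. The fraction reported by the sketch depends on the property, the cost function and the sample of random codes.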
Affiliation(s)
- José Santos
- Department of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain.
|
26
|
Santos J, Monteagudo Á. Study of the genetic code adaptability by means of a genetic algorithm. J Theor Biol 2010; 264:854-65. [DOI: 10.1016/j.jtbi.2010.02.041] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 09/03/2009] [Revised: 01/05/2010] [Accepted: 02/23/2010] [Indexed: 11/30/2022]
|
27
|
Searching of Code Space for an Error-Minimized Genetic Code Via Codon Capture Leads to Failure, or Requires At Least 20 Improving Codon Reassignments Via the Ambiguous Intermediate Mechanism. J Mol Evol 2010; 70:106-15. [DOI: 10.1007/s00239-009-9313-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/04/2009] [Accepted: 12/07/2009] [Indexed: 10/19/2022]
|
28
|
Abstract
The genetic code is nearly universal, and the arrangement of the codons in the standard codon table is highly nonrandom. The three main concepts on the origin and evolution of the code are the stereochemical theory, according to which codon assignments are dictated by physicochemical affinity between amino acids and the cognate codons (anticodons); the coevolution theory, which posits that the code structure coevolved with amino acid biosynthesis pathways; and the error minimization theory under which selection to minimize the adverse effect of point mutations and translation errors was the principal factor of the code's evolution. These theories are not mutually exclusive and are also compatible with the frozen accident hypothesis, that is, the notion that the standard code might have no special properties but was fixed simply because all extant life forms share a common ancestor, with subsequent changes to the code, mostly, precluded by the deleterious effect of codon reassignment. Mathematical analysis of the structure and possible evolutionary trajectories of the code shows that it is highly robust to translational misreading but there are numerous more robust codes, so the standard code potentially could evolve from a random code via a short sequence of codon series reassignments. Thus, much of the evolution that led to the standard code could be a combination of frozen accident with selection for error minimization although contributions from coevolution of the code with metabolic pathways and weak affinities between amino acids and nucleotide triplets cannot be ruled out. However, such scenarios for the code evolution are based on formal schemes whose relevance to the actual primordial evolution is uncertain. A real understanding of the code origin and evolution is likely to be attainable only in conjunction with a credible scenario for the evolution of the coding principle itself and the translation system.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
|
29
|
Massey SE. A Neutral Origin for Error Minimization in the Genetic Code. J Mol Evol 2008; 67:510-6. [DOI: 10.1007/s00239-008-9167-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Received: 02/27/2008] [Revised: 09/03/2008] [Accepted: 09/03/2008] [Indexed: 10/21/2022]
|
30
|
Foltan JS. tRNA genes and the genetic code. J Theor Biol 2008; 253:469-82. [PMID: 18501928 DOI: 10.1016/j.jtbi.2008.03.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/30/2007] [Revised: 02/04/2008] [Accepted: 03/05/2008] [Indexed: 11/27/2022]
Abstract
The genetic code describes translational assignments between codons and amino acids. tRNAs and aminoacyl-tRNA synthetases (aaRSs) are those molecules by means of which these assignments are established. Any aaRS recognizes its tRNAs according to some of their nucleotides called identity elements (IEs). Let a 1Mut-similarity Sim (1Mut) be the average similarity between such tRNA genes whose codons differ by one point mutation. We showed that: (1) a global maximum of Sim (1Mut) is reached at the standard genetic code 27 times for 4 sets of IEs of tRNA genes of eukaryotic species, while it is so only 5 times for similarities Sim (C&R) between all tRNA genes whose codons lie in the same column or row of the code. Therefore, point mutations of anticodons were tested by nature to recruit tRNAs from one isoaccepting group to another, (2) because plain similarities Sim (all) between tRNA genes of species within any of the three domains of life are higher than between tRNA genes of species belonging to different domains, tRNA genes retained information about early evolution of cells, (3) we searched the order of tRNAs in which they were most probably assigned to their codons and amino acids. The beginning Ala, (Val), Pro, Ile, Lys, Arg, Trp, Met, Asp, Cys, (Ser) of our resulting chronology lies under a plateau on a graph of Sim (1Mut,IE)(univ.ancestors) plotted over this chronology for a set S(IE) of all IEs of tRNA genes, whose universal ancestors were separately computed for each codon. This plateau has remained preserved along the whole line of evolution of the code and is consistent with observations of Ribas de Pouplana and Schimmel [2001. Aminoacyl-tRNA synthetases: potential markers of genetic code development. Trends Biochem. Sci. 26, 591-598] that specific pairs of aaRSs-one from each of their two classes-can be docked simultaneously onto the acceptor stem of tRNA and hence an interaction existed between their ancestors using a reduced code, (4) sharpness of a local maximum of Sim (1Mut) at the standard code is almost 100% along our chronologies.
Affiliation(s)
- Jaromir S Foltan
- Department of Nuclear Physics and Biophysics, Comenius University, Mlynska dolina, 842 48 Bratislava, Slovakia
|
31
|
Gutfraind A, Kempf A. Error-reducing structure of the genetic code indicates code origin in non-thermophile organisms. Orig Life Evol Biosph 2008; 38:75-85. [PMID: 17554636 DOI: 10.1007/s11084-007-9071-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/20/2006] [Revised: 03/28/2007] [Accepted: 04/03/2007] [Indexed: 10/23/2022]
Abstract
During the RNA World, organisms experienced high rates of genetic errors, which implies that there was strong evolutionary pressure to reduce the errors' phenotypical impact by suitably structuring the still-evolving genetic code. Therefore, the relative rates of the various types of genetic errors should have left characteristic imprints in the structure of the genetic code. Here, we show that, therefore, it is possible to some extent to reconstruct those error rates, as well as the nucleotide frequencies, for the time when the code was fixed. We find evidence indicating that the frequencies of G and C in the genome were not elevated. Since, for thermodynamic reasons, RNA in thermophiles tends to possess elevated G+C content, this result indicates that the fixation of the genetic code occurred in organisms which were either not thermophiles or that the code's fixation occurred after the rise of DNA.
Affiliation(s)
- Alexander Gutfraind
- Center for Applied Mathematics, Cornell University, Ithaca, New York 14853, USA.
|
32
|
Novozhilov AS, Wolf YI, Koonin EV. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape. Biol Direct 2007; 2:24. [PMID: 17956616 PMCID: PMC2211284 DOI: 10.1186/1745-6150-2-24] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.7] [Received: 10/11/2007] [Accepted: 10/23/2007] [Indexed: 11/30/2022]
Abstract
Background The standard genetic code table has a distinctly non-random structure, with similar amino acids often encoded by codon series that differ by a single nucleotide substitution, typically, in the third or the first position of the codon. It has been repeatedly argued that this structure of the code results from selective optimization for robustness to translation errors such that translational misreading has the minimal adverse effect. Indeed, it has been shown in several studies that the standard code is more robust than a substantial majority of random codes. However, it remains unclear how much evolution the standard code underwent, what is the level of optimization, and what is the likely starting point. Results We explored possible evolutionary trajectories of the genetic code within a limited domain of the vast space of possible codes. Only those codes were analyzed for robustness to translation error that possess the same block structure and the same degree of degeneracy as the standard code. This choice of a small part of the vast space of possible codes is based on the notion that the block structure of the standard code is a consequence of the structure of the complex between the cognate tRNA and the codon in mRNA where the third base of the codon plays a minimum role as a specificity determinant. Within this part of the fitness landscape, a simple evolutionary algorithm, with elementary evolutionary steps comprising swaps of four-codon or two-codon series, was employed to investigate the optimization of codes for the maximum attainable robustness. The properties of the standard code were compared to the properties of four sets of codes, namely, purely random codes, random codes that are more robust than the standard code, and two sets of codes that resulted from optimization of the first two sets. The comparison of these sets of codes with the standard code and its locally optimized version showed that, on average, optimization of random codes yielded evolutionary trajectories that converged at the same level of robustness to translation errors as the optimization path of the standard code; however, the standard code required considerably fewer steps to reach that level than an average random code. When evolution starts from random codes whose fitness is comparable to that of the standard code, they typically reach a much higher level of optimization than the standard code, i.e., the standard code is much closer to its local minimum (fitness peak) than most of the random codes with similar levels of robustness. Thus, the standard genetic code appears to be a point on an evolutionary trajectory from a random point (code) about half the way to the summit of the local peak. The fitness landscape of code evolution appears to be extremely rugged, containing numerous peaks with a broad distribution of heights, and the standard code is relatively unremarkable, being located on the slope of a moderate-height peak. Conclusion The standard code appears to be the result of partial optimization of a random code for robustness to errors of translation. The reason the code is not fully optimized could be the trade-off between the beneficial effect of increasing robustness to translation errors and the deleterious effect of codon series reassignment that becomes increasingly severe with growing complexity of the evolving system. Thus, evolution of the code can be represented as a combination of adaptation and frozen accident. Reviewers This article was reviewed by David Ardell, Allan Drummond (nominated by Laura Landweber), and Rob Knight.
Affiliation(s)
- Artem S Novozhilov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
33
|
Stoltzfus A, Yampolsky LY. Amino acid exchangeability and the adaptive code hypothesis. J Mol Evol 2007; 65:456-62. [PMID: 17896070 DOI: 10.1007/s00239-007-9026-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 01/30/2007] [Revised: 05/18/2007] [Accepted: 08/23/2007] [Indexed: 11/26/2022]
Abstract
Since the genetic code first was determined, many have claimed that it is organized adaptively, so as to assign similar codons to similar amino acids. This claim has proved difficult to establish due to the absence of relevant comparative data on alternative primordial codes and of objective measures of amino acid exchangeability. Here we use a recently developed measure of exchangeability to evaluate a null hypothesis and two alternative hypotheses about the adaptiveness of the genetic code. The null hypothesis that there is no tendency for exchangeable amino acids to be assigned to similar codons can be excluded here as expected from earlier work. The first alternative hypothesis is that any such correlation between codon distance and amino acid distance is due to incremental mechanisms of code evolution, and not to adaptation to reduce deleterious effects of future mutations. More specifically, new codon assignments that occur by ambiguity reduction or by codon capture will tend to give rise to correlations, whether due to the condition of amino acid ambiguity, or to the condition of similarity between a new tRNA synthetase (or tRNA) and its parent. The second alternative hypothesis, the adaptive hypothesis, then may be defined as an excess relative to what may be expected given the incremental nature of evolution, reflecting true adaptation for robustness rather than an incidental effect. The results reported here indicate that most of the nonrandomness in the amino acids to codon assignments can be explained by incremental code evolution, with a small residue of orderliness that may reflect code adaptation.
Affiliation(s)
- Arlin Stoltzfus
- Center for Advanced Research in Biotechnology, 9600 Gudelsky Drive, Rockville, MD 20850, USA.
|
34
|
Goodarzi H, Katanforoush A, Torabi N, Najafabadi HS. Solvent accessibility, residue charge and residue volume, the three ingredients of a robust amino acid substitution matrix. J Theor Biol 2007; 245:715-25. [PMID: 17240399 DOI: 10.1016/j.jtbi.2006.12.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 07/27/2006] [Revised: 10/31/2006] [Accepted: 12/08/2006] [Indexed: 11/25/2022]
Abstract
Cost measure matrices or different amino acid indices have been widely used for studies in many fields of biology. One major criticism of these studies might be based on the unavailability of an unbiased and yet effective amino acid substitution matrix. Throughout this study we have devised a cost measure matrix based on the solvent accessibility, residue charge, and residue volume indices. Analyses performed on this novel substitution matrix (i.e. solvent accessibility charge volume (SCV) matrix) support the uncontaminated nature of this matrix regarding the genetic code. Although highly similar to a number of previously available cost measure matrices, the SCV matrix results in a more significant optimality in the error-buffering capacity of the genetic code when compared to many other amino acid substitution matrices. Besides, a method to compare an SCV-based scoring matrix with a number of widely used matrices has been devised, the results of which highlight the robustness of this matrix in protein family discrimination.
Affiliation(s)
- Hani Goodarzi
- Molecular Biology Department, Princeton University, Princeton, NJ, USA.
|
35
|
Bulka B, desJardins M, Freeland SJ. An interactive visualization tool to explore the biophysical properties of amino acids and their contribution to substitution matrices. BMC Bioinformatics 2006; 7:329. [PMID: 16817972 PMCID: PMC1524819 DOI: 10.1186/1471-2105-7-329] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 12/12/2005] [Accepted: 07/03/2006] [Indexed: 11/26/2022]
Abstract
Background Quantitative descriptions of amino acid similarity, expressed as probabilistic models of evolutionary interchangeability, are central to many mainstream bioinformatic procedures such as sequence alignment, homology searching, and protein structural prediction. Here we present a web-based, user-friendly analysis tool that allows any researcher to quickly and easily visualize relationships between these bioinformatic metrics and to explore their relationships to underlying indices of amino acid molecular descriptors. Results We demonstrate the three fundamental types of question that our software can address by taking as a specific example the connections between 49 measures of amino acid biophysical properties (e.g., size, charge and hydrophobicity), a generalized model of amino acid substitution (as represented by the PAM74-100 matrix), and the mutational distance that separates amino acids within the standard genetic code (i.e., the number of point mutations required for interconversion during protein evolution). We show that our software allows a user to recapture the insights from several key publications on these topics in just a few minutes. Conclusion Our software facilitates rapid, interactive exploration of three interconnected topics: (i) the multidimensional molecular descriptors of the twenty proteinaceous amino acids, (ii) the correlation of these biophysical measurements with observed patterns of amino acid substitution, and (iii) the causal basis for differences between any two observed patterns of amino acid substitution. This software acts as an intuitive bioinformatic exploration tool that can guide more comprehensive statistical analyses relating to a diverse array of specific research questions.
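The mutational distance mentioned in this abstract (the number of point mutations required to interconvert two amino acids within the standard genetic code) can be computed directly from the codon table. The sketch below is an illustration only and is not taken from the software described above; the function name `mutational_distance` is hypothetical, and the distance is taken to be the minimum Hamming distance between any codon of one amino acid and any codon of the other.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; '*' = stop.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def codons_for(aa):
    """All codons assigned to the one-letter amino acid code `aa`."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def mutational_distance(aa1, aa2):
    """Minimum number of point mutations separating codons of aa1 and aa2."""
    return min(
        hamming(c1, c2)
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


if __name__ == "__main__":
    for pair in [("F", "L"), ("M", "W"), ("M", "Y")]:
        print(pair, mutational_distance(*pair))
```

For example, Phe and Leu are one mutation apart (TTT to TTA), Met (ATG) and Trp (TGG) are two apart, and Met and Tyr (TAT, TAC) differ at all three codon positions.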
Affiliation(s)
- Blazej Bulka
- Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Marie desJardins
- Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Stephen J Freeland
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
|
36
|
Goodarzi H, Shateri Najafabadi H, Torabi N. On the coevolution of genes and genetic code. Gene 2005; 362:133-40. [PMID: 16213111 DOI: 10.1016/j.gene.2005.08.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/29/2005] [Revised: 07/17/2005] [Accepted: 08/03/2005] [Indexed: 10/25/2022]
Abstract
The canonical genetic code acts efficiently in minimizing the effects of mistranslations and point mutations. In the work presented we have also considered the effects of single nucleotide insertions and deletions on the optimality of the genetic code. Our results suggest that the canonical genetic code compensates for the ins/del mutations as well as mistranslations and point mutations. On the other hand, we highlighted the point that ins/del mutations have a lesser impact on the selected genes of Saccharomyces cerevisiae compared to randomly generated ones. We hypothesized that the codon usage preferences in S. cerevisiae genes are responsible for the higher efficiency of translation machinery in this organism. Our results support the conjecture that codon usage preferences render the genetic code more effective in minimizing the effects of ins/del mutations.
Affiliation(s)
- Hani Goodarzi
- Department of Biotechnology, Faculty of Science, University of Tehran, Tehran, Iran.
|
37
|
Goodarzi H, Najafabadi HS, Torabi N. Designing a neural network for the constraint optimization of the fitness functions devised based on the load minimization of the genetic code. Biosystems 2005; 81:91-100. [PMID: 15936137 DOI: 10.1016/j.biosystems.2005.02.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/02/2004] [Revised: 01/16/2005] [Accepted: 02/02/2005] [Indexed: 11/20/2022]
Abstract
Nonrandom patterns in codon assignments are supported by many statistical and biochemical studies in the last two decades. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslational errors and point mutations, an ability designated "load minimization". Prior studies have included many attempts at quantitative estimation of the fraction of randomly generated codes that, in terms of load minimization, score higher than the canonical genetic code. In this study, a neural network that estimates a highly optimized genetic code in a relatively short period of time has been devised. Several fitness functions were used throughout this text. Meanwhile, we have made use of two cost measure matrices, PAM74-100 and the mutation matrix.
Affiliation(s)
- Hani Goodarzi
- Department of Biotechnology, Faculty of Science, University of Tehran, Enghelab St., Tehran, Iran.
|
38
|
Goodarzi H, Najafabadi HS, Hassani K, Nejad HA, Torabi N. On the optimality of the genetic code, with the consideration of coevolution theory by comparison of prominent cost measure matrices. J Theor Biol 2005; 235:318-25. [PMID: 15882694 DOI: 10.1016/j.jtbi.2005.01.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 12/10/2004] [Revised: 01/20/2005] [Accepted: 01/24/2005] [Indexed: 11/22/2022]
Abstract
Statistical and biochemical studies have revealed non-random patterns in codon assignments. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations: when an error converts one amino acid to another, the biochemical properties of the resulting amino acid are usually very similar to those of the original one. In this study, using altered forms of the fitness functions used in prior studies, we have optimized the parameters involved in the calculation of the error-minimizing property of the genetic code so that the genetic code outscores the random codes as much as possible. This work also compares two prominent matrices, the Mutation Matrix and Point Accepted Mutations 74-100 (PAM(74-100)). Our results show that the hypothetical properties of the coevolution theory of the genetic code are already considered in PAM(74-100), giving more evidence for the existence of a bias towards the genetic code in this matrix. Furthermore, our results indicate that PAM(74-100) is biased towards single-base mistranslation occurrences in the second codon position as well as towards the frequency of amino acids. Thus PAM(74-100) is not a suitable substitution matrix for studies conducted on the evolution of the genetic code.
Affiliation(s)
- Hani Goodarzi
- Department of Biotechnology, Faculty of Science, University of Tehran, Enghelab Ave., Tehran, Iran.
|
39
|
Abstract
Given the structure of the genetic code, synonymous codons differ in their capacity to minimize the effects of errors due to mutation or mistranslation. I suggest that this may lead, in protein-coding genes, to a preference for codons that minimize the impact of errors at the protein level. I develop a theoretical measure of error minimization for each codon, based on amino acid similarity. This measure is used to calculate the degree of error minimization for 82 genes of Drosophila melanogaster and 432 rodent genes and to study its relationship with CG content, the degree of codon usage bias, and the rate of nucleotide substitution. I show that (i) Drosophila and rodent genes tend to prefer codons that minimize errors; (ii) this cannot be merely the effect of mutation bias; (iii) the degree of error minimization is correlated with the degree of codon usage bias; (iv) the amino acids that contribute more to codon usage bias are the ones for which synonymous codons differ more in the capacity to minimize errors; and (v) the degree of error minimization is correlated with the rate of nonsynonymous substitution. These results suggest that natural selection for error minimization at the protein level plays a role in the evolution of coding sequences in Drosophila and rodents.
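The idea of a per-codon error-minimization measure can be sketched as follows. This is an illustration under stated assumptions, not the measure defined in the paper. It uses the Kyte-Doolittle hydropathy scale as the amino acid similarity, treats all nine single-nucleotide substitutions as equally likely, and skips substitutions that produce a stop codon. The function names `neighbors`, `codon_error_cost` and `synonymous_costs` are hypothetical.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; '*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy, an assumed stand-in for the paper's
# amino acid similarity measure.
HYDRO = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def codon_error_cost(codon):
    """Mean absolute hydropathy change over single-nucleotide errors.
    A lower cost means the codon buffers errors better."""
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("stop codons have no amino acid to compare")
    diffs = [abs(HYDRO[aa] - HYDRO[CODE[n]])
             for n in neighbors(codon) if CODE[n] != "*"]
    return sum(diffs) / len(diffs)


def synonymous_costs(aa):
    """Error cost of every codon that encodes the amino acid `aa`."""
    return {codon: codon_error_cost(codon)
            for codon, a in CODE.items() if a == aa}
```

Comparing the values returned by `synonymous_costs("L")` or `synonymous_costs("R")` shows how synonymous codons can differ in this simplified cost, which is the property that the abstract relates to codon usage bias.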
Affiliation(s)
- Marco Archetti
- Département de Biologie, Section Ecologie et Evolution, Université de Fribourg, Chemin du Musée 10, 1700, Fribourg, Switzerland.
|
40
|
Di Giulio M. The origin of the genetic code: theories and their relationships, a review. Biosystems 2004; 80:175-84. [PMID: 15823416 DOI: 10.1016/j.biosystems.2004.11.005] [Citation(s) in RCA: 97] [Impact Index Per Article: 4.6] [Received: 08/03/2004] [Revised: 11/12/2004] [Accepted: 11/18/2004] [Indexed: 10/26/2022]
Abstract
A review of the main theories proposed to explain the origin of the genetic code is presented. I analyze arguments and data in favour of different theories proposed to explain the origin of the organization of the genetic code. It is possible to suggest a mechanism that makes compatible the different theories of the origin of the code, even if these are based on a historical or physicochemical determinism and thus appear incompatible by definition. Finally, I discuss the question of why a given number of synonymous codons was attributed to the amino acids in the genetic code.
Affiliation(s)
- Massimo Di Giulio
- Institute of Genetics and Biophysics Adriano Buzzati-Traverso, CNR, Naples, Italy
|
41
|
Goodarzi H, Nejad HA, Torabi N. On the optimality of the genetic code, with the consideration of termination codons. Biosystems 2004; 77:163-73. [PMID: 15527955 DOI: 10.1016/j.biosystems.2004.05.031] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.8] [Received: 02/14/2004] [Revised: 05/09/2004] [Accepted: 05/25/2004] [Indexed: 11/18/2022]
Abstract
The existence of nonrandom patterns in codon assignments is supported by many statistical and biochemical studies. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations. For example, it is known that when an error induces the conversion of an amino acid to another, the biochemical properties of the resulting amino acid are usually very similar to those of the original. Prior studies include many attempts at quantitative estimation of the fraction of randomly generated codes which, based upon load minimization, score higher than the canonical genetic code. In this study, we took into consideration both the relative frequencies of amino acids and nonsense mistranslations, factors which had previously been ignored. Incorporation of these parameters resulted in a fitness function (phi) under which the canonical genetic code is highly optimized with respect to load minimization. Considering termination codons, we applied a biosynthetic version of the coevolution theory, although with low significance. We employed a revised cost for the precursor-product pairs of amino acids and showed that the significance of this approach depends on the cost measure matrix used by the researcher. Thus, we have compared the two prominent matrices, point accepted mutations 74-100 (PAM(74-100)) and the mutation matrix, in our study.
Affiliation(s)
- Hani Goodarzi
- Department of Biotechnology, Faculty of Science, University of Tehran, Enghelab Avenue, Tehran, Iran.
|
42
|
Archetti M. Codon Usage Bias and Mutation Constraints Reduce the Level of Error Minimization of the Genetic Code. J Mol Evol 2004; 59:258-66. [PMID: 15486699 DOI: 10.1007/s00239-004-2620-0] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.2] [Received: 10/25/2003] [Accepted: 02/12/2004] [Indexed: 11/28/2022]
Abstract
Studies on the origin of the genetic code compare measures of the degree of error minimization of the standard code with measures produced by random variant codes but do not take into account codon usage, which was probably highly biased during the origin of the code. Codon usage bias could play an important role in the minimization of the chemical distances between amino acids because the importance of errors depends also on the frequency of the different codons. Here I show that when codon usage is taken into account, the degree of error minimization of the standard code may be dramatically reduced, and shifting to alternative codes often increases the degree of error minimization. This is especially true with a high CG content, which was probably the case during the origin of the code. I also show that the frequency of codes that perform better than the standard code, in terms of relative efficiency, is much higher in the neighborhood of the standard code itself, even when not considering codon usage bias; therefore alternative codes that differ only slightly from the standard code are more likely to evolve than some previous analyses suggested. My conclusions are that the standard genetic code is far from being an optimum with respect to error minimization and must have arisen for reasons other than error minimization.
Affiliation(s)
- Marco Archetti
- Département de Biologie, Section Ecologie et Evolution, Université de Fribourg, Chemin du Musée 10, 1700 Fribourg, Switzerland.
|
43
|
|
44
|
Abstract
Since discovering the pattern by which amino acids are assigned to codons within the standard genetic code, investigators have explored the idea that natural selection placed biochemically similar amino acids near to one another in coding space so as to minimize the impact of mutations and/or mistranslations. The analytical evidence to support this theory has grown in sophistication and strength over the years, and counterclaims questioning its plausibility and quantitative support have yet to transcend some significant weaknesses in their approach. These weaknesses are illustrated here by means of a simple simulation model for adaptive genetic code evolution. There remain ill-explored facets of the 'error minimizing' code hypothesis, however, including the mechanism and pathway by which an adaptive pattern of codon assignments emerged, the extent to which natural selection created synonym redundancy, its role in shaping the amino acid and nucleotide languages, and even the correct interpretation of the adaptive codon assignment pattern. These represent fertile areas for future research.
Affiliation(s)
- Stephen J Freeland
- Department of Biology, University of Maryland, Baltimore County, Catonsville, MD, USA.
|