1. Wehbi S, Wheeler A, Morel B, Manepalli N, Minh BQ, Lauretta DS, Masel J. Order of amino acid recruitment into the genetic code resolved by last universal common ancestor's protein domains. Proc Natl Acad Sci U S A 2024; 121:e2410311121. [PMID: 39665745 DOI: 10.1073/pnas.2410311121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 11/13/2024] [Indexed: 12/13/2024]
Abstract
The current "consensus" order in which amino acids were added to the genetic code is based on potentially biased criteria, such as the absence of sulfur-containing amino acids from the Urey-Miller experiment which lacked sulfur. More broadly, abiotic abundance might not reflect biotic abundance in the organisms in which the genetic code evolved. Here, we instead identify which protein domains date to the last universal common ancestor (LUCA) and then infer the order of recruitment from deviations of their ancestrally reconstructed amino acid frequencies from the still-ancient post-LUCA controls. We find that smaller amino acids were added to the code earlier, with no additional predictive power in the previous consensus order. Metal-binding (cysteine and histidine) and sulfur-containing (cysteine and methionine) amino acids were added to the genetic code much earlier than previously thought. Methionine and histidine were added to the code earlier than expected from their molecular weights and glutamine later. Early methionine availability is compatible with inferred early use of S-adenosylmethionine and early histidine with its purine-like structure and the demand for metal binding. Even more ancient protein sequences-those that had already diversified into multiple distinct copies prior to LUCA-have significantly higher frequencies of aromatic amino acids (tryptophan, tyrosine, phenylalanine, and histidine) and lower frequencies of valine and glutamic acid than single-copy LUCA sequences. If at least some of these sequences predate the current code, then their distinct enrichment patterns provide hints about earlier, alternative genetic codes.
Affiliation(s)
- Sawsan Wehbi
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ 85721
- Andrew Wheeler
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ 85721
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Nandini Manepalli
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721
- Bui Quang Minh
  - School of Computing, Australian National University, Canberra, ACT, Australia
- Dante S Lauretta
  - Lunar and Planetary Laboratory, University of Arizona, Tucson, AZ 85721
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721
2. Wehbi S, Wheeler A, Morel B, Manepalli N, Minh BQ, Lauretta DS, Masel J. Order of amino acid recruitment into the genetic code resolved by Last Universal Common Ancestor's protein domains. bioRxiv 2024:2024.04.13.589375. [PMID: 38659899 PMCID: PMC11042313 DOI: 10.1101/2024.04.13.589375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
The current "consensus" order in which amino acids were added to the genetic code is based on potentially biased criteria, such as absence of sulfur-containing amino acids from the Urey-Miller experiment which lacked sulfur. More broadly, abiotic abundance might not reflect biotic abundance in the organisms in which the genetic code evolved. Here, we instead identify which protein domains date to the last universal common ancestor (LUCA), then infer the order of recruitment from deviations of their ancestrally reconstructed amino acid frequencies from the still-ancient post-LUCA controls. We find that smaller amino acids were added to the code earlier, with no additional predictive power in the previous "consensus" order. Metal-binding (cysteine and histidine) and sulfur-containing (cysteine and methionine) amino acids were added to the genetic code much earlier than previously thought. Methionine and histidine were added to the code earlier than expected from their molecular weights, and glutamine later. Early methionine availability is compatible with inferred early use of S-adenosylmethionine, and early histidine with its purine-like structure and the demand for metal-binding. Even more ancient protein sequences - those that had already diversified into multiple distinct copies prior to LUCA - have significantly higher frequencies of aromatic amino acids (tryptophan, tyrosine, phenylalanine and histidine), and lower frequencies of valine and glutamic acid than single copy LUCA sequences. If at least some of these sequences predate the current code, then their distinct enrichment patterns provide hints about earlier, alternative genetic codes.
Affiliation(s)
- Sawsan Wehbi
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, Arizona, 85721, USA
- Andrew Wheeler
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, Arizona, 85721, USA
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Nandini Manepalli
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, 85721, USA
- Bui Quang Minh
  - School of Computing, Australian National University, Canberra, ACT, Australia
- Dante S. Lauretta
  - Lunar and Planetary Laboratory, University of Arizona, Tucson, AZ 85721, USA
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
3. Factors in Protobiomonomer Selection for the Origin of the Standard Genetic Code. Acta Biotheor 2021; 69:745-767. [PMID: 34283307 DOI: 10.1007/s10441-021-09420-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2020] [Accepted: 07/01/2021] [Indexed: 10/20/2022]
Abstract
Natural selection of specific protobiomonomers during abiogenic development of the prototype genetic code is hindered by the diversity of structural, spatial, and rotational isomers that have identical elemental composition and molecular mass (M), but can vary significantly in their physicochemical characteristics, such as the melting temperature Tm, the Tm:M ratio, and the solubility in water, due to different positions of atoms in the molecule. These parameters differ between cis- and trans-isomers of dicarboxylic acids, spatial monosaccharide isomers, and structural isomers of α-, β-, and γ-amino acids. The stable planar heterocyclic molecules of the major nucleobases comprise four (C, H, N, O) or three (C, H, N) elements and contain a single -C=C bond and two nitrogen atoms in each heterocycle involved in C-N and C=N bonds. They exist as isomeric resonance hybrids of single and double bonds and as a mixture of tautomer forms due to the presence of -C=O and/or -NH2 side groups. They are thermostable, insoluble in water, and exhibit solid-state stability, which is of central importance for DNA molecules as carriers of genetic information. In M-Tm diagrams, proteinogenic amino acids and the corresponding codons are distributed fairly regularly relative to the distinct clusters of purine and pyrimidine bases, reflecting the correspondence between codons and amino acids that was established in different periods of genetic code development. The body of data on the evolution of the genetic code system indicates that the elemental composition and molecular structure of protobiomonomers, and their M, Tm, photostability, and aqueous solubility determined their selection in the emergence of the standard genetic code.
4. Many alternative and theoretical genetic codes are more robust to amino acid replacements than the standard genetic code. J Theor Biol 2019; 464:21-32. [DOI: 10.1016/j.jtbi.2018.12.030] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 10/01/2018] [Revised: 12/17/2018] [Accepted: 12/19/2018] [Indexed: 02/07/2023]
5. Seligmann H, Warthi G. Genetic Code Optimization for Cotranslational Protein Folding: Codon Directional Asymmetry Correlates with Antiparallel Betasheets, tRNA Synthetase Classes. Comput Struct Biotechnol J 2017; 15:412-424. [PMID: 28924459 PMCID: PMC5591391 DOI: 10.1016/j.csbj.2017.08.001] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 05/25/2017] [Revised: 07/20/2017] [Accepted: 08/05/2017] [Indexed: 12/14/2022]
Abstract
A new codon property, codon directional asymmetry in nucleotide content (CDA), reveals a biologically meaningful genetic code dimension: palindromic codons (first and last nucleotides identical, codon structure XZX) are symmetric (CDA = 0); codons with structures ZXX/XXZ are 5'/3' asymmetric (CDA = -1/1; CDA = -0.5/0.5 if Z and X are both purines or both pyrimidines; assigning negative/positive (-/+) signs is an arbitrary convention). Negative/positive CDAs associate with (a) Fujimoto's tetrahedral codon stereo-table; (b) tRNA synthetase class I/II (aminoacylate the 2'/3' hydroxyl group of the tRNA's last ribose, respectively); and (c) high/low antiparallel (not parallel) betasheet conformation parameters. Preliminary results suggest CDA-whole organism associations (body temperature, developmental stability, lifespan). Presumably, CDA impacts spatial kinetics of codon-anticodon interactions, affecting cotranslational protein folding. Some synonymous codons have opposite CDA sign (alanine, leucine, serine, and valine), putatively explaining how synonymous mutations sometimes affect protein function. Correlations between CDA and tRNA synthetase classes are weaker than between CDA and antiparallel betasheet conformation parameters. This effect is stronger for mitochondrial genetic codes, and potentially drives mitochondrial codon-amino acid reassignments. CDA reveals information ruling nucleotide-protein relations embedded in reversed (not reverse-complement) sequences (5'-ZXX-3'/5'-XXZ-3').
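The CDA rules stated in this abstract can be written out as a short function. This is a minimal sketch based only on the cases the abstract spells out, not the authors' implementation. The function name `cda` is hypothetical, and two behaviours are assumptions of the sketch: codons made of one repeated nucleotide are treated as symmetric, and codons with three distinct nucleotides (which the abstract does not cover) return `None`.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def cda(codon):
    """Codon directional asymmetry for the cases described in the abstract.

    Returns None for codons with three distinct nucleotides, which the
    abstract does not define (an assumption of this sketch).
    """
    # Accept DNA or RNA spelling; work in RNA letters internally.
    first, middle, last = codon.upper().replace("T", "U")
    if first == last:
        return 0.0  # XZX (and, by assumption, XXX): symmetric
    if first == middle:
        odd, pair, sign = last, first, 1   # XXZ: odd nucleotide at the 3' end
    elif middle == last:
        odd, pair, sign = first, last, -1  # ZXX: odd nucleotide at the 5' end
    else:
        return None
    # Half weight when Z and X are both purines or both pyrimidines.
    same_class = {odd, pair} <= PURINES or {odd, pair} <= PYRIMIDINES
    return sign * (0.5 if same_class else 1.0)
```

For example, `cda("CAA")` follows the ZXX pattern with a pyrimidine next to two purines, so it falls in the full-weight negative case, while `cda("AAG")` is an XXZ codon made only of purines and falls in the half-weight positive case.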
Affiliation(s)
- Hervé Seligmann
  - Aix-Marseille Univ, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes, UM 63, CNRS UMR7278, IRD 198, INSERM U1095, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, Postal code 13385, France
  - Dept. Ecol Evol Behav, Alexander Silberman Inst Life Sci, The Hebrew University of Jerusalem, IL-91904 Jerusalem, Israel
- Ganesh Warthi
  - Aix-Marseille Univ, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes, UM 63, CNRS UMR7278, IRD 198, INSERM U1095, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, Postal code 13385, France
6. Massey SE. The neutral emergence of error minimized genetic codes superior to the standard genetic code. J Theor Biol 2016; 408:237-242. [PMID: 27544417 DOI: 10.1016/j.jtbi.2016.08.022] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 01/30/2016] [Revised: 08/01/2016] [Accepted: 08/15/2016] [Indexed: 10/21/2022]
Abstract
The standard genetic code (SGC) assigns amino acids to codons in such a way that the impact of point mutations is reduced; this is termed 'error minimization' (EM). The occurrence of EM has been attributed to the direct action of selection; however, it is difficult to explain how the searching of alternative codes for an error minimized code can occur via codon reassignments, given that these are likely to be disruptive to the proteome. An alternative scenario is that EM has arisen via the process of genetic code expansion, facilitated by the duplication of genes encoding charging enzymes and adaptor molecules. This is likely to have led to similar amino acids being assigned to similar codons. Strikingly, we show that if during code expansion the most similar amino acid to the parent amino acid, out of the set of unassigned amino acids, is assigned to codons related to those of the parent amino acid, then genetic codes with EM superior to the SGC easily arise. This scheme mimics code expansion via the gene duplication of charging enzymes and adaptors. The result is obtained for a variety of different schemes of genetic code expansion and provides a mechanistically realistic manner in which EM has arisen in the SGC. These observations might be taken as evidence for self-organization in the earliest stages of life.
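To make the idea of error minimization concrete, the sketch below scores a code by the mean squared change in an amino acid property over all single-nucleotide substitutions. It is an illustration only and does not reproduce the paper's simulations: the doublet codons, the four amino acid labels, their property values, and the names `neighbors` and `em_cost` are all hypothetical.

```python
from itertools import product

BASES = "UCAG"


def neighbors(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for pos, base in product(range(len(codon)), BASES):
        if base != codon[pos]:
            yield codon[:pos] + base + codon[pos + 1:]


def em_cost(code, prop):
    """Mean squared change in an amino acid property over all point mutations.

    `code` maps codons to amino acid labels; `prop` maps labels to a numeric
    property. Lower values mean better error minimization under this measure.
    """
    diffs = [
        (prop[aa] - prop[code[nb]]) ** 2
        for codon, aa in code.items()
        for nb in neighbors(codon)
        if nb in code
    ]
    return sum(diffs) / len(diffs)


# Hypothetical toy data: four amino acids with made-up property values.
PROP = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
LABELS = "abcd"

# Blocked code: the first base alone decides the amino acid,
# so second-position mutations are silent.
blocked = {x + y: LABELS[BASES.index(x)] for x, y in product(BASES, repeat=2)}

# Scattered code: every point mutation changes the amino acid.
scattered = {
    x + y: LABELS[(BASES.index(x) + BASES.index(y)) % 4]
    for x, y in product(BASES, repeat=2)
}

print(em_cost(blocked, PROP), em_cost(scattered, PROP))
```

In this toy setting the blocked code scores lower than the scattered one, which mirrors the abstract's point that assigning similar amino acids to related codons reduces the impact of point mutations.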
Affiliation(s)
- Steven E Massey
  - Department of Biology, University of Puerto Rico - Rio Piedras, San Juan, PR 00931, USA.
7. Aggarwal N, Bandhu AV, Sengupta S. Finite population analysis of the effect of horizontal gene transfer on the origin of an universal and optimal genetic code. Phys Biol 2016; 13:036007. [DOI: 10.1088/1478-3975/13/3/036007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 10/21/2022]
8. Massey SE. Genetic code evolution reveals the neutral emergence of mutational robustness, and information as an evolutionary constraint. Life (Basel) 2015; 5:1301-32. [PMID: 25919033 PMCID: PMC4500140 DOI: 10.3390/life5021301] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 03/02/2015] [Revised: 04/02/2015] [Accepted: 04/03/2015] [Indexed: 01/09/2023]
Abstract
The standard genetic code (SGC) is central to molecular biology and its origin and evolution is a fundamental problem in evolutionary biology, the elucidation of which promises to reveal much about the origins of life. In addition, we propose that study of its origin can also reveal some fundamental and generalizable insights into mechanisms of molecular evolution, utilizing concepts from complexity theory. The first is that beneficial traits may arise by non-adaptive processes, via a process of "neutral emergence". The structure of the SGC is optimized for the property of error minimization, which reduces the deleterious impact of point mutations. Via simulation, it can be shown that genetic codes with error minimization superior to the SGC can emerge in a neutral fashion simply by a process of genetic code expansion via tRNA and aminoacyl-tRNA synthetase duplication, whereby similar amino acids are added to codons related to that of the parent amino acid. This process of neutral emergence has implications beyond that of the genetic code, as it suggests that not all beneficial traits have arisen by the direct action of natural selection; we term these "pseudaptations", and discuss a range of potential examples. Secondly, consideration of genetic code deviations (codon reassignments) reveals that these are mostly associated with a reduction in proteome size. This code malleability implies the existence of a proteomic constraint on the genetic code, proportional to the size of the proteome (P), and that its reduction in size leads to an "unfreezing" of the codon - amino acid mapping that defines the genetic code, consistent with Crick's Frozen Accident theory. 
The concept of a proteomic constraint may be extended to propose a general informational constraint on genetic fidelity, which may be used to explain variously, differences in mutation rates in genomes with differing proteome sizes, differences in DNA repair capacity and genome GC content between organisms, a selective pressure in the evolution of sexual reproduction, and differences in translational fidelity. Lastly, the utility of the concept of an informational constraint to other diverse fields of research is explored.
Affiliation(s)
- Steven E Massey
  - Biology Department, PO Box 23360, University of Puerto Rico-Rio Piedras, San Juan, PR 00931, USA.
9. de Oliveira LL, de Oliveira PSL, Tinós R. A multiobjective approach to the genetic code adaptability problem. BMC Bioinformatics 2015; 16:52. [PMID: 25879480 PMCID: PMC4341243 DOI: 10.1186/s12859-015-0480-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/27/2014] [Accepted: 01/27/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The organization of the canonical code has intrigued researchers since it was first described. If we consider all codes mapping the 64 codons into 20 amino acids and one stop codon, there are more than 1.51×10^84 possible genetic codes. The main question related to the organization of the genetic code is why exactly the canonical code was selected among this huge number of possible genetic codes. Many researchers argue that the organization of the canonical code is a product of natural selection and that the code's robustness against mutations would support this hypothesis. In order to investigate the natural selection hypothesis, some researchers employ optimization algorithms to identify regions of the genetic code space where the best codes, according to a given evaluation function, can be found (engineering approach). The optimization process uses only one objective to evaluate the codes, generally based on the robustness for an amino acid property. Only one objective is also employed in the statistical approach for the comparison of the canonical code with random codes. We propose a multiobjective approach where two or more objectives are considered simultaneously to evaluate the genetic codes.
RESULTS In order to test our hypothesis that the multiobjective approach is useful for the analysis of the genetic code adaptability, we implemented a multiobjective optimization algorithm where two objectives are simultaneously optimized. Using as objectives the robustness against mutation with the amino acid property polar requirement (objective 1) and robustness with respect to hydropathy index or molecular volume (objective 2), we found solutions closer to the canonical genetic code in terms of robustness, when compared with the results using only one objective reported by other authors.
CONCLUSIONS Using more objectives, more optimal solutions are obtained and, as a consequence, more information can be used to investigate the adaptability of the genetic code. The multiobjective approach is also more natural, because more than one objective was adapted during the evolutionary process of the canonical genetic code. Our results suggest that the evaluation function employed to compare genetic codes should consider simultaneously more than one objective, in contrast to what has been done in the literature.
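The size of the code space quoted in the background can be checked with a short calculation. The sketch below assumes the figure counts assignments of 64 codons to 21 meanings (20 amino acids plus stop) in which every meaning receives at least one codon, and it counts those assignments by inclusion-exclusion. The function name `count_surjective_codes` is hypothetical, and the abstract does not state how the authors derived their number.

```python
from math import comb


def count_surjective_codes(n_codons=64, n_meanings=21):
    """Count maps from codons onto meanings that use every meaning at least once.

    Inclusion-exclusion: start from all n_meanings ** n_codons maps and
    alternately subtract and add the maps that miss k chosen meanings.
    """
    return sum(
        (-1) ** k * comb(n_meanings, k) * (n_meanings - k) ** n_codons
        for k in range(n_meanings + 1)
    )


total = count_surjective_codes()
print(f"{total:.3e}")
```

Under that assumption the result is on the order of 10^84, in line with the figure of more than 1.51×10^84 given in the abstract. Python's arbitrary-precision integers keep the count exact, so no rounding is introduced before formatting.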
Affiliation(s)
- Renato Tinós
  - Department of Computing and Mathematics, University of São Paulo, Ribeirão Preto, Brazil.