1
|
Paremskaia AI, Kogan AA, Murashkina A, Naumova DA, Satish A, Abramov IS, Feoktistova SG, Mityaeva ON, Deviatkin AA, Volchkov PY. Codon-optimization in gene therapy: promises, prospects and challenges. Front Bioeng Biotechnol 2024; 12:1371596. [PMID: 38605988 PMCID: PMC11007035 DOI: 10.3389/fbioe.2024.1371596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Accepted: 03/19/2024] [Indexed: 04/13/2024] [Open Access]
Abstract
Codon optimization has evolved to enhance protein expression efficiency by exploiting the genetic code's redundancy, which allows multiple codon options for a single amino acid. Initially observed in E. coli, optimal codon usage correlates with high gene expression, which has propelled applications expanding from basic research to biopharmaceuticals and vaccine development. The method is especially valuable for adjusting immune responses in gene therapies and has the potential to create tissue-specific therapies. However, challenges persist, such as the risk of unintended effects on protein function and the complexity of evaluating optimization effectiveness. Despite these issues, codon optimization is crucial in advancing gene therapeutics. This study provides a comprehensive review of the current metrics for codon optimization and their practical use in research and clinical applications in the context of gene therapy.
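The abstract above does not name the metrics it reviews. As one illustration of a codon-usage metric, the following minimal sketch computes relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The choice of RSCU, the function name `rscu`, and the toy sequence are assumptions made here for illustration and are not taken from the cited review.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return RSCU per codon for an upper-case, in-frame DNA coding sequence.

    RSCU = observed count * family size / total count of the synonymous family.
    Amino acids absent from the sequence are skipped; codons with
    non-TCAG characters are ignored.
    """
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families (one family per encoded label).
    families = {}
    for codon, aa in CODE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            result[c] = counts[c] * len(family) / total
    return result


if __name__ == "__main__":
    # Toy sequence (hypothetical): four leucine codons, three CTG and one TTA.
    values = rscu("CTGCTGCTGTTA")
    print(values["CTG"], values["TTA"])  # 4.5 1.5
```

An RSCU above 1 marks a codon used more often than expected under uniform synonymous usage. Real optimization pipelines would typically derive such values from a reference set of highly expressed genes in the target host.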
Affiliation(s)
- Anastasiia Iu Paremskaia
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Anna A. Kogan
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Anastasiia Murashkina
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Daria A. Naumova
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Anakha Satish
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Ivan S. Abramov
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
  - The MCSC named after A. S. Loginov, Moscow, Russia
- Sofya G. Feoktistova
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Olga N. Mityaeva
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Andrei A. Deviatkin
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Pavel Yu Volchkov
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
  - The MCSC named after A. S. Loginov, Moscow, Russia
|
2
|
Cuevas-Zuviría B, Adam ZR, Goldman AD, Kaçar B. Informatic Capabilities of Translation and Its Implications for the Origins of Life. J Mol Evol 2023; 91:567-569. [PMID: 37526692 DOI: 10.1007/s00239-023-10125-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/12/2023] [Accepted: 06/22/2023] [Indexed: 08/02/2023]
Abstract
The ability to encode and convert heritable information into molecular function is a defining feature of life as we know it. The conversion of information into molecular function is performed by the translation process, in which triplets of nucleotides in a nucleic acid polymer (mRNA) encode specific amino acids in a protein polymer that folds into a three-dimensional structure. The folded protein then performs one or more molecular activities, often as one part of a complex and coordinated physiological network. Prebiotic systems, lacking the ability to explicitly translate information between genotype and phenotype, would have depended either upon chemosynthetic pathways to generate their components (constraining their complexity and evolvability) or on the ambivalence of RNA as both a carrier of information and of catalytic functions, a possibility that is still supported by only a very limited set of catalytic RNAs. Thus, the emergence of translation during early evolutionary history may have allowed life to unmoor from the setting of its origin. The origin of translation machinery also represents an entirely novel and distinct threshold of behavior for which there is no abiotic counterpart: it could be the only known example of computing that emerged naturally at the chemical level. Here we describe translation machinery's decoding system as the basis of cellular translation's information-processing capabilities, and the four operation types exhibited by this biological machinery that find parallels in computer systems engineering.
Affiliation(s)
- Bruno Cuevas-Zuviría
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA.
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid, Madrid, Spain.
- Zachary R Adam
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Geosciences, University of Wisconsin-Madison, Madison, WI, USA
- Betül Kaçar
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
|
3
|
Kak S. Self-similarity and the maximum entropy principle in the genetic code. Theory Biosci 2023; 142:205-210. [PMID: 37402087 DOI: 10.1007/s12064-023-00396-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/05/2022] [Accepted: 06/16/2023] [Indexed: 07/05/2023]
Abstract
This paper addresses the relationship between information and the structure of the genetic code. The code has two puzzling anomalies: first, when viewed as 64 sub-cubes of a [Formula: see text] cube, the codons for serine (S) are not contiguous; second, there are amino acid codons with zero redundancy, which runs counter to the objective of error correction. To make sense of this, the paper shows that the genetic code must be viewed not only in terms of stereochemical, co-evolution, and error-correction considerations, but also in terms of two additional factors of significance to natural systems: an information-theoretic dimensionality of the code data, and the principle of maximum entropy. One implication of the non-integer dimensionality associated with data dimensions is self-similarity across scales, and it is shown that the genetic code satisfies this property. It is further shown that the maximum entropy principle operates through the scrambling of the elements in the sense of maximum algorithmic information complexity, generated by an appropriate exponentiation mapping. The new considerations and the use of the maximum entropy transformation create new constraints that are likely the reasons for the non-uniform codon groups and the codons with no redundancy.
Affiliation(s)
- Subhash Kak
  - Chapman University, Orange, CA, 92866, USA.
  - Oklahoma State University, Stillwater, OK, 74078, USA.
|
4
|
Omachi Y, Saito N, Furusawa C. Rare-event sampling analysis uncovers the fitness landscape of the genetic code. PLoS Comput Biol 2023; 19:e1011034. [PMID: 37068098 PMCID: PMC10138212 DOI: 10.1371/journal.pcbi.1011034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/16/2022] [Revised: 04/27/2023] [Accepted: 03/16/2023] [Indexed: 04/18/2023] [Open Access]
Abstract
The genetic code refers to a rule that maps 64 codons to 20 amino acids. Nearly all organisms, with few exceptions, share the same genetic code, the standard genetic code (SGC). While it remains unclear why this universal code has arisen and been maintained during evolution, it may have been preserved under selection pressure. Theoretical studies comparing the SGC and numerically created hypothetical random genetic codes have suggested that the SGC has been subject to strong selection pressure for being robust against translation errors. However, these prior studies have searched for random genetic codes in only a small subspace of the possible code space due to limitations in computation time. Thus, how the genetic code has evolved, and the characteristics of the genetic code fitness landscape, remain unclear. By applying multicanonical Monte Carlo, an efficient rare-event sampling method, we efficiently sampled random codes from a much broader random ensemble of genetic codes than in previous studies, estimating that only one out of every 10^20 random codes is more robust than the SGC. This estimate is significantly smaller than the previous estimate, one in a million. We also characterized the fitness landscape of the genetic code that has four major fitness peaks, one of which includes the SGC. Furthermore, genetic algorithm analysis revealed that evolution under such a multi-peaked fitness landscape could be strongly biased toward a narrow peak, in an evolutionary path-dependent manner.
Affiliation(s)
- Yuji Omachi
  - Graduate School of Sciences, The University of Tokyo, Hongo, Tokyo, Japan
- Nen Saito
  - Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima City, Hiroshima, Japan
  - Exploratory Research Center on Life and Living Systems, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
  - Universal Biology Institute, The University of Tokyo, Hongo, Tokyo, Japan
- Chikara Furusawa
  - Graduate School of Sciences, The University of Tokyo, Hongo, Tokyo, Japan
  - Universal Biology Institute, The University of Tokyo, Hongo, Tokyo, Japan
  - Center for Biosystems Dynamics Research, RIKEN, Suita, Osaka, Japan
|
5
|
Tsz Long Wong D, Norman H, Creedy TJ, Jordaens K, Moran KM, Young A, Mengual X, Skevington JH, Vogler AP. The phylogeny and evolutionary ecology of hoverflies (Diptera: Syrphidae) inferred from mitochondrial genomes. Mol Phylogenet Evol 2023; 184:107759. [PMID: 36921697 DOI: 10.1016/j.ympev.2023.107759] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/01/2022] [Revised: 03/01/2023] [Accepted: 03/08/2023] [Indexed: 03/16/2023]
Abstract
Hoverflies (Diptera: Syrphidae) are a diverse group of pollinators and a major research focus in ecology, but their phylogenetic relationships remain incompletely known. Using a genome skimming approach, we generated mitochondrial genomes for 91 species, capturing a wide taxonomic diversity of the family. To reduce the required amount of input DNA and the overall cost of library construction, sequencing and assembly were conducted on mixtures of specimens, which raises the problem of chimera formation of mitogenomes. We present a novel chimera detection test based on gene tree incongruence, but identified only a single mitogenome of chimeric origin. Together with existing data for a final set of 127 taxa, phylogenetic analysis on nucleotide and amino acid sequences using Maximum Likelihood and Bayesian Inference revealed a basal split of Microdontinae from all other syrphids. The remainder consists of several deep clades assigned to the subfamily Eristalinae in the current classification, including a clade comprising the subfamily Syrphinae (plus Pipizinae). These findings call for a re-definition of subfamilies, but basal nodes had insufficient support to allow such action. Molecular-clock dating placed the origin of the Syrphidae crown group in the mid-Cretaceous, while the Eristalinae-Syrphinae clade likely originated near the K/Pg boundary. Transformation of larval life history characters on the tree suggests that Syrphidae initially had sap-feeding larvae, which diversified greatly in diet and habitat association during the Eocene and Oligocene, coinciding with the diversification of angiosperms and the evolution of various insect groups used as larval host, prey, or mimicry models. Mitogenomes proved to be a powerful phylogenetic marker for studies of Syrphidae at subfamily and tribe levels, allowing dense taxon sampling that provided insight into the great ecological diversity and rapid evolution of larval life history traits of the hoverflies.
Affiliation(s)
- Daniel Tsz Long Wong
  - Department of Life Sciences, Imperial College London, Exhibition Road, London, SW7 2BX, U.K; Department of Life Sciences, Natural History Museum, Cromwell Road, London, SW7 5BD, U.K.
- Hannah Norman
  - Department of Life Sciences, Imperial College London, Exhibition Road, London, SW7 2BX, U.K; Department of Life Sciences, Natural History Museum, Cromwell Road, London, SW7 5BD, U.K.
- Thomas J Creedy
  - Department of Life Sciences, Imperial College London, Exhibition Road, London, SW7 2BX, U.K; Department of Life Sciences, Natural History Museum, Cromwell Road, London, SW7 5BD, U.K.
- Kurt Jordaens
  - Department of Biology-Invertebrates Unit, Royal Museum for Central Africa, Joint Experimental Molecular Unit Leuvensesteenweg 13, B-3080 Tervuren, Belgium.
- Kevin M Moran
  - Canadian National Collection of Insects, Arachnids and Nematodes, Agriculture and Agri-Food Canada, K.W. Neatby Building, 960 Carling Avenue, Ottawa, Ontario, ON K1A 0C6, Canada; Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, ON K1S 5B6, Canada.
- Andrew Young
  - School of Environmental Sciences, University of Guelph, Guelph, Ontario, ON N1G 2W1, Canada.
- Ximo Mengual
  - Zoologisches Forschungsmuseum Alexander Koenig, Leibniz Institute for the Analysis of Biodiversity Change, Adenauerallee 127, 53113 Bonn, Germany.
- Jeffrey H Skevington
  - Canadian National Collection of Insects, Arachnids and Nematodes, Agriculture and Agri-Food Canada, K.W. Neatby Building, 960 Carling Avenue, Ottawa, Ontario, ON K1A 0C6, Canada; Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, ON K1S 5B6, Canada.
- Alfried P Vogler
  - Department of Life Sciences, Imperial College London, Exhibition Road, London, SW7 2BX, U.K; Department of Life Sciences, Natural History Museum, Cromwell Road, London, SW7 5BD, U.K.
|
6
|
Yarus M. A crescendo of competent coding (c3) contains the Standard Genetic Code. RNA (New York, N.Y.) 2022; 28:1337-1347. [PMID: 35868841 PMCID: PMC9479743 DOI: 10.1261/rna.079275.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 07/18/2022] [Indexed: 06/15/2023]
Abstract
The Standard Genetic Code (SGC) can arise by fusion of partial codes evolved in different individuals, perhaps for differing prior tasks. Such code fragments can be unified into an SGC after later evolution of accurate third-position Crick wobble. Late wobble advent fills in the coding table, leaving only later development of translational initiation and termination to reach the SGC in separated domains of life. This code fusion mechanism is computationally implemented here. Late Crick wobble after C3 fusion (c3-lCw) is tested for its ability to evolve the SGC. Compared with previously studied isolated coding tables, or with increasing numbers of parallel, but nonfusing codes, c3-lCw reaches the SGC sooner, is successful in a smaller population, and presents accurate and complete codes more frequently. Notably, a long crescendo of SGC-like codes is exposed for selection of superior translation. c3-lCw also effectively suppresses varied disordered assignments, thus converging on a unified code. Such merged codes closely approach the SGC, making its selection plausible. For example: Under routine conditions, ≈1 of 22 c3-lCw environments evolves codes with ≥20 assignments and ≤3 differences from the SGC, notably including codes identical to the Standard Genetic Code.
Affiliation(s)
- Michael Yarus
  - Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado 80309-0347, USA
|
7
|
Wang X, Dong Q, Chen G, Zhang J, Liu Y, Cai Y. Frameshift and wild-type proteins are often highly similar because the genetic code and genomes were optimized for frameshift tolerance. BMC Genomics 2022; 23:416. [PMID: 35655139 PMCID: PMC9164415 DOI: 10.1186/s12864-022-08435-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Accepted: 03/02/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
Frameshift mutations have been considered of significant importance for the molecular evolution of proteins and their coding genes, while frameshift protein sequences encoded in the alternative reading frames of coding genes have been considered meaningless. However, functional frameshifts have been found to exist widely. It has been puzzling how a frameshift protein could keep its structure and functionality while substantial changes occurred in its primary amino acid sequence. This study shows that the similarities between frameshifts and wild types are higher than random similarities and are determined at different levels. Frameshift substitutions are more conservative than random substitutions in the standard genetic code (SGC). The frameshift substitution score of the SGC ranks in the top 2.0-3.5% of alternative genetic codes, showing that the SGC is nearly optimal for frameshift tolerance. In many genes and certain genomes, frameshift-resistant codons and codon pairs appear more frequently than expected, suggesting that frameshift tolerance is achieved not only through the optimality of the genetic code but, more importantly, through the further optimization of a specific gene or genome via the usage of codons and codon pairs. This sheds light on the role of frameshift mutations in molecular and genomic evolution.
Affiliation(s)
- Xiaolong Wang
  - Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China.
- Quanjiang Dong
  - Qingdao Municipal Hospital, Qingdao, Shandong, 266003, P. R. China
- Gang Chen
  - Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
- Jianye Zhang
  - Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
- Yongqiang Liu
  - Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
- Yujia Cai
  - Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
|
8
|
Model of Genetic Code Structure Evolution under Various Types of Codon Reading. Int J Mol Sci 2022; 23:ijms23031690. [PMID: 35163612 PMCID: PMC8835785 DOI: 10.3390/ijms23031690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2021] [Revised: 01/23/2022] [Accepted: 01/25/2022] [Indexed: 11/28/2022] [Open Access]
Abstract
The standard genetic code (SGC) is a set of rules according to which 64 codons are assigned to 20 canonical amino acids and a stop coding signal. As a consequence, the SGC is redundant because there are more codons than encoded labels. This redundancy implies the existence of codons that encode the same genetic information. The size and organization of such synonymous codon blocks are important characteristics of the SGC structure whose evolution is still unclear. Therefore, we studied possible evolutionary mechanisms of the codon block structure. We conducted computer simulations assuming that coding systems at early stages of SGC evolution were sets of ambiguous codon assignments with high entropy. We included three types of reading systems characterized by different inaccuracies and patterns of codon recognition. In contrast to the previous study, we allowed for evolution of the reading systems and their competition. The simulations, performed under minimization of translational errors and reduction of coding ambiguity, produced a coding system resistant to these errors. The reading system similar to that present in the SGC dominated the others very quickly. The surviving system was also characterized by low entropy and possessed properties similar to those of the SGC. Our simulations show that the unambiguous SGC could have emerged from a code with a lower level of ambiguity and that the number of tRNAs increased during the evolution.
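To make the notion of "ambiguous codon assignments with high entropy" concrete, the following minimal sketch computes the Shannon entropy of a single codon's assignment distribution over the 21 labels (20 amino acids plus stop). The abstract does not state how entropy was defined in the simulations, so the per-codon Shannon entropy used here, together with the names `assignment_entropy` and `N_LABELS`, is an assumption for illustration only.

```python
import math

N_LABELS = 21  # 20 canonical amino acids plus the stop signal


def assignment_entropy(probs):
    """Shannon entropy in bits of one codon's label distribution.

    `probs` holds the probability that the codon is read as each label;
    zero-probability labels contribute nothing to the sum.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Fully ambiguous codon: every label equally likely (maximum entropy).
    ambiguous = [1 / N_LABELS] * N_LABELS
    # Unambiguous codon, as in the SGC: one label with probability 1.
    unambiguous = [1.0] + [0.0] * (N_LABELS - 1)

    print(assignment_entropy(ambiguous))    # log2(21), about 4.39 bits
    print(assignment_entropy(unambiguous))  # 0.0
```

Under this reading, a code evolving toward unambiguous assignments moves each codon from values near log2(21) bits toward 0 bits.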
|
9
|
Average and Standard Deviation of the Error Function for Random Genetic Codes with Standard Stop Codons. Acta Biotheor 2021; 70:7. [PMID: 34919168 DOI: 10.1007/s10441-021-09427-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2020] [Accepted: 09/27/2021] [Indexed: 10/19/2022]
Abstract
The origin of the genetic code has been attributed in part to an accidental assignment of codons to amino acids. Although several lines of evidence indicate the subsequent expansion and improvement of the genetic code, the hypothesis of Francis Crick concerning a frozen accident occurring at the early stage of genetic code evolution is still widely accepted. Considering Crick's hypothesis, mathematical descriptions of hypothetical scenarios involving a huge number of possible coexisting random genetic codes could be very important to explain the origin and evolution of a selected genetic code. This work aims to contribute in this regard, that is, it provides a theoretical framework in which statistical parameters of error functions are calculated. Given a genetic code and an amino acid property, the functional code robustness is estimated by means of a known error function. In this work, using analytical calculations, general expressions for the average and standard deviation of the error function distributions of completely random codes with standard stop codons were obtained. As a possible biological application of these results, any set of amino acids and any pure or mixed amino acid properties can be used in the calculations, such that, in case of having to select a set of amino acids to create a genetic code, possible advantages of natural selection of the genetic codes could be discussed.
|
10
|
Fimmel E, Gumbel M, Starman M, Strüngmann L. Computational Analysis of Genetic Code Variations Optimized for the Robustness against Point Mutations with Wobble-like Effects. Life (Basel) 2021; 11:1338. [PMID: 34947869 PMCID: PMC8707135 DOI: 10.3390/life11121338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Revised: 11/26/2021] [Accepted: 11/27/2021] [Indexed: 11/17/2022] [Open Access]
Abstract
It is believed that the codon-amino acid assignments of the standard genetic code (SGC) help to minimize the negative effects caused by point mutations. All possible point mutations of the genetic code can be represented as a weighted graph with weights that correspond to the probabilities of these mutations. The robustness of a code against point mutations can then be described by means of the so-called conductance measure. This paper quantifies the wobble effect, which was investigated previously by applying the weighted graph approach, and seeks optimal weights using an evolutionary optimization algorithm to maximize the code's robustness. One result of our study is that the robustness of the genetic code is least influenced by mutations in the third position, as with the wobble effect. Moreover, the results clearly demonstrate that point mutations in the first, and even more importantly, in the second base of a codon have a very large influence on the robustness of the genetic code. These results were compared to single nucleotide variants (SNVs) in coding sequences, which support our findings. Additionally, we analyzed which structure of a genetic code evolves from random code tables when the robustness is maximized. Our calculations show that the resulting code tables are very close to the standard genetic code. In conclusion, the results illustrate that robustness against point mutations seems to be an important factor in the evolution of the standard genetic code.
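The position dependence described above can be illustrated with a much simpler count than the paper's conductance measure. The following minimal sketch enumerates all 576 single-nucleotide substitutions of the 64 codons in the standard genetic code and counts, per codon position, how many leave the encoded label unchanged. It treats all substitutions as equally likely and counts stop as a label; both are simplifying assumptions made here, and the function name `synonymous_by_position` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_by_position():
    """Count label-preserving point substitutions at codon positions 1, 2, 3.

    Each position has 64 codons * 3 alternative bases = 192 substitutions.
    """
    counts = [0, 0, 0]
    for codon, label in CODE.items():
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a substitution
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == label:
                    counts[pos] += 1
    return counts


if __name__ == "__main__":
    print(synonymous_by_position())  # [8, 2, 128] out of 192 per position
```

Two thirds of third-position substitutions are silent, against 8 of 192 at the first position and 2 of 192 at the second, which matches the ordering reported in the abstract: the third position matters least and the second matters most.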
|
11
|
Pawlak K, Wnetrzak M, Mackiewicz D, Mackiewicz P, Błażej P. Models of genetic code structure evolution with variable number of coded labels. Biosystems 2021; 210:104528. [PMID: 34492316 DOI: 10.1016/j.biosystems.2021.104528] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/17/2021] [Revised: 08/26/2021] [Accepted: 08/27/2021] [Indexed: 10/20/2022]
Abstract
It is assumed that at the early stage of cell evolution, the translation machinery was characterized by high noise, i.e. ambiguous assignment of codons to amino acids in the genetic code, which initially encoded only a few amino acids. Subsequently, during its evolution, new amino acids were added to this code. Taking these facts into account, we investigated theoretical models of the genetic code's structure, which evolved from a set of ambiguous codon assignments into a coding system with a low level of uncertainty. We considered three types of translational inaccuracies assuming a different number of fixed codon positions. We applied a modified version of an evolutionary algorithm to find the genetic codes that most effectively reduced the initial uncertainty in the assignment of codons to encoded labels, i.e. amino acids and a stop translation signal. We examined codes with the number of labels ranging from four to 22. Our results indicated that the quality of the genetic code structure is strongly dependent on the number of encoded labels as well as the type of translational mechanism. Stricter assignments of codons to labels were preferred by the codes encoding a larger number of labels. The results showed that a smaller degeneracy of codes evolved from a more tolerant coding with the stepwise addition of coded amino acids to the genetic code. The distribution of codon groups in the standard genetic code corresponds well to the translation model assuming two fixed codon positions, whereas the six-codon groups may be relics from previous stages of evolution when the code was characterized by a greater uncertainty.
Affiliation(s)
- Konrad Pawlak
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
- Małgorzata Wnetrzak
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
- Dorota Mackiewicz
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
- Paweł Mackiewicz
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
- Paweł Błażej
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland.
|
12
|
Nowak K, Błażej P, Wnetrzak M, Mackiewicz D, Mackiewicz P. Some theoretical aspects of reprogramming the standard genetic code. Genetics 2021; 218:6169163. [PMID: 33711098 PMCID: PMC8128387 DOI: 10.1093/genetics/iyab040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2020] [Accepted: 02/11/2021] [Indexed: 11/12/2022] [Open Access]
Abstract
Reprogramming of the standard genetic code to include non-canonical amino acids (ncAAs) opens new prospects for medicine, industry, and biotechnology. There are several methods of code engineering, which allow us to store new genetic information in DNA sequences and produce proteins with new properties. Here, we provide a theoretical background for optimal genetic code expansion, which may find application in the experimental design of the genetic code. We assumed that the expanded genetic code includes both canonical and non-canonical information stored in the 64 classical codons. Moreover, the new coding system is robust to point mutations and minimizes the possibility of reversion from the new to the old information. To find such codes, we applied graph theory to analyze the properties of optimal codon sets. We present a formal procedure for finding the optimal codes with various numbers of vacant codons that could be assigned to new amino acids. Finally, we discuss the optimal number of newly incorporated ncAAs and the optimal size of codon groups that can be assigned to ncAAs.
Affiliation(s)
- Kuba Nowak
  - Faculty of Mathematics and Computer Science, University of Wrocław, ul. F. Joliot-Curie 15, 50-383 Wrocław, Poland
- Paweł Błażej
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul F. Joliot-Curie 14a, 50-383 Wrocław, Poland
- Małgorzata Wnetrzak
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul F. Joliot-Curie 14a, 50-383 Wrocław, Poland
- Dorota Mackiewicz
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul F. Joliot-Curie 14a, 50-383 Wrocław, Poland
- Paweł Mackiewicz
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul F. Joliot-Curie 14a, 50-383 Wrocław, Poland
|
13
|
Demongeot J, Moreira A, Seligmann H. Negative CG dinucleotide bias: An explanation based on feedback loops between Arginine codon assignments and theoretical minimal RNA rings. Bioessays 2020; 43:e2000071. [PMID: 33319381 DOI: 10.1002/bies.202000071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/02/2020] [Revised: 11/23/2020] [Accepted: 11/26/2020] [Indexed: 01/05/2023]
Abstract
Theoretical minimal RNA rings are candidate primordial genes evolved for non-redundant coding of the genetic code's 22 coding signals (one codon per biogenic amino acid, a start and a stop codon) over the shortest possible length: 29520 22-nucleotide-long RNA rings solve this min-max constraint. Numerous RNA ring properties are reminiscent of natural genes. Here we present analyses showing that all RNA rings lack dinucleotide CG (a mutable, chemically instable dinucleotide coding for Arginine), bearing a resemblance to known CG-depleted genomes. CG in "incomplete" RNA rings (not coding for all coding signals, with only 3-12 nucleotides) gradually decreases towards CG absence in complete, 22-nucleotide-long RNA rings. Presumably, feedback loops during RNA ring growth during evolution (when amino acid assignment fixed the genetic code) assigned Arg to codons lacking CG (AGR) to avoid CG. Hence, as a chemical property of base pairs, CG mutability restructured the genetic code, thereby establishing itself as genetically encoded biological information.
Affiliation(s)
- Jacques Demongeot
- Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, France
- Andrés Moreira
- Departamento de Informática, Universidad Técnica Federico Santa María, Santiago, Chile
- Hervé Seligmann
- Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, France; The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel; Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), Eggenstein-Leopoldshafen, Germany
|
14
|
Evolution of Life on Earth: tRNA, Aminoacyl-tRNA Synthetases and the Genetic Code. Life (Basel) 2020; 10:life10030021. [PMID: 32131473 PMCID: PMC7151597 DOI: 10.3390/life10030021] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 01/05/2020] [Revised: 02/13/2020] [Accepted: 02/27/2020] [Indexed: 02/07/2023]
Abstract
Life on Earth and the genetic code evolved around tRNA and the tRNA anticodon. We posit that the genetic code initially evolved to synthesize polyglycine as a cross-linking agent to stabilize protocells. We posit that the initial amino acids to enter the code occupied larger sectors of the code that were then invaded by incoming amino acids. Displacements of amino acids follow selection rules. The code sectored from a glycine code to a four amino acid code to an eight amino acid code to an ~16 amino acid code to the standard 20 amino acid code with stops. The proposed patterns of code sectoring are now most apparent from patterns of aminoacyl-tRNA synthetase evolution. The Elongation Factor-Tu GTPase anticodon-codon latch that checks the accuracy of translation appears to have evolved at about the eight amino acid to ~16 amino acid stage. Before evolution of the EF-Tu latch, we posit that both the 1st and 3rd anticodon positions were wobble positions. The genetic code evolved via tRNA charging errors and via enzymatic modifications of amino acids joined to tRNAs, followed by tRNA and aminoacyl-tRNA synthetase differentiation. Fidelity mechanisms froze the code by inhibiting further innovation.
|
15
|
Błażej P, Wnetrzak M, Mackiewicz D, Mackiewicz P. Basic principles of the genetic code extension. Royal Society Open Science 2020; 7:191384. [PMID: 32257313 PMCID: PMC7062095 DOI: 10.1098/rsos.191384] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/22/2019] [Accepted: 01/09/2020] [Indexed: 05/08/2023]
Abstract
Compounds including non-canonical amino acids (ncAAs) or other artificially designed molecules have many applications in medicine, industry and biotechnology. They can be produced by modifying or extending the standard genetic code (SGC). Peptides or proteins including the ncAAs can then be delivered continuously and stably by organisms with the customized genetic code. Among several methods of engineering the code, using non-canonical base pairs is especially promising, because it enables generating many new codons, which can be used to encode any new amino acid. Since even one pair of new bases can extend the SGC up to 216 codons generated by a six-letter nucleotide alphabet, the extension of the SGC can be achieved in many ways. Here, we proposed a stepwise procedure of the SGC extension with one pair of non-canonical bases to minimize the consequences of point mutations. We reported relationships between codons in the framework of graph theory. All 216 codons were represented as nodes of the graph, whereas its edges were induced by all possible single nucleotide mutations occurring between codons. Therefore, every set of canonical and newly added codons induces a specific subgraph. We characterized the properties of the induced subgraphs generated by selected sets of codons. This allowed us to describe a procedure for incremental addition of the set of meaningful codons up to the full coding system consisting of three pairs of bases. The procedure of gradual extension of the SGC makes the whole system robust to changes in genetic information due to mutations and is compatible with the views assuming that codons and amino acids were added successively to the primordial SGC, which evolved minimizing harmful consequences of mutations or mistranslations of encoded proteins.
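The graph construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the letters `X` and `Y` are placeholder names for one non-canonical base pair, and the function names are hypothetical. It builds the 216-codon graph whose edges are single-nucleotide substitutions and extracts the subgraph induced by the 64 canonical codons.

```python
from itertools import product

CANONICAL = "ACGT"
EXTENDED = CANONICAL + "XY"  # X, Y: placeholder letters for one non-canonical base pair


def codons(alphabet):
    """All triplets over the given nucleotide alphabet."""
    return ["".join(t) for t in product(alphabet, repeat=3)]


def mutation_graph(alphabet):
    """Map each codon to the set of codons reachable by one substitution."""
    graph = {}
    for codon in codons(alphabet):
        neighbours = set()
        for pos in range(3):
            for base in alphabet:
                if base != codon[pos]:
                    neighbours.add(codon[:pos] + base + codon[pos + 1:])
        graph[codon] = neighbours
    return graph


def induced_subgraph(graph, nodes):
    """Keep only the chosen codons and the edges running between them."""
    nodes = set(nodes)
    return {c: graph[c] & nodes for c in nodes}


def edge_count(graph):
    """Each undirected edge appears in two adjacency sets."""
    return sum(len(n) for n in graph.values()) // 2


full = mutation_graph(EXTENDED)
canonical = induced_subgraph(full, codons(CANONICAL))
print(len(full), edge_count(full))            # 216 nodes, 1620 edges
print(len(canonical), edge_count(canonical))  # 64 nodes, 288 edges
```

Every codon in the six-letter graph has 3 × 5 = 15 neighbours, against 3 × 3 = 9 inside the canonical subgraph. The gap between the two degrees is the number of single substitutions that carry a canonical codon into the newly added part of the code.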
Affiliation(s)
- Paweł Błażej
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, Poland
|
16
|
Determining amino acid scores of the genetic code table: Complementarity, structure, function and evolution. Biosystems 2020; 187:104026. [DOI: 10.1016/j.biosystems.2019.104026] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/26/2019] [Accepted: 08/28/2019] [Indexed: 11/22/2022]
|
17
|
Ismail SNFB, Baharum SN, Fazry S, Low CF. Comparative genome analysis reveals a distinct influence of nucleotide composition on virus-host species-specific interaction of prawn-infecting nodavirus. Journal of Fish Diseases 2019; 42:1761-1772. [PMID: 31637743 DOI: 10.1111/jfd.13093] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/26/2019] [Revised: 08/21/2019] [Accepted: 08/26/2019] [Indexed: 06/10/2023]
Abstract
Discovery of species-specific interaction between the host and virus has drawn the interest of many researchers to study the evolution of the newly emerged virus. Comparative genome analysis provides insights into the evolution of the virus functional genome and the underlying mechanisms of virus-host interactions. The analysis of nucleotide composition indicated that nodavirus evolved towards host specialization through host-specific mutation. The GC-rich genome of betanodavirus was significantly deficient in UpA and UpU dinucleotides, whilst the AU-rich genome of gammanodavirus was deficient in the CpG dinucleotide. The capsid of MrNV and PvNV of gammanodavirus retains the highest abundance of adenine and uracil at the second codon position, respectively, which is very distinct from the other genera. The ENC-GC3 plot inferred the influence of natural selection and mutational pressure in shaping the evolution of MrNV RdRp and capsid, respectively. Furthermore, CAI/eCAI analysis predicts that the adaptability of MrNV in the squid Sepia officinalis is comparable to that in its natural host, Macrobrachium rosenbergii. Thus, further study is warranted to investigate the capacity of MrNV replication in S. officinalis owing to its high codon adaptation index.
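Two of the composition measures named in this abstract, dinucleotide deficiency and GC3, can be sketched in a few lines of Python. This is only an illustration using the commonly used observed/expected dinucleotide ratio and the GC fraction at third codon positions; the function names are hypothetical, and it does not reproduce the study's ENC or CAI/eCAI calculations.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: f(CG) / (f(C) * f(G))."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    if n < 2 or c == 0 or g == 0:
        return 0.0
    # Count overlapping dinucleotide windows that read "CG".
    cg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    return (cg / (n - 1)) / ((c / n) * (g / n))


def gc3(seq):
    """Fraction of G or C at the third position of each complete codon."""
    seq = seq.upper().replace("U", "T")
    third = seq[2::3]  # indices 2, 5, 8, ... are third codon positions
    if not third:
        return 0.0
    return sum(base in "GC" for base in third) / len(third)


coding = "AUGGCCAUUGUA"  # toy in-frame sequence, not real viral data
print(round(cpg_obs_exp(coding), 3), round(gc3(coding), 3))
```

A ratio well below 1 indicates that CpG occurs less often than the base composition alone would predict, which is the sense in which a genome is described as CpG-deficient.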
Affiliation(s)
- Shazrul Fazry
- Tasik Chini Research Center, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Bangi, Malaysia
- Chen Fei Low
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
|
18
|
Genetic codes optimized as a traveling salesman problem. PLoS One 2019; 14:e0224552. [PMID: 31658301 PMCID: PMC6816573 DOI: 10.1371/journal.pone.0224552] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/26/2019] [Accepted: 10/16/2019] [Indexed: 11/19/2022]
Abstract
The Standard Genetic Code (SGC) is robust to mutational errors such that frequently occurring mutations minimally alter the physio-chemistry of amino acids. The apparent correlation between the evolutionary distances among codons and the physio-chemical distances among their cognate amino acids suggests an early co-diversification between the codons and amino acids. Here we formulated the co-minimization of evolutionary distances between codons and physio-chemical distances between amino acids as a Traveling Salesman Problem (TSP) and solved it with a Hopfield neural network. In this unsupervised learning algorithm, macromolecules (e.g., tRNAs and aminoacyl-tRNA synthetases) associating codons with amino acids were considered biological analogs of Hopfield neurons associating "tour cities" with "tour positions". The Hopfield network efficiently yielded an abundance of genetic codes that were more error-minimizing than SGC and could thus be used to design artificial genetic codes. We further argue that as a self-optimization algorithm, the Hopfield neural network provides a model of origin of SGC and other adaptive molecular systems through evolutionary learning.
|
19
|
The Quality of Genetic Code Models in Terms of Their Robustness Against Point Mutations. Bull Math Biol 2019; 81:2239-2257. [DOI: 10.1007/s11538-019-00603-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/16/2018] [Accepted: 03/25/2019] [Indexed: 11/29/2022]
|
20
|
Błażej P, Wnetrzak M, Mackiewicz D, Mackiewicz P. The influence of different types of translational inaccuracies on the genetic code structure. BMC Bioinformatics 2019; 20:114. [PMID: 30841864 PMCID: PMC6404327 DOI: 10.1186/s12859-019-2661-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/02/2018] [Accepted: 01/29/2019] [Indexed: 12/23/2022]
Abstract
BACKGROUND: The standard genetic code is a recipe for assigning unambiguously 21 labels, i.e. amino acids and stop translation signal, to 64 codons. However, at early stages of the translational machinery development, the codons did not have to be read unambiguously and the early genetic codes could have contained some ambiguous assignments of codons to amino acids. Therefore, the goal of this work was to obtain the genetic code structures which could have evolved assuming different types of inaccuracy of the translational machinery starting from unambiguous assignments of codons to amino acids.
RESULTS: We developed a theoretical model assuming that the level of uncertainty of codon assignments can gradually decrease during the simulations. Since it is postulated that the standard code has evolved to be robust against point mutations and mistranslations, we developed three simulation scenarios assuming that such errors can influence one, two or three codon positions. The simulated codes were selected using the evolutionary algorithm methodology to decrease coding ambiguity and increase their robustness against mistranslation.
CONCLUSIONS: The results indicate that the typical codon block structure of the genetic code could have evolved to decrease the ambiguity of amino acid to codon assignments and to increase the fidelity of reading the genetic information. However, the robustness to errors was not the decisive factor that influenced the genetic code evolution because it is possible to find theoretical codes that minimize the reading errors better than the standard genetic code.
Affiliation(s)
- Paweł Błażej
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
- Małgorzata Wnetrzak
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
- Dorota Mackiewicz
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
- Paweł Mackiewicz
- Department of Genomics, University of Wrocław, ul. Joliot-Curie 14a, Wrocław, 50-383 Poland
|
21
|
Wnętrzak M, Błażej P, Mackiewicz D, Mackiewicz P. The optimality of the standard genetic code assessed by an eight-objective evolutionary algorithm. BMC Evol Biol 2018; 18:192. [PMID: 30545289 PMCID: PMC6293558 DOI: 10.1186/s12862-018-1304-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 12/15/2017] [Accepted: 11/22/2018] [Indexed: 02/07/2023]
Abstract
BACKGROUND: The standard genetic code (SGC) is a unique set of rules which assign amino acids to codons. Similar amino acids tend to have similar codons, indicating that the code evolved to minimize the costs of amino acid replacements in proteins, caused by mutations or translational errors. However, if such optimization in fact occurred, many different properties of amino acids must have been taken into account during the code evolution. Therefore, this problem can be reformulated as a multi-objective optimization task, in which the selection constraints are represented by measures based on various amino acid properties.
RESULTS: To study the optimality of the SGC we applied a multi-objective evolutionary algorithm and we used the representatives of eight clusters, which grouped over 500 indices describing various physicochemical properties of amino acids. This let us avoid an arbitrary choice of amino acid features as optimization criteria. As a consequence, we were able to conduct a more general study on the properties of the SGC than the ones presented so far in other papers on this topic. We considered two models of the genetic code, one preserving the characteristic codon block structure of the SGC and the other without this restriction. The results revealed that the SGC could be significantly improved in terms of error minimization, and is therefore not fully optimized. Its structure differs significantly from the structure of the codes optimized to minimize the costs of amino acid replacements. On the other hand, using newly defined quality measures that placed the SGC in the global space of theoretical genetic codes, we showed that the SGC is definitely closer to the codes that minimize the costs of amino acid replacements than to those maximizing them.
CONCLUSIONS: The standard genetic code most likely represents only a partially optimized system, which emerged under the influence of many different factors. Our findings can be useful to researchers involved in modifying the genetic code of living organisms and designing artificial ones.
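The notion of "cost of amino acid replacements" in the abstract above can be made concrete with a single-objective Python sketch. It is a simplified stand-in, not the eight-objective evolutionary algorithm of the study. It assumes Kyte-Doolittle hydropathy as the only amino acid property, scores a code by the mean squared property change over all single-nucleotide substitutions between sense codons, and compares the SGC with random codes that keep the codon block structure. All function names are hypothetical.

```python
import random

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> amino acid ("*" marks a stop codon).
SGC = {a + b + c: AMINO[16 * i + 4 * j + k]
       for i, a in enumerate(BASES)
       for j, b in enumerate(BASES)
       for k, c in enumerate(BASES)}

# Kyte-Doolittle hydropathy, used here as one example property (an assumption).
HYDROPATHY = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
              "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
              "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
              "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def neighbour_pairs(code):
    """Yield (aa, aa') for each single-nucleotide change between sense codons."""
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other != "*":
                    yield aa, other


def replacement_cost(code, prop=HYDROPATHY):
    """Mean squared property difference; synonymous changes contribute zero."""
    pairs = list(neighbour_pairs(code))
    return sum((prop[a] - prop[b]) ** 2 for a, b in pairs) / len(pairs)


def shuffled_code(code, rng):
    """Permute amino acid labels among codon blocks; stop codons stay fixed."""
    amino_acids = sorted(set(code.values()) - {"*"})
    permuted = amino_acids[:]
    rng.shuffle(permuted)
    relabel = dict(zip(amino_acids, permuted))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in code.items()}


rng = random.Random(0)
sgc_cost = replacement_cost(SGC)
random_costs = [replacement_cost(shuffled_code(SGC, rng)) for _ in range(1000)]
better = sum(cost < sgc_cost for cost in random_costs)
print(f"SGC cost: {sgc_cost:.3f}; random codes with lower cost: {better}/1000")
```

Random label shuffling only samples the space of block-preserving codes. An evolutionary search of the kind the abstract describes would instead actively look for codes with lower cost, and would do so for several properties at once.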
Affiliation(s)
- Małgorzata Wnętrzak
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
- Paweł Błażej
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
- Dorota Mackiewicz
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
- Paweł Mackiewicz
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
|
22
|
Błażej P, Wnętrzak M, Mackiewicz D, Mackiewicz P. Correction: Optimization of the standard genetic code according to three codon positions using an evolutionary algorithm. PLoS One 2018; 13:e0205450. [PMID: 30286199 PMCID: PMC6171936 DOI: 10.1371/journal.pone.0205450] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]
|