1
Fimmel E, Michel CJ, Strüngmann L. Circular cut codes in genetic information. Biosystems 2024; 243:105263. [PMID: 38971553 DOI: 10.1016/j.biosystems.2024.105263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 06/30/2024] [Accepted: 06/30/2024] [Indexed: 07/08/2024]
Abstract
In this work we present an analysis of the dinucleotide occurrences in the three codon sites 1-2, 2-3 and 1-3, based on a computation of the codon usage of three large sets of bacterial, archaeal and eukaryotic genes using the same method that identified a maximal C3 self-complementary trinucleotide circular code X in genes of bacteria and eukaryotes in 1996 (Arquès and Michel, 1996). Surprisingly, two dinucleotide circular codes are identified in the codon sites 1-2 and 2-3. Furthermore, these two codes are shifted versions of each other. Moreover, the dinucleotide code in the codon site 1-3 is circular, self-complementary and contained in the projection of X onto the 1st and 3rd bases, i.e. the set obtained by cutting the middle base in each codon of X. We prove several results showing that the circularity and the self-complementarity of trinucleotide codes are induced by the circularity and the self-complementarity of their dinucleotide cut codes. Finally, we present several evolutionary approaches to the emergence of trinucleotide codes from dinucleotide codes.
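The projection ("cut") and the two properties named in this abstract can be illustrated with a minimal Python sketch. It assumes the standard list of the 20 trinucleotides of X and the graph criterion from the circular code literature (a code of words of equal length is circular if and only if its prefix-to-suffix graph is acyclic). The function names `cut`, `is_self_complementary` and `is_circular` are hypothetical and are not taken from the paper.

```python
# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word):
    """Return the reversed complement of a word over {A, C, G, T}."""
    return "".join(COMPLEMENT[b] for b in reversed(word))


def is_self_complementary(code):
    """A code is self-complementary if it contains the reverse complement of each word."""
    return all(reverse_complement(w) in code for w in code)


def cut(code, sites):
    """Project every word onto the given 0-based sites, e.g. (0, 2) for codon sites 1-3."""
    return {"".join(w[i] for i in sites) for w in code}


def is_circular(code):
    """Assumed graph criterion: circular iff the prefix -> suffix graph has no cycle."""
    edges = {}
    for w in code:
        for i in range(1, len(w)):
            edges.setdefault(w[:i], set()).add(w[i:])

    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def has_cycle(v):
        state[v] = 1
        for nxt in edges.get(v, ()):
            if state.get(nxt) == 1:
                return True
            if nxt not in state and has_cycle(nxt):
                return True
        state[v] = 2
        return False

    return not any(v not in state and has_cycle(v) for v in list(edges))


if __name__ == "__main__":
    cut_13 = cut(X, (0, 2))  # drop the middle base of every codon of X
    print(sorted(cut_13))
    print("self-complementary:", is_self_complementary(cut_13))
    print("circular:", is_circular(cut_13))
```

The full projection may contain dinucleotides such as CC that cannot belong to any circular code, which is consistent with the abstract's statement that the observed dinucleotide code is contained in the projection rather than equal to it.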
Affiliation(s)
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, C.N.R.S., 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Lutz Strüngmann
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
2
Michel CJ. Circular code in introns. Biosystems 2024; 239:105215. [PMID: 38641199 DOI: 10.1016/j.biosystems.2024.105215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 04/21/2024]
Abstract
A massive statistical analysis based on the autocorrelation function of the circular code X observed in genes is performed on the (eukaryotic) introns. Surprisingly, a circular code periodicity 0 modulo 3 is identified in 5 groups of introns: birds, ascomycetes, basidiomycetes, green algae and land plants. This circular code periodicity, which is a property of retrieving the reading frame in (protein coding) genes, may suggest that these introns have a coding property. As is well known, a periodicity 1 modulo 2 is observed in 6 groups of introns: amphibians, fishes, mammals, other animals, reptiles and apicomplexans. A mixed periodicity modulo 2 and 3 is found in the introns of insects. Astonishingly, a subperiodicity 3 modulo 6 is a common statistical property in these 3 classes of introns. When the particular trinucleotides N1N2N1 of the circular code X are not considered, the circular code periodicity 0 modulo 3, hidden by the periodicity 1 modulo 2, is now retrieved in 5 groups of introns: amphibians, fishes, other animals, reptiles and insects. Thus, 10 groups of introns, taxonomically different, out of 12 have a coding property related to the reading frame retrieval. The trinucleotides N1N2N1 are analysed in the 216 maximal C3 self-complementary trinucleotide circular codes. A hexanucleotide code (words of 6 letters) is proposed to explain the periodicity 3 modulo 6. It could be a trace of more general circular codes at the origin of the circular code X.
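To make the notion of a periodicity modulo 3 concrete, here is a simplified Python sketch of an autocorrelation count over the code X. It is an illustration under stated assumptions, not the statistical procedure of the paper: the function name `x_autocorrelation` is hypothetical, raw co-occurrence counts are used instead of normalized frequencies, and the toy sequence is artificial.

```python
# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}


def x_autocorrelation(seq, code, max_shift):
    """For each shift k in 1..max_shift, count the positions i such that both
    seq[i:i+3] and seq[i+k:i+k+3] are words of the code."""
    n_starts = max(len(seq) - 2, 0)  # number of trinucleotide start positions
    hits = [seq[i:i + 3] in code for i in range(n_starts)]
    return {
        k: sum(1 for i in range(len(hits) - k) if hits[i] and hits[i + k])
        for k in range(1, max_shift + 1)
    }


if __name__ == "__main__":
    # Artificial sequence: the X trinucleotide GAA repeated in a single frame.
    toy = "GAA" * 5
    for shift, count in x_autocorrelation(toy, X, 6).items():
        print(shift, count)
```

On such a toy sequence the counts are nonzero only at shifts that are multiples of 3, which is the kind of signal a periodicity 0 modulo 3 refers to; real intron data would require normalization and a comparison against a background model.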
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
3
Fimmel E, Strüngmann L. The spiderweb of error-detecting codes in the genetic information. Biosystems 2023; 233:105009. [PMID: 37640191 DOI: 10.1016/j.biosystems.2023.105009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 08/21/2023] [Accepted: 08/21/2023] [Indexed: 08/31/2023]
Abstract
Nature possesses inherent mechanisms for error detection and correction during the translation of genetic information, as demonstrated by the discovery of a self-complementary circular C3-code called X0 in various organisms such as bacteria, eukaryotes, plasmids, and viruses (Arquès and Michel, 1996; Michel, 2015, 2017). Since then, extensive research has focused on circular codes, which are believed to be remnants of ancient comma-free codes. These codes can be regarded as an additional genetic code specifically optimized for detecting and preserving the proper reading frame in protein-coding sequences. A study by Fimmel et al. in 2014 identified that a total of 216 maximal self-complementary C3-codes can be grouped into 27 equivalence classes with eight codes in each class. In this work, we study how the 27 equivalence classes are related to each other. While the codes in each equivalence class obtained by Fimmel et al. in 2014 are permutations of each other, i.e. one code can be obtained from the other by applying a permutation of the bases, it has not been clear how the equivalence classes are connected. We show that there is an ordering of the equivalence classes such that one gets from one class to the next one by substituting only one pair of codon/anticodon in the corresponding codes, i.e. the corresponding codes have a maximal intersection of 18 codons. To perform this analysis, we define two graphs, G216 and G27, whose vertices are, respectively, all 216 maximal self-complementary C3-codes and 27 equivalence classes. Several properties of the graphs are obtained. Most surprisingly, it turns out that G27 contains Hamiltonian paths of length 27. This fact ultimately leads to a representation of the set of all 216 maximal self-complementary C3-codes as a kind of spider web. Finally, we define dinucleotide cuts of such codes by projecting each codon to its first two bases and show that the paths of length 27 in G216 can even be chosen so that all the codes contain a special subset of dinucleotides defined by Rumer's roots. These observations raise many new questions about the biological function of such structures.
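The idea of an equivalence class generated by base permutations can be sketched in Python as follows. The sketch assumes that the relevant permutations are those that commute with the complement map A↔T, C↔G, so that self-complementarity is preserved; this restriction, like the function names `commuting_permutations` and `apply_permutation`, is an assumption made here for illustration, and the paper's exact definition may differ.

```python
from itertools import permutations

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}


def is_self_complementary(code):
    """True if the code contains the reverse complement of each of its words."""
    return all(
        "".join(COMPLEMENT[b] for b in reversed(w)) in code for w in code
    )


def commuting_permutations():
    """Base permutations p with p(c(b)) == c(p(b)) for every base b (assumed group)."""
    result = []
    for image in permutations(BASES):
        p = dict(zip(BASES, image))
        if all(p[COMPLEMENT[b]] == COMPLEMENT[p[b]] for b in BASES):
            result.append(p)
    return result


def apply_permutation(p, code):
    """Apply a base permutation letter by letter to every word of the code."""
    return frozenset("".join(p[b] for b in w) for w in code)


if __name__ == "__main__":
    orbit = {apply_permutation(p, X) for p in commuting_permutations()}
    print("permutations:", len(commuting_permutations()))
    print("distinct codes in the class of X:", len(orbit))
    for code in orbit:
        print(len(X & code), "codons shared with X")
```

Printing the intersection sizes gives a rough feel for how far apart codes of the same class are; the ordering of classes by a maximal intersection of 18 codons described in the abstract concerns codes from neighbouring classes and is not reproduced here.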
Affiliation(s)
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Lutz Strüngmann
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
4
Fimmel E, Michel CJ, Strüngmann L. Circular mixed sets. Biosystems 2023; 229:104906. [PMID: 37196893 DOI: 10.1016/j.biosystems.2023.104906] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/22/2023] [Accepted: 04/29/2023] [Indexed: 05/19/2023]
Abstract
In this article, we introduce the new mathematical concept of circular mixed sets of words over an arbitrary finite alphabet. These circular mixed sets may not be codes in the classical sense and hence allow a higher amount of information to be encoded. After describing their basic properties, we generalize a recent graph theoretical approach for circularity and apply it to distinguish codes from sets (i.e. non-codes). Moreover, several methods are given to construct circular mixed sets. Finally, this approach allows us to propose a new evolution model of the present genetic code that could have evolved from a dinucleotide world to a trinucleotide world via circular mixed sets of dinucleotides and trinucleotides.
Affiliation(s)
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, C.N.R.S., 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Lutz Strüngmann
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
5
Abstract
A message such as mRNA, which consists of continuous characters without separators (such as commas or spaces), can easily be decoded incorrectly if it is read in the wrong reading frame. One construct to theoretically avoid these reading frame errors is the class of block codes. However, the first hypothesis of Watson and Crick (1953) that block codes are used as a tool to avoid reading frame errors in coding sequences already failed because the four periodical codons AAA, CCC, GGG and UUU seem to play an important role in protein coding sequences. Even the class of circular codes later discovered by Arquès and Michel (1996) in coding sequences cannot contain a periodic codon. However, by incorporating the interpretation of the message into the robustness of the reading frame, the extension of circular codes to include periodic codons is theoretically possible. In this work, we introduce the new class of I-circular codes. Unlike circular codes, these codes allow frame shifts, but only if the decoded interpretation of the message is identical to the intended interpretation. In the following, the formal definition of I-circular codes is introduced and the maximum and the maximal size of I-circular codes are given based on the standard genetic code table. These numbers are calculated using a new graph-theoretic approach derived from the classical one for the class of circular codes. Furthermore, we show that all 216 maximum self-complementary C3-codes (see Fimmel et al., 2015) can be extended to larger I-circular codes. We present the increased code coverage of the 216 newly constructed I-circular codes based on the human coding sequences in chromosome 1. In the last section of this paper, we use the polarity of amino acids as an interpretation table to construct I-circular codes. In an optimization process, two maximum I-circular codes of length 30 are found.
6
Fimmel E, Gumbel M, Starman M, Strüngmann L. Robustness against point mutations of genetic code extensions under consideration of wobble-like effects. Biosystems 2021; 208:104485. [PMID: 34280517 DOI: 10.1016/j.biosystems.2021.104485] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/16/2021] [Revised: 07/07/2021] [Accepted: 07/09/2021] [Indexed: 11/25/2022]
Abstract
Many theories of the evolution of the genetic code assume that the genetic code has always evolved in the direction of increasing the supply of amino acids to be encoded (Barbieri, 2019; Di Giulio, 2005; Wong, 1975). In order to reduce the risk of the formation of a non-functional protein due to point mutations, nature is said to have built in control mechanisms. Using graph theory the authors have investigated in Blazej et al. (2019) if this robustness is optimal in the sense that a different codon-amino acid assignment would not generate a code that is even more robust. At present, efforts to expand the genetic code are very relevant in biotechnological applications, for example, for the synthesis of new drugs (Anderson et al., 2004; Chin, 2017; Dien et al., 2018; Kimoto et al., 2009; Neumann et al., 2010). In this paper we generalize the approach proposed in Blazej et al. (2019) and will explore hypothetical extensions of the standard genetic code with respect to their optimal robustness in two ways: (1) We keep the usual genetic alphabet but move from codons to longer words, such as tetranucleotides. This increases the supply of coding words and thus makes it possible to encode non-canonical amino acids. (2) We expand the genetic alphabet by introducing non-canonical base pairs. In addition, the approach from Blazej et al. (2019) and Blazej et al. (2018) is extended by incorporating the weights of single point-mutations into the model. The weights can be interpreted as probabilities (appropriately normalized) or degrees of severity of a single point mutation. In particular, this new approach allows us to take a closer look at the wobble effects in the translation of codons into amino acids. According to the results from Blazej et al. (2019) and Blazej et al. (2018), the standard genetic code is not optimal in terms of its robustness to point mutations if the weights of single point mutations are not taken into account. 
After weights that mimic the wobble effect are incorporated into the model, the results of the present work show that the standard genetic code is much more robust, almost optimal in that respect. We hope that this theoretical analysis might help to assess extended genetic codes and their abilities to encode new amino acids.
Affiliation(s)
- E Fimmel
- Competence Center in Medicine, Biology, and Biotechnology, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- M Gumbel
- Competence Center in Medicine, Biology, and Biotechnology, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- M Starman
- Competence Center in Medicine, Biology, and Biotechnology, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- L Strüngmann
- Competence Center in Medicine, Biology, and Biotechnology, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
7
Abstract
The origin of the modern genetic code and the mechanisms that have contributed to its present form raise many questions. The main goal of this work is to test two hypotheses concerning the development of the genetic code for their compatibility and complementarity and to see whether they could benefit from each other. On the one hand, Gonzalez, Giannerini and Rosa developed a theory based on four-base codons, which they called tesserae. This theory can explain the degeneracy of the modern vertebrate mitochondrial code. On the other hand, in the 1990s, so-called circular codes were discovered in nature, which seem to ensure the maintenance of a correct reading frame during the translation process. It turns out that the two concepts not only do not contradict each other but, on the contrary, complement and enrich each other.