1. Fimmel E, Saleh H, Strüngmann L. Forbidden codon combinations in error-detecting circular codes. Theory Biosci 2025; 144:67-80. [PMID: 39676149 PMCID: PMC11802632 DOI: 10.1007/s12064-024-00431-6] [Received: 02/20/2024] [Accepted: 11/11/2024]
Abstract
Circular codes, which are considered putative remnants of primaeval comma-free codes, have recently become a focal point of research. These codes constitute a secondary type of genetic code, primarily tasked with detecting and preserving the normal reading frame within protein-coding sequences. The identification of a universal code present across various species has sparked numerous theoretical and experimental inquiries. Among these, the exploration of the class of 216 self-complementary C3-codes of maximum size 20 has garnered significant attention. However, the origin of the number 216 lacks a satisfactory explanation, and the mathematical construction of these codes remains elusive. This paper introduces new software designed to facilitate the construction of self-complementary C3-codes (of maximum size). The approach involves a systematic exclusion of codons, guided by two fundamental mathematical theorems. These theorems demonstrate how codons can be automatically excluded from consideration when imposing requirements such as self-complementarity, circularity or maximality. By leveraging these theorems, our software provides a novel and efficient means to construct these intriguing circular codes, shedding light on their mathematical foundations and contributing to a deeper understanding of their biological significance.
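The bound of 20 codons mentioned in the abstract can be illustrated with a standard counting argument from the circular-code literature, stated here as background rather than taken from this paper: the periodic codons AAA, CCC, GGG and TTT cannot occur in a circular code, and a circular code contains at most one codon from each class of cyclic permutations (for example ACG, CGA, GAC). The Python sketch below only counts those classes; it is not the authors' software and does not implement their two exclusion theorems. The names `rotations` and `rotation_classes` are illustrative.

```python
from itertools import product

BASES = "ACGT"


def rotations(codon: str) -> frozenset[str]:
    """Cyclic permutations of a trinucleotide, e.g. ACG -> {ACG, CGA, GAC}."""
    return frozenset(codon[i:] + codon[:i] for i in range(3))


def rotation_classes() -> list[frozenset[str]]:
    """Group the 60 non-periodic codons into classes closed under rotation."""
    classes = set()
    for letters in product(BASES, repeat=3):
        codon = "".join(letters)
        if len(set(codon)) == 1:
            continue  # periodic codons AAA, CCC, GGG, TTT are excluded outright
        classes.add(rotations(codon))
    return sorted(classes, key=sorted)


# A circular code takes at most one codon per class, so the number of
# classes is an upper bound on the size of any circular code.
MAX_CIRCULAR_CODE_SIZE = len(rotation_classes())
print(MAX_CIRCULAR_CODE_SIZE)  # 20
```

Each of the 20 classes holds three codons, which accounts for all 60 non-periodic codons.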
Affiliation(s)
- Elena Fimmel
  - Institute for Mathematical Biology, Faculty of Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Hadi Saleh
  - Institute for Mathematical Biology, Faculty of Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Lutz Strüngmann
  - Institute for Mathematical Biology, Faculty of Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
2. Fimmel E, Strüngmann L. The spiderweb of error-detecting codes in the genetic information. Biosystems 2023; 233:105009. [PMID: 37640191 DOI: 10.1016/j.biosystems.2023.105009] [Received: 06/30/2023] [Revised: 08/21/2023] [Accepted: 08/21/2023]
Abstract
Nature possesses inherent mechanisms for error detection and correction during the translation of genetic information, as demonstrated by the discovery of a self-complementary circular C3-code called X0 in various organisms such as bacteria, eukaryotes, plasmids, and viruses (Arquès and Michel, 1996; Michel, 2015, 2017). Since then, extensive research has focused on circular codes, which are believed to be remnants of ancient comma-free codes. These codes can be regarded as an additional genetic code specifically optimized for detecting and preserving the proper reading frame in protein-coding sequences. A study by Fimmel et al. (2014) showed that the 216 maximal self-complementary C3-codes can be grouped into 27 equivalence classes of eight codes each. In this work, we study how the 27 equivalence classes are related to each other. While the codes within each equivalence class obtained by Fimmel et al. (2014) are permutations of each other, i.e. one code can be obtained from another by applying a permutation of the bases, it has not been clear how the equivalence classes are connected. We show that there is an ordering of the equivalence classes such that one gets from one class to the next by substituting only one codon/anticodon pair in the corresponding codes, i.e. the corresponding codes have a maximal intersection of 18 codons. To perform this analysis, we define two graphs, G216 and G27, whose vertices are, respectively, all 216 maximal self-complementary C3-codes and the 27 equivalence classes. Several properties of these graphs are obtained. Most surprisingly, G27 contains Hamiltonian paths of length 27. This fact ultimately leads to a representation of the set of all 216 maximal self-complementary C3-codes as a kind of spider web.
Finally, we define dinucleotide cuts of such codes by projecting each codon onto its first two bases and show that the paths of length 27 in G216 can even be chosen so that all the codes contain a special subset of dinucleotides defined by Rumer's roots. These observations raise many new questions about the biological function of such structures.
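The two operations this abstract relies on, projecting a code onto its dinucleotide cut and exchanging one codon/anticodon pair, can be sketched in Python as follows. The sketch uses X0, the 20-codon set listed as X in entry 11; `anticodon`, `dinucleotide_cut` and `swap_pair` are hypothetical helpers, and the replacement pair ACG/CGT is arbitrary, so the resulting set is not claimed to be one of the 216 maximal self-complementary C3-codes. It only shows why exchanging one pair in a 20-codon code leaves an intersection of 20 - 2 = 18 codons.

```python
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def anticodon(codon: str) -> str:
    """Reverse complement of a codon, e.g. AAC -> GTT."""
    return "".join(COMPLEMENT[b] for b in reversed(codon))


def dinucleotide_cut(code: set[str]) -> set[str]:
    """Project every codon onto its first two bases."""
    return {codon[:2] for codon in code}


def swap_pair(code: set[str], old: str, new: str) -> set[str]:
    """Hypothetical helper: replace {old, anticodon(old)} by {new, anticodon(new)}."""
    return (code - {old, anticodon(old)}) | {new, anticodon(new)}


neighbour = swap_pair(X0, "AAC", "ACG")  # arbitrary replacement pair, illustration only
print(len(neighbour & X0))               # 18
print(sorted(dinucleotide_cut(X0)))
```

Because codon and anticodon are removed and added together, a self-complementary input stays self-complementary after the swap.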
Affiliation(s)
- Elena Fimmel
  - Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Lutz Strüngmann
  - Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
3. Fimmel E, Michel CJ, Strüngmann L. Circular mixed sets. Biosystems 2023; 229:104906. [PMID: 37196893 DOI: 10.1016/j.biosystems.2023.104906] [Received: 03/22/2023] [Accepted: 04/29/2023]
Abstract
In this article, we introduce the new mathematical concept of circular mixed sets of words over an arbitrary finite alphabet. These circular mixed sets need not be codes in the classical sense and hence allow a larger amount of information to be encoded. After describing their basic properties, we generalize a recent graph-theoretical approach to circularity and apply it to distinguish codes from sets (i.e. non-codes). Moreover, several methods are given to construct circular mixed sets. Finally, this approach allows us to propose a new model for the evolution of the present genetic code, in which it could have evolved from a dinucleotide world to a trinucleotide world via circular mixed sets of dinucleotides and trinucleotides.
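For ordinary trinucleotide codes, the graph criterion that this paper generalizes is, as assumed here from earlier work by the same authors, the following: each codon b1b2b3 contributes the edges b1 -> b2b3 and b1b2 -> b3, and the code is circular exactly when the resulting directed graph has no cycle. The Python sketch below covers only that trinucleotide case, not the circular mixed sets introduced in this paper; `code_graph`, `has_cycle` and `is_circular` are illustrative names.

```python
from collections import defaultdict


def code_graph(code: set[str]) -> dict[str, set[str]]:
    """Directed graph with edges b1 -> b2b3 and b1b2 -> b3 for each codon b1b2b3."""
    graph: dict[str, set[str]] = defaultdict(set)
    for codon in code:
        graph[codon[0]].add(codon[1:])
        graph[codon[:2]].add(codon[2])
    return graph


def has_cycle(graph: dict[str, set[str]]) -> bool:
    """Depth-first search; reaching a vertex that is still on the stack means a cycle."""
    ON_STACK, DONE = 1, 2
    state: dict[str, int] = {}

    def visit(vertex: str) -> bool:
        state[vertex] = ON_STACK
        for nxt in graph.get(vertex, ()):
            if state.get(nxt) == ON_STACK:
                return True
            if nxt not in state and visit(nxt):
                return True
        state[vertex] = DONE
        return False

    return any(vertex not in state and visit(vertex) for vertex in list(graph))


def is_circular(code: set[str]) -> bool:
    return not has_cycle(code_graph(code))


print(is_circular({"AAA"}))         # False: A -> AA -> A
print(is_circular({"ACG", "CGA"}))  # False: A -> CG -> A
```

The two failing examples match the counting argument for entry 1: periodic codons and pairs of cyclic permutations both create cycles.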
Affiliation(s)
- Elena Fimmel
  - Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Christian J Michel
  - Theoretical Bioinformatics, ICube, University of Strasbourg, C.N.R.S., 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Lutz Strüngmann
  - Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
4. Borah C, Ali T. Genetic code noise immunity features: Degeneracy and frameshift correction. Gene Reports 2022. [DOI: 10.1016/j.genrep.2022.101707]
5. Pentamers with Non-redundant Frames: Bias for Natural Circular Code Codons. J Mol Evol 2020; 88:194-201. [DOI: 10.1007/s00239-019-09925-0] [Received: 07/09/2019] [Accepted: 12/17/2019]
6. Demongeot J, Seligmann H. Spontaneous evolution of circular codes in theoretical minimal RNA rings. Gene 2019; 705:95-102. [DOI: 10.1016/j.gene.2019.03.069] [Received: 11/13/2018] [Revised: 03/08/2019] [Accepted: 03/29/2019]
7. Warthi G, Seligmann H. Transcripts with systematic nucleotide deletion of 1-12 nucleotide in human mitochondrion suggest potential non-canonical transcription. PLoS One 2019; 14:e0217356. [PMID: 31120958 PMCID: PMC6532905 DOI: 10.1371/journal.pone.0217356] [Received: 01/14/2019] [Accepted: 05/09/2019]
Abstract
Raw transcriptomic data contain numerous RNA reads whose homology with template DNA does not match canonical transcription. Transcriptome analyses usually ignore such noncanonical RNA reads. Here, we search for noncanonical mitochondrial RNAs that systematically delete 1 to 12 nucleotides after each transcribed nucleotide triplet, producing deletion-RNAs (delRNAs). We detected delRNAs corresponding to systematic deletions of 1 to 12 nucleotides after each transcribed trinucleotide in the human whole-cell and purified mitochondrial transcriptomes, and in GenBank's human EST database. DelRNAs detected in both transcriptomes mapped along with 55.63% of the EST delRNAs. A bias exists for delRNAs covering identical mitogenomic regions in both transcriptomic and EST datasets. Among 227 delRNAs detected in these 3 datasets, 81.1% and 8.4% mapped to mitochondrial coding regions and to hypervariable region 2 of the D-loop, respectively. Del-transcription analyses of GenBank's EST database confirm observations from whole-cell and purified mitochondrial transcriptomes, eliminating the possibility that detected delRNAs are false-positive matches, cytosolic DNA/RNA nuclear contamination or sequencing artefacts. These detected delRNAs are enriched in frameshift-inducing homopolymers and are poor in frameshift-preventing circular code codons (a set of 20 codons which regulate reading frame detection and are over- and underrepresented in coding and other frames of genes, respectively), suggesting a motif-based regulation of non-canonical transcription. These findings show that rare non-canonical transcripts exist. Such non-canonical del-transcription does increase mitochondrial coding potential and non-coding regulation of intracellular mechanisms, and could explain the dark DNA conundrum.
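The deletion model in this abstract can be illustrated with a simple string transformation: keep a transcribed triplet, skip k nucleotides, and repeat. The Python sketch below is an illustration under that reading of the abstract, not the authors' detection pipeline; real analyses also have to handle sequencing errors, both strands and alignment scoring, all of which this exact-substring search ignores. `del_transform` and `matching_deletion_lengths` are hypothetical names.

```python
def del_transform(template: str, k: int) -> str:
    """Keep a triplet, skip k nucleotides, repeat; an incomplete final triplet is dropped."""
    if not 1 <= k <= 12:
        raise ValueError("deletion length k must be between 1 and 12")
    triplets = (template[i:i + 3] for i in range(0, len(template), 3 + k))
    return "".join(t for t in triplets if len(t) == 3)


def matching_deletion_lengths(read: str, template: str) -> list[int]:
    """Deletion lengths k (1-12) for which `read` occurs in some phase of the transformed template."""
    hits = []
    for k in range(1, 13):
        # Starting at each offset 0 .. 2+k covers every possible triplet phase.
        phases = (del_transform(template[p:], k) for p in range(3 + k))
        if any(read in transformed for transformed in phases):
            hits.append(k)
    return hits


print(del_transform("ATGCCCAAAGGG", 1))                     # ATGCCAAGG
print(matching_deletion_lengths("ATGAAA", "ATGCCCAAAGGG"))  # [3]
```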
Affiliation(s)
- Ganesh Warthi
  - Aix-Marseille Université, IRD, VITROME, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, France.
- Hervé Seligmann
  - Aix-Marseille Université, IRD, MEPHI, Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, Marseille, France.
  - The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel.
8. Mathematical fundamentals for the noise immunity of the genetic code. Biosystems 2018; 164:186-198. [DOI: 10.1016/j.biosystems.2017.09.007] [Received: 06/13/2017] [Revised: 09/07/2017] [Accepted: 09/08/2017]
10. Seligmann H, Warthi G. Genetic Code Optimization for Cotranslational Protein Folding: Codon Directional Asymmetry Correlates with Antiparallel Betasheets, tRNA Synthetase Classes. Comput Struct Biotechnol J 2017; 15:412-424. [PMID: 28924459 PMCID: PMC5591391 DOI: 10.1016/j.csbj.2017.08.001] [Received: 05/25/2017] [Revised: 07/20/2017] [Accepted: 08/05/2017]
Abstract
A new codon property, codon directional asymmetry in nucleotide content (CDA), reveals a biologically meaningful genetic code dimension: palindromic codons (first and last nucleotides identical, codon structure XZX) are symmetric (CDA = 0), while codons with structures ZXX/XXZ are 5'/3' asymmetric (CDA = -1/1; CDA = -0.5/0.5 if Z and X are both purines or both pyrimidines; assigning negative/positive (-/+) signs is an arbitrary convention). Negative/positive CDAs associate with (a) Fujimoto's tetrahedral codon stereo-table; (b) tRNA synthetase class I/II (which aminoacylate the 2'/3' hydroxyl group of the tRNA's last ribose, respectively); and (c) high/low antiparallel (not parallel) betasheet conformation parameters. Preliminary results suggest CDA-whole organism associations (body temperature, developmental stability, lifespan). Presumably, CDA impacts spatial kinetics of codon-anticodon interactions, affecting cotranslational protein folding. Some synonymous codons have opposite CDA signs (alanine, leucine, serine, and valine), putatively explaining how synonymous mutations sometimes affect protein function. Correlations between CDA and tRNA synthetase classes are weaker than between CDA and antiparallel betasheet conformation parameters. This effect is stronger for mitochondrial genetic codes, and potentially drives mitochondrial codon-amino acid reassignments. CDA reveals information ruling nucleotide-protein relations embedded in reversed (not reverse-complement) sequences (5'-ZXX-3'/5'-XXZ-3').
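The CDA rule stated in this abstract can be written as a small function. The Python sketch below follows only the cases the abstract defines (XZX, ZXX, XXZ, with half weight when Z and X are both purines or both pyrimidines) and returns `None` for codons with three distinct bases, whose CDA value the abstract does not specify; it is an illustration, not the authors' code, and `cda` is a hypothetical name.

```python
from typing import Optional

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def same_chemical_class(a: str, b: str) -> bool:
    """True if both bases are purines or both are pyrimidines."""
    return {a, b} <= PURINES or {a, b} <= PYRIMIDINES


def cda(codon: str) -> Optional[float]:
    first, middle, last = codon.upper()
    if first == last:
        return 0.0  # palindromic XZX, including homopolymers such as AAA
    size = 0.5 if same_chemical_class(first, last) else 1.0
    if middle == last:
        return -size  # ZXX: 5'-asymmetric, negative by the abstract's convention
    if first == middle:
        return size   # XXZ: 3'-asymmetric, positive
    return None       # three distinct bases: not defined in the abstract


# Two leucine codons with opposite signs
print(cda("CTT"), cda("TTG"))  # -0.5 1.0
```

In this sketch, leucine's CTT and TTG fall on opposite sides of zero, consistent with the abstract's remark about synonymous codons with opposite signs.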
Affiliation(s)
- Hervé Seligmann
  - Aix-Marseille Univ, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes, UM 63, CNRS UMR7278, IRD 198, INSERM U1095, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, Postal code 13385, France.
  - Dept. Ecol Evol Behav, Alexander Silberman Inst Life Sci, The Hebrew University of Jerusalem, IL-91904 Jerusalem, Israel.
- Ganesh Warthi
  - Aix-Marseille Univ, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes, UM 63, CNRS UMR7278, IRD 198, INSERM U1095, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, Postal code 13385, France.
11. El Houmami N, Seligmann H. Evolution of Nucleotide Punctuation Marks: From Structural to Linear Signals. Front Genet 2017; 8:36. [PMID: 28396681 PMCID: PMC5366352 DOI: 10.3389/fgene.2017.00036] [Received: 10/14/2016] [Accepted: 03/13/2017]
Abstract
We present an evolutionary hypothesis assuming that signals marking nucleotide synthesis (DNA replication and RNA transcription) evolved from multi- to unidimensional structures, and were carried over from transcription to translation. This evolutionary scenario presumes that signals combining secondary and primary nucleotide structures are evolutionary transitions. Mitochondrial replication initiation fits this scenario. Some observations reported in the literature corroborate that several signals for nucleotide synthesis function in translation, and vice versa. (a) Polymerase-induced frameshift mutations occur preferentially at translational termination signals (nucleotide deletion is interpreted as termination of nucleotide polymerization, paralleling the role of stop codons in translation). (b) Stem-loop hairpin presence/absence modulates codon-amino acid assignments, showing that translational signals sometimes combine primary and secondary nucleotide structures (here codon and stem-loop). (c) Homopolymer nucleotide triplets (AAA, CCC, GGG, TTT) cause transcriptional and ribosomal frameshifts. Here we find that, in recently described human mitochondrial RNAs that systematically lack mono- or dinucleotides after each trinucleotide (delRNAs), delRNA triplets include twice as many homopolymers as mitogenome regions not covered by delRNAs. Further analyses of delRNAs show that the natural circular code X regulates frameshifts in transcription and translation; X is a little-known group of 20 codons {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} that act as translational signals enabling ribosomal frame retrieval and are universally overrepresented in the coding frame versus the other frames of gene sequences. This dual transcription and translation role confirms for X the hypothesis that translational signals were carried over from transcriptional signals.
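Some properties of X that this abstract relies on can be checked directly: it consists of 20 codons and contains none of the frameshift-prone homopolymers AAA, CCC, GGG, TTT. The Python sketch below also verifies that X is closed under reverse complementation (self-complementary, as stated for X0 in entry 2) and counts X codons in each of the three reading frames of a sequence, the kind of tally behind the frame-overrepresentation statement. The function names and the toy sequence are illustrative assumptions.

```python
X = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(codon: str) -> str:
    """Complement each base, then reverse, e.g. GAA -> TTC."""
    return codon.translate(COMPLEMENT)[::-1]


def x_codons_per_frame(sequence: str) -> list[int]:
    """Count X codons read in frames 0, 1 and 2 of a DNA sequence (complete codons only)."""
    counts = []
    for frame in range(3):
        codons = (sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3))
        counts.append(sum(codon in X for codon in codons))
    return counts


assert len(X) == 20
assert {reverse_complement(c) for c in X} == X   # X is self-complementary
assert not X & {"AAA", "CCC", "GGG", "TTT"}      # no homopolymer triplets
print(x_codons_per_frame("GAAGTTCAG"))           # [3, 1, 0]
```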
Affiliation(s)
- Nawal El Houmami
  - URMITE, Aix Marseille Université UM63, CNRS 7278, IRD 198, INSERM 1095, IHU - Méditerranée Infection Marseille, France.
- Hervé Seligmann
  - URMITE, Aix Marseille Université UM63, CNRS 7278, IRD 198, INSERM 1095, IHU - Méditerranée Infection Marseille, France.
12. Fimmel E, Strüngmann L. Maximal dinucleotide comma-free codes. J Theor Biol 2015; 389:206-13. [PMID: 26562635 DOI: 10.1016/j.jtbi.2015.10.022] [Received: 05/22/2015] [Revised: 10/16/2015] [Accepted: 10/19/2015]
Abstract
The problem of retrieval and maintenance of the correct reading frame plays a significant role in RNA transcription. Circular codes, and especially comma-free codes, can help to understand the underlying mechanisms of error detection in this process. In recent years much attention has been paid to the investigation of trinucleotide circular codes (see, for instance, Fimmel et al., 2014; Fimmel and Strüngmann, 2015a; Michel and Pirillo, 2012; Michel et al., 2012, 2008), while dinucleotide codes have been studied only marginally, even though dinucleotides are associated with important biological functions. Recently, all maximal dinucleotide circular codes were classified (Fimmel et al., 2015; Michel and Pirillo, 2013). The present paper studies maximal dinucleotide comma-free codes and their close connection to maximal dinucleotide circular codes. We give a construction principle for such codes and provide a graphical representation that allows them to be visualized geometrically. Moreover, we compare the results for dinucleotide codes with the corresponding situation for trinucleotide maximal self-complementary C3-codes. Finally, the results obtained are discussed with respect to Crick's hypothesis about frame-shift-detecting codes without commas.
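Specialised to dinucleotides, the usual comma-free condition says that for any two code words ab and cd (possibly the same word), the dinucleotide bc read across the junction ab|cd must not itself be a code word. The Python sketch below checks that condition; the definition is the standard one, assumed here rather than quoted from the paper, and `bipartition_code` is a hypothetical illustrative construction, not the paper's construction principle.

```python
from itertools import product


def is_comma_free(code: set[str]) -> bool:
    """True if no dinucleotide read across the junction of two code words is in the code."""
    return all(x[1] + y[0] not in code for x, y in product(code, repeat=2))


def bipartition_code(left: str, right: str) -> set[str]:
    """Hypothetical construction: first base from `left`, second from `right` (disjoint sets)."""
    if set(left) & set(right):
        raise ValueError("the two base sets must be disjoint")
    return {a + b for a in left for b in right}


code = bipartition_code("AT", "CG")  # {AC, AG, TC, TG}
print(is_comma_free(code))           # True
print(is_comma_free({"AC", "CA"}))   # False: AC|AC contains CA
```

The bipartition works because every junction pairs a second base from `right` with a first base from `left`, and no code word starts with a base from `right`.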
Affiliation(s)
- Elena Fimmel
  - Institute of Mathematical Biology, Faculty of Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Lutz Strüngmann
  - Institute of Mathematical Biology, Faculty of Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.