1. Dragovich B, Fimmel E, Khrennikov A, Mišić NŽ. Modeling the origin, evolution, and functioning of the genetic code. Biosystems 2025; 247:105373. [PMID: 39642979 DOI: 10.1016/j.biosystems.2024.105373] [Received: 11/14/2024] [Accepted: 11/14/2024]
Affiliation(s)
- Branko Dragovich
- Institute of Physics, University of Belgrade, Pregrevica 118, Belgrade, 11080, Serbia; Mathematical Institute of the Serbian Academy of Sciences and Arts, Kneza Mihaila 36, Belgrade, 11000, Serbia
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, Paul-Wittsack-Str. 10, Mannheim, 68163, Germany
- Andrei Khrennikov
- International Center for Mathematical Modelling in Physics and Cognitive Sciences, Linnaeus University, Universitetsplatsen 1, Växjö, 35195, Sweden
- Nataša Ž Mišić
- Research and Development Institute Lola Ltd, Kneza Viseslava 70a, Belgrade, 11030, Serbia
2. Michel CJ. Circular code identified by the codon usage. Biosystems 2024; 244:105308. [PMID: 39159879 DOI: 10.1016/j.biosystems.2024.105308] [Received: 06/16/2024] [Revised: 08/04/2024] [Accepted: 08/13/2024]
Abstract
Since 1996, circular codes in genes have been identified thanks to the development of 6 statistical approaches: trinucleotide frequencies per frame (Arquès and Michel, 1996), correlation functions per frame (Arquès and Michel, 1997), frame permuted trinucleotide frequencies (Frey and Michel, 2003, 2006), advanced statistical functions at the gene population level (Michel, 2015) and at the gene level (Michel, 2017). All these 3-frame statistical methods analyse the trinucleotide information in the 3 frames of genes: the reading frame and the 2 shifted frames. Notably, codon usage does not allow for the identification of circular codes (Michel, 2020). This has been a long-standing problem since 1996, hindering biologists' access to circular code theory. By considering circular code conditions resulting from code theory, particularly the concept of permutation class, and building upon previous statistical work, a new statistical approach based solely on the codon usage, i.e. a 1-frame statistical method, surprisingly reveals the maximal C³ self-complementary trinucleotide circular code X in bacterial genes and in average (bacterial, archaeal, eukaryotic) genes, and nearly reveals it in archaeal genes. Additionally, a new parameter definition indicates that bacterial and archaeal genes exhibit codon usage dispersion of the same order of magnitude, but significantly higher than that observed in eukaryotic genes. This statistical finding may explain the greater variability of codes in eukaryotic genes compared to bacterial and archaeal genes, an issue that has been open for many years. Finally, biologists can now search for new (variant) circular codes at both the genome level (across all genes in a given genome) and the gene level using only codon usage, without the need for analysing the shifted frames.
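To make the 1-frame versus 3-frame distinction concrete, here is a minimal Python sketch, assuming a gene is an uppercase DNA string whose reading frame starts at position 0. The helper names `frame_counts` and `codon_usage` are hypothetical, and the sketch only tabulates counts; it does not reproduce the statistics of the cited papers.

```python
from collections import Counter


def frame_counts(gene: str) -> list:
    """Count trinucleotides in the reading frame (frame 0) and the two
    shifted frames (frames 1 and 2); the trinucleotide starting at
    position i belongs to frame i % 3."""
    gene = gene.upper()
    counts = [Counter(), Counter(), Counter()]
    for i in range(len(gene) - 2):
        counts[i % 3][gene[i:i + 3]] += 1
    return counts


def codon_usage(gene: str) -> Counter:
    """1-frame information: only the codons read in the reading frame."""
    gene = gene.upper()
    return Counter(gene[i:i + 3] for i in range(0, len(gene) - 2, 3))
```

For example, `frame_counts("AACGTT")[1]` holds only `ACG`, the single trinucleotide read in the first shifted frame, while `codon_usage("AACGTT")` holds `AAC` and `GTT`, the same counts as frame 0.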
Affiliation(s)
- Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, C.N.R.S., 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
3. Thompson JD, Ripp R, Mayer C, Poch O, Michel CJ. Potential role of the X circular code in the regulation of gene expression. Biosystems 2021; 203:104368. [PMID: 33567309 DOI: 10.1016/j.biosystems.2021.104368] [Received: 12/09/2020] [Revised: 01/18/2021] [Accepted: 01/20/2021]
Abstract
The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. a series of codons belonging to X and called X motifs, allow identification and maintenance of the reading frame in genes. X motifs are significantly enriched in protein-coding genes, but have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements of protein-coding genes. First, we identify the codons of the X circular code which are frequent or rare in each domain of life (archaea, bacteria, eukaryota) and show that, for the amino acids with the highest codon bias, the preferred codon is often an X codon. We also observe a correlation between the 20 X codons and the optimal codons/dicodons that have been shown to influence translation efficiency. Then, we examine recently published experimental results concerning gene expression levels in diverse organisms. The approach used is the analysis of X motifs according to their density ds(X), i.e. the number of X motifs per kilobase in a gene sequence s. Surprisingly, this simple parameter identifies several unexpected relations between the X circular code and gene expression. For example, the X motifs are significantly enriched in the minimal gene set belonging to the three domains of life, and in codon-optimized genes. Furthermore, the density of X motifs generally correlates with experimental measures of translation efficiency and mRNA stability. Taken together, these results lead us to propose that the X motifs may represent a genetic signal contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.
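The density ds(X) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes an X motif is a maximal run of at least `min_codons` consecutive X codons in the reading frame (the default of 4 codons is an assumption), and it uses the 20 trinucleotides of X as listed in the circular-code literature. `x_motifs` and `x_motif_density` are hypothetical names.

```python
# The 20 trinucleotides of the X circular code (assumed standard list).
X_CODE = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def x_motifs(seq, min_codons=4):
    """(start, end) nucleotide spans of maximal runs of at least
    `min_codons` consecutive X codons in the reading frame."""
    seq = seq.upper()
    n_codons = len(seq) // 3
    motifs, run_start = [], None
    for k in range(n_codons + 1):
        # The extra step k == n_codons closes a run that reaches the end.
        in_x = k < n_codons and seq[3 * k:3 * k + 3] in X_CODE
        if in_x and run_start is None:
            run_start = k
        elif not in_x and run_start is not None:
            if k - run_start >= min_codons:
                motifs.append((3 * run_start, 3 * k))
            run_start = None
    return motifs


def x_motif_density(seq, min_codons=4):
    """ds(X): number of X motifs per kilobase of the sequence s."""
    if not seq:
        return 0.0
    return len(x_motifs(seq, min_codons)) / (len(seq) / 1000)
```

With `min_codons=1` every isolated X codon also counts as a motif, so this parameter controls how strict the assumed motif definition is.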
Affiliation(s)
- Julie D Thompson
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Raymond Ripp
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Claudine Mayer
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France; Unité de Microbiologie Structurale, Institut Pasteur, CNRS, 75724, Paris Cedex 15, France; Université Paris Diderot, Sorbonne Paris Cité, 75724, Paris Cedex 15, France.
- Olivier Poch
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Christian J Michel
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
4. Dila G, Michel CJ, Thompson JD. Optimality of circular codes versus the genetic code after frameshift errors. Biosystems 2020; 195:104134. [DOI: 10.1016/j.biosystems.2020.104134] [Received: 03/07/2020] [Revised: 03/23/2020] [Accepted: 03/25/2020]
5. Dila G, Ripp R, Mayer C, Poch O, Michel CJ, Thompson JD. Circular code motifs in the ribosome: a missing link in the evolution of translation? RNA 2019; 25:1714-1730. [PMID: 31506380 PMCID: PMC6859856 DOI: 10.1261/rna.072074.119] [Received: 05/23/2019] [Accepted: 09/06/2019]
Abstract
The origin of the genetic code remains enigmatic five decades after it was elucidated, although there is growing evidence that the code coevolved progressively with the ribosome. A number of primordial codes were proposed as ancestors of the modern genetic code, including comma-free codes such as the RRY, RNY, or GNC codes (R = G or A, Y = C or T, N = any nucleotide), and the X circular code, an error-correcting code that also allows identification and maintenance of the reading frame. It was demonstrated previously that motifs of the X circular code are significantly enriched in the protein-coding genes of most organisms, from bacteria to eukaryotes. Here, we show that imprints of this code also exist in the ribosomal RNA (rRNA). In a large-scale study involving 133 organisms representative of the three domains of life, we identified 32 universal X motifs that are conserved in the rRNA of >90% of the organisms. Intriguingly, most of the universal X motifs are located in rRNA regions involved in important ribosome functions, notably in the peptidyl transferase center and the decoding center that form the original "proto-ribosome." Building on the existing accretion models for ribosome evolution, we propose that error-correcting circular codes represented an important step in the emergence of the modern genetic code. Thus, circular codes would have allowed the simultaneous coding of amino acids and synchronization of the reading frame in primitive translation systems, prior to the emergence of more sophisticated start codon recognition and translation initiation mechanisms.
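Comma-freeness, the property that separates the RRY, RNY and GNC codes from the X circular code, can be checked directly from its definition: a trinucleotide code is comma-free when, for any two words u and v of the code (u = v allowed), neither trinucleotide starting at the second or third position of uv belongs to the code. A minimal Python sketch; `expand` and `is_comma_free` are hypothetical helpers, and the X list is assumed to be the standard 20-trinucleotide set.

```python
from itertools import product

# Degenerate symbols: R = purine, Y = pyrimidine, N = any nucleotide.
SYMBOLS = {"R": "AG", "Y": "CT", "N": "ACGT", "A": "A", "C": "C", "G": "G", "T": "T"}

X_CODE = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def expand(pattern):
    """Expand a degenerate pattern such as 'RNY' into its trinucleotides."""
    return {"".join(p) for p in product(*(SYMBOLS[s] for s in pattern))}


def is_comma_free(code):
    """True if no concatenation uv (u, v in code) has a code word
    starting at position 1 or 2."""
    code = set(code)
    for u, v in product(code, repeat=2):
        w = u + v
        if w[1:4] in code or w[2:5] in code:
            return False
    return True
```

With these definitions RNY, RRY and GNC pass the test, while X fails it: for instance GTA followed by CAG gives GTACAG, which contains TAC, itself a word of X.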
Affiliation(s)
- Gopal Dila
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
- Raymond Ripp
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
- Claudine Mayer
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
- Unité de Microbiologie Structurale, Institut Pasteur, CNRS, 75724 Paris Cedex 15, France
- Université Paris Diderot, Sorbonne Paris Cité, 75724 Paris Cedex 15, France
- Olivier Poch
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
- Christian J Michel
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
- Julie D Thompson
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg 67000, France
6. Michel CJ. The Maximal C³ Self-Complementary Trinucleotide Circular Code X in Genes of Bacteria, Archaea, Eukaryotes, Plasmids and Viruses. Life (Basel) 2017; 7:life7020020. [PMID: 28420220 PMCID: PMC5492142 DOI: 10.3390/life7020020] [Received: 02/06/2017] [Revised: 03/23/2017] [Accepted: 03/31/2017]
Abstract
In 1996, a set X of 20 trinucleotides that has, on average, the highest occurrence in the reading frame compared to the two shifted frames was identified in genes of both prokaryotes and eukaryotes. Furthermore, this set X has an interesting mathematical property: X is a maximal C³ self-complementary trinucleotide circular code. In 2015, by quantifying the inspection approach used in 1996, the circular code X was confirmed in the genes of bacteria and eukaryotes and was also identified in the genes of plasmids and viruses. The method was based on the preferential occurrence of trinucleotides among the three frames at the gene population level. We extend this definition here to the gene level. This new statistical approach gives all genes, long and short, the same weight when searching for the circular code X. As a consequence, the concept of circular code, in particular reading frame retrieval, is directly associated with each gene. At the gene level, the circular code X is strengthened in the genes of bacteria, eukaryotes, plasmids, and viruses, and is now also identified in the genes of archaea. The genes of mitochondria and chloroplasts contain a subset of the circular code X. Finally, by studying viral genes, the circular code X was found in DNA genomes, RNA genomes, double-stranded genomes, and single-stranded genomes.
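The difference between the gene population level and the gene level is essentially a weighting choice. The sketch below is a simplified illustration, not the statistic defined in the paper: it assumes the population-level value pools trinucleotide counts over all genes, so long genes dominate, and the gene-level value averages per-gene proportions, so each gene counts once. All function names are hypothetical.

```python
from collections import Counter


def frame_counts(gene):
    """Trinucleotide counts per frame; position i belongs to frame i % 3."""
    counts = [Counter(), Counter(), Counter()]
    for i in range(len(gene) - 2):
        counts[i % 3][gene[i:i + 3]] += 1
    return counts


def population_level(genes, t, frame=0):
    """Share of all occurrences of t that fall in `frame`, pooled over genes."""
    in_frame = total = 0
    for gene in genes:
        c = frame_counts(gene)
        in_frame += c[frame][t]
        total += sum(c[f][t] for f in range(3))
    return in_frame / total if total else 0.0


def gene_level(genes, t, frame=0):
    """Mean over genes containing t of the per-gene share in `frame`."""
    props = []
    for gene in genes:
        c = frame_counts(gene)
        total = sum(c[f][t] for f in range(3))
        if total:
            props.append(c[frame][t] / total)
    return sum(props) / len(props) if props else 0.0
```

For the genes `AACGTTAACGTTAACGTT` and `TAACGG`, `AAC` occurs three times in frame 0 of the first gene and once in frame 1 of the second, giving 0.75 at the population level but 0.5 at the gene level.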
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
7. Michel CJ. WITHDRAWN: The maximal C³ self-complementary trinucleotide circular code X in genes of bacteria, archaea, eukaryotes, plasmids and viruses. J Theor Biol 2017:S0022-5193(17)30029-2. [PMID: 28115203 DOI: 10.1016/j.jtbi.2017.01.028] [Received: 08/21/2016] [Revised: 01/12/2017] [Accepted: 01/19/2017]
Affiliation(s)
- Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
8. Michel CJ, Pellegrini M, Pirillo G. Maximal dinucleotide and trinucleotide circular codes. J Theor Biol 2015; 389:40-6. [PMID: 26382231 DOI: 10.1016/j.jtbi.2015.08.029] [Received: 03/02/2015] [Revised: 07/28/2015] [Accepted: 08/29/2015]
Abstract
We determine here the number and the list of maximal dinucleotide and trinucleotide circular codes. We prove that there is no maximal dinucleotide circular code having strictly fewer than 6 elements (the maximum size of dinucleotide circular codes). On the other hand, a computer calculation shows that there are maximal trinucleotide circular codes with fewer than 20 elements (the maximum size of trinucleotide circular codes). More precisely, there are maximal trinucleotide circular codes with 14, 15, 16, 17, 18 and 19 elements and no maximal trinucleotide circular code having fewer than 14 elements. We give the same information for the maximal self-complementary dinucleotide and trinucleotide circular codes. The amino acid distribution of maximal trinucleotide circular codes is also determined.
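Circularity of a trinucleotide code, and hence maximality, can be tested mechanically. This Python sketch assumes the graph criterion from the circular-code literature, taken as given rather than proved here: a trinucleotide code is circular exactly when the directed graph with edges b1 → b2b3 and b1b2 → b3 (one pair per trinucleotide b1b2b3) has no cycle. Maximality is then checked by brute force over the 64 trinucleotides. The X list and all function names are assumptions of the sketch.

```python
from itertools import product

X_CODE = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})
ALL_TRINUCLEOTIDES = frozenset("".join(p) for p in product("ACGT", repeat=3))


def code_graph(code):
    """Directed graph with edges b1 -> b2b3 and b1b2 -> b3 for each b1b2b3."""
    graph = {}
    for t in code:
        graph.setdefault(t[0], set()).add(t[1:])
        graph.setdefault(t[:2], set()).add(t[2])
    return graph


def has_cycle(graph):
    """Depth-first search; reaching a vertex still on the stack is a cycle."""
    on_stack, done = set(), set()

    def visit(v):
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w in on_stack or (w not in done and visit(w)):
                return True
        on_stack.discard(v)
        done.add(v)
        return False

    return any(v not in done and visit(v) for v in list(graph))


def is_circular(code):
    return not has_cycle(code_graph(code))


def is_maximal(code):
    """Circular, and adding any other trinucleotide breaks circularity."""
    code = set(code)
    return is_circular(code) and all(
        not is_circular(code | {t}) for t in ALL_TRINUCLEOTIDES - code
    )
```

Checking one code this way is cheap (at most 64 graph tests), but enumerating every maximal trinucleotide circular code, as reported in the abstract, requires a much larger search.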
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Marco Pellegrini
- Dipartimento di Matematica e Informatica "U.Dini", viale Morgagni 67/A, 50134 Firenze, Italy.
- Giuseppe Pirillo
- Consiglio Nazionale delle Ricerche, Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti", Unità di Firenze, Dipartimento di Matematica e Informatica "U.Dini", viale Morgagni 67/A, 50134 Firenze, Italy; Université de Marne-la-Vallée, 5 boulevard Descartes, 77454 Marne-la-Vallée Cedex 2, France.
9. Michel CJ. The maximal C³ self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses. J Theor Biol 2015; 380:156-77. [PMID: 25934352 DOI: 10.1016/j.jtbi.2015.04.009] [Received: 08/26/2014] [Revised: 02/28/2015] [Accepted: 04/09/2015]
Abstract
In 1996, a set X of 20 trinucleotides that has, on average, the highest occurrence in the reading frame compared to the two shifted frames was identified in genes of both prokaryotes and eukaryotes (Arquès and Michel, 1996). Furthermore, this set X has an interesting mathematical property: X is a maximal C³ self-complementary trinucleotide circular code (Arquès and Michel, 1996). By 2014, the number of trinucleotides in prokaryotic genes had been multiplied by a factor of 527. Furthermore, two new gene kingdoms, plasmids and viruses, now contain enough trinucleotide data to be analysed. The approach used in 1996 for identifying a preferential frame for a trinucleotide is quantified here with a new definition analysing the occurrence probability of a complementary/permutation (CP) trinucleotide set in a gene kingdom. Furthermore, in order to increase the statistical significance of results compared to those of 1996, the circular code X is studied on several gene taxonomic groups in a kingdom. Based on this new statistical approach, the circular code X is strengthened in genes of prokaryotes and eukaryotes, and now also identified in genes of plasmids. A subset of X with 18 or 16 trinucleotides is identified in genes of viruses. Furthermore, a simple probabilistic model based on the independent occurrence of trinucleotides in the reading frame of genes explains the circular code frequencies and asymmetries observed in the shifted frames in all studied gene kingdoms. Finally, the developed approach allows the identification of variant X codes in genes, i.e. trinucleotide codes which differ from X. In genes of bacteria, eukaryotes and plasmids, 14 among the 47 studied gene taxonomic groups (about 30%) have variant X codes. Seven variant X codes are identified with at least 16 trinucleotides of X. Two variant X codes, XA in cyanobacteria and plasmids of cyanobacteria, and XD in birds, are self-complementary, without permuted trinucleotides but non-circular. Five variant X codes, XB in deinococcus, plasmids of chloroflexi and deinococcus, mammals and kinetoplasts, XC in elusimicrobia and apicomplexans, XE in fishes, XF in insects, and XG in basidiomycetes and plasmids of spirochaetes, are C³ self-complementary circular. In genes of viruses, no variant X code is found.
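The self-complementary and permutation properties of X are easy to check by computer. In this sketch, assuming the standard list of the 20 trinucleotides of X, X1 and X2 are the codes obtained by the circular permutations b1b2b3 → b2b3b1 and b1b2b3 → b3b1b2, and the complement of a trinucleotide is its reverse complement. It checks only the complementarity relations; the full C³ property also requires X1 and X2 to be circular, which needs a separate circularity test. All names are hypothetical.

```python
X_CODE = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(t):
    """Complementary trinucleotide: complement each base, then reverse."""
    return t.translate(_COMPLEMENT)[::-1]


def permute(t, k):
    """Circular permutation P^k, e.g. P^1(b1b2b3) = b2b3b1."""
    return t[k:] + t[:k]


def permuted_code(code, k):
    return frozenset(permute(t, k) for t in code)


def is_self_complementary(code):
    return frozenset(complement(t) for t in code) == frozenset(code)


X1, X2 = permuted_code(X_CODE, 1), permuted_code(X_CODE, 2)
```

X, X1 and X2 are pairwise disjoint, so together they cover 60 of the 64 trinucleotides; the four left out are AAA, CCC, GGG and TTT.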
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
10.
Abstract
We begin here a combinatorial study of dinucleotide circular codes. A word written on a circle is called circular. A set of dinucleotides is a circular code if all circular words constructed with this set have a unique decomposition. Propositions based on a letter necklace allow the determination of the 24 maximum dinucleotide circular codes (of 6 elements). A partition property is also identified, with eight self-complementary maximum dinucleotide circular codes and two classes of eight maximum dinucleotide circular codes in bijective correspondence under the complementarity map.
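The counts quoted here can be reproduced by brute force. This Python sketch assumes the graph criterion for dinucleotides, taken as given: a dinucleotide code is circular exactly when the graph with one edge b1 → b2 per dinucleotide b1b2 has no cycle. It enumerates all 6-element subsets of the 12 dinucleotides with two distinct letters, keeps the circular ones, and then filters the self-complementary ones; the resulting counts, 24 and 8, match the abstract. Function names are hypothetical.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# AA, CC, GG and TT are excluded: each would be a self-loop, i.e. a cycle.
DINUCLEOTIDES = [a + b for a in NUCLEOTIDES for b in NUCLEOTIDES if a != b]


def is_circular(code):
    """Acyclicity of the graph b1 -> b2, tested with Kahn's algorithm."""
    graph = {}
    for d in code:
        graph.setdefault(d[0], set()).add(d[1])
    indegree = {v: 0 for v in NUCLEOTIDES}
    for succs in graph.values():
        for w in succs:
            indegree[w] += 1
    queue = [v for v in NUCLEOTIDES if indegree[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for w in graph.get(v, ()):
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    # Every vertex can be removed only if the graph has no cycle.
    return removed == len(NUCLEOTIDES)


def is_self_complementary(code):
    """The complement of b1b2 is c(b2)c(b1)."""
    return {d.translate(_COMPLEMENT)[::-1] for d in code} == set(code)


maximum_codes = [set(c) for c in combinations(DINUCLEOTIDES, 6) if is_circular(c)]
self_complementary = [c for c in maximum_codes if is_self_complementary(c)]
```

The 16 maximum codes that are not self-complementary pair up under the complementarity map, which gives the two classes of eight.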
11. Michel CJ, Pirillo G. A permuted set of a trinucleotide circular code coding the 20 amino acids in variant nuclear codes. J Theor Biol 2013. [DOI: 10.1016/j.jtbi.2012.11.023]
12. Bussoli L, Michel CJ, Pirillo G. On Conjugation Partitions of Sets of Trinucleotides. Applied Mathematics 2012. [DOI: 10.4236/am.2012.31017]
13.
Abstract
Recently, we identified a hierarchy relation between trinucleotide comma-free codes and trinucleotide circular codes (see our previous works). Here, we extend our hierarchy with two new classes of codes, called DLD and LDL codes, which are stronger than the comma-free codes. We also prove that no circular code with 20 trinucleotides is a DLD code and that a circular code with 20 trinucleotides is comma-free if and only if it is an LDL code. Finally, we point out the possible role of the symmetric group Σ4 in the mathematical study of trinucleotide circular codes.
14. Gonzalez D, Giannerini S, Rosa R. Circular codes revisited: A statistical approach. J Theor Biol 2011; 275:21-8. [DOI: 10.1016/j.jtbi.2011.01.028] [Received: 09/01/2010] [Revised: 01/18/2011] [Accepted: 01/19/2011]
15. On the evolution of the standard genetic code: vestiges of critical scale invariance from the RNA world in current prokaryote genomes. PLoS One 2009; 4:e4340. [PMID: 19183813 PMCID: PMC2631149 DOI: 10.1371/journal.pone.0004340] [Received: 06/17/2008] [Accepted: 11/21/2008]
Abstract
Herein, two genetic codes are derived through which the primeval RNA code could have given rise to the standard genetic code (SGC). One of them, called extended RNA code type I, consists of all codons of the type RNY (purine-any base-pyrimidine) plus codons obtained by considering the RNA code but in the second (NYR type) and third (YRN type) reading frames. The extended RNA code type II comprises all codons of the type RNY plus codons that arise from transversions of the RNA code in the first (YNY type) and third (RNR type) nucleotide bases. In order to test whether putative nucleotide sequences in the RNA World and in both extended RNA codes share the same scaling and statistical properties as those encountered in current prokaryotes, we used the genomes of four Eubacteria and three Archaea. For each prokaryote, we obtained their respective genomes obeying the RNA code or the extended RNA codes types I and II. In each case, we estimated the scaling properties of triplet sequences via a renormalization group approach, and we calculated the frequency distributions of distances for each codon. Remarkably, the scaling properties of the distance series of some codons from the RNA code and most codons from both extended RNA codes turned out to be identical or very close to the scaling properties of codons of the SGC. To test the robustness of these results, we show, via computer simulation experiments, that random mutations of current genomes, at a rate of 10⁻¹⁰ per site per year over three billion years, were not enough to destroy the observed patterns. Therefore, we conclude that most current prokaryotes may still contain relics of the primeval RNA World and that both extended RNA codes may well represent two plausible evolutionary paths between the RNA code and the current SGC.
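The codon sets of the two extended RNA codes follow directly from the patterns named above. A minimal Python sketch (hypothetical helper names) builds them and adds a distance-series helper that lists the gaps, in codons, between successive occurrences of a codon in the reading frame; the renormalization-group scaling analysis itself is not reproduced.

```python
from itertools import product

SYMBOLS = {"R": "AG", "Y": "CT", "N": "ACGT"}


def expand(pattern):
    """Expand a pattern such as 'RNY' (R = purine, Y = pyrimidine, N = any)."""
    return {"".join(p) for p in product(*(SYMBOLS[s] for s in pattern))}


RNA_CODE = expand("RNY")
EXTENDED_TYPE_I = RNA_CODE | expand("NYR") | expand("YRN")
EXTENDED_TYPE_II = RNA_CODE | expand("YNY") | expand("RNR")


def codon_distances(seq, codon):
    """Gaps, in codons, between successive occurrences of `codon` in frame 0."""
    positions = [k for k in range(len(seq) // 3) if seq[3 * k:3 * k + 3] == codon]
    return [b - a for a, b in zip(positions, positions[1:])]
```

Both extended codes contain 48 codons, because in each case the three 16-codon patterns are pairwise disjoint; `collections.Counter(codon_distances(seq, codon))` then gives the frequency distribution of distances for one codon.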
16. Frey G, Michel CJ. An analytical model of gene evolution with six mutation parameters: an application to archaeal circular codes. Comput Biol Chem 2006; 30:1-11. [PMID: 16324886 DOI: 10.1016/j.compbiolchem.2005.09.001] [Received: 08/26/2005] [Revised: 09/04/2005] [Accepted: 09/05/2005]
Abstract
We develop here an analytical evolutionary model based on a 64 × 64 trinucleotide mutation matrix with six substitution parameters associated with the transitions and transversions in the three trinucleotide sites. It generalizes the previous models based on the 4 × 4 nucleotide mutation matrices and the 64 × 64 trinucleotide mutation matrix with three parameters. It determines at some time t the exact occurrence probabilities of trinucleotides mutating randomly according to six substitution parameters. An application of this model allows an evolutionary study of the common circular code COM and the 15 archaeal circular codes X which have been recently identified in several archaeal genomes. The main property of a circular code is the retrieval of the reading frames in genes, both locally, i.e. anywhere in genes and in particular without a start codon, and automatically with a window of a few nucleotides. In genes, the circular code is superimposed on the traditional genetic one. Very unexpectedly, the evolutionary model demonstrates that the archaeal circular codes can derive from the common circular code subjected to random substitutions with particular values for six substitution parameters. This result correlates strongly with the statistical observations of three archaeal codes in actual genes. Furthermore, the properties of these substitution rates allow us to propose an evolutionary classification of the 15 archaeal codes into three main classes according to this model. In almost all cases, they agree with the actual degeneracy of the genetic code, with substitutions more frequent in the third trinucleotide site and with transitions more frequent than transversions in any trinucleotide site.
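One way to realise a six-parameter trinucleotide model is sketched below; it is an assumption, not the authors' derivation. It assumes each of the three sites evolves independently as a continuous-time Kimura two-parameter process with its own transition rate `alphas[s]` and transversion rate `betas[s]`, with substitutions occurring at one site at a time. Under those assumptions the 64 × 64 transition matrix at time t factorises into a product of three 4 × 4 site matrices, each with a closed form. Function names are hypothetical.

```python
import math
from itertools import product

TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
PURINES = "AG"


def site_probability(a, b, alpha, beta, t):
    """Kimura two-parameter probability that base a is base b after time t
    (alpha: transition rate, beta: rate to each transversion partner)."""
    e_tv = math.exp(-4 * beta * t)
    e_all = math.exp(-2 * (alpha + beta) * t)
    if a == b:
        return 0.25 + 0.25 * e_tv + 0.5 * e_all
    if (a in PURINES) == (b in PURINES):  # A<->G or C<->T: transition
        return 0.25 + 0.25 * e_tv - 0.5 * e_all
    return 0.25 - 0.25 * e_tv  # transversion


def trinucleotide_probability(u, v, alphas, betas, t):
    """P(u -> v after time t), assuming the three sites evolve independently."""
    return math.prod(
        site_probability(u[s], v[s], alphas[s], betas[s], t) for s in range(3)
    )


def evolve(p0, alphas, betas, t):
    """Occurrence probabilities of the 64 trinucleotides at time t,
    starting from p0 (dict trinucleotide -> probability)."""
    return {
        v: sum(p0.get(u, 0.0) * trinucleotide_probability(u, v, alphas, betas, t)
               for u in TRINUCLEOTIDES)
        for v in TRINUCLEOTIDES
    }


def code_probability(p, code):
    """Occurrence probability of the trinucleotide set `code` under p."""
    return sum(p[t] for t in code)
```

The per-site probabilities sum to 1 and stay non-negative for any non-negative rates, so `evolve` returns a proper distribution over the 64 trinucleotides; `code_probability` then sums it over a code of interest.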
Affiliation(s)
- Gabriel Frey
- Equipe de Bioinformatique Théorique, LSIIT (UMR CNRS-ULP 7005), Université Louis Pasteur de Strasbourg, Pôle API, Boulevard Sébastien Brant, 67400 Illkirch, France.
17. Frey G, Michel CJ. Identification of circular codes in bacterial genomes and their use in a factorization method for retrieving the reading frames of genes. Comput Biol Chem 2006; 30:87-101. [PMID: 16439185 DOI: 10.1016/j.compbiolchem.2005.11.001] [Received: 09/30/2005] [Revised: 11/07/2005] [Accepted: 11/07/2005]
Abstract
We developed a statistical method that allows each trinucleotide to be associated with a unique frame among the three possible ones in a (protein-coding) gene. An extensive gene study in 175 complete bacterial genomes based on this statistical approach resulted in the identification of 72 new circular codes. Finding a circular code enables an immediate retrieval of the reading frame locally, anywhere in a gene. No knowledge of the location of the start codon is required, and a short window of only a few nucleotides is sufficient for automatic retrieval. We have therefore developed a factorization method (which exploits previously found circular codes) for retrieving the reading frames of bacterial genes. Its principle is new and easy to understand. Neither complex treatment nor specific information on the nucleotide sequences is necessary. Moreover, the method can be used for short regions in nucleotide sequences (fewer than 25 nucleotides in protein-coding genes). Selected additional properties of circular codes and their possible biological consequences are also discussed.
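Local frame retrieval can be illustrated with a simple majority vote, which is a swapped-in heuristic and not the factorization method described above: in a short window, count how many trinucleotides of each frame belong to a circular code and choose the frame with the most hits. Here frame f means that the reading-frame codons start at offset f of the window; the X list, the function name `retrieve_frame` and the tie rule (lowest frame wins) are assumptions.

```python
X_CODE = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def retrieve_frame(window, code=X_CODE):
    """Return the frame (0, 1 or 2) of `window` whose trinucleotides fall
    most often in `code`; ties go to the lowest frame."""
    window = window.upper()
    scores = [
        sum(window[i:i + 3] in code for i in range(f, len(window) - 2, 3))
        for f in range(3)
    ]
    return max(range(3), key=lambda f: scores[f])
```

Prepending one or two nucleotides to the window `GAACTGGCCTTC` (four X codons) shifts the answer from frame 0 to frame 1 or 2, which is the behaviour expected of frame retrieval.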
Affiliation(s)
- Gabriel Frey
- Equipe de Bioinformatique Théorique, LSIIT (UMR CNRS-ULP 7005), Université Louis Pasteur de Strasbourg, Pôle API, Boulevard Sébastien Brant, 67400 Illkirch, France.