1
Montemayor-Aldrete JA, Nieto-Villar JM, Villagómez CJ, Márquez-Caballé RF. An irreversible thermodynamic model of prebiological dissipative molecular structures inside vacuoles at the surface of the Archean Ocean. Biosystems 2025; 247:105379. [PMID: 39710184 DOI: 10.1016/j.biosystems.2024.105379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 12/05/2024] [Accepted: 12/05/2024] [Indexed: 12/24/2024]
Abstract
A prebiotic model, based on the framework of thermodynamic efficiency loss from small dissipative eukaryote organisms, is developed to describe the maximum possible concentration of solar power to be dissipated on topologically circular molecular structures encapsulated in lipid-walled vacuoles, which floated in the Archean oceans. A previous analysis of 71 species, covering 18 orders of magnitude in mass from Megaptera novaeangliae to Saccharomyces cerevisiae, suggests that in molecular structures of smaller mass than any living being known today, the power dissipation must be directly proportional to the power of the photons of solar origin that impinge on them, giving rise to the formation of more complex self-assembled molecular structures at the prebiotic stage through a quantum mechanical model of resonant photon wavelength excitation. An analysis of 12 circular molecules (encapsulated in lipid-walled vacuoles) relevant to the evolution of life on planet Earth, such as the five nucleobases and some aromatic molecules such as pyrimidine, porphyrin, chlorin, coumarin, xanthine, etc., was carried out. Considering one vacuole of each type of molecule per square meter of the ocean's surface of planet Earth (1.8 × 10^15 vacuoles), their dissipative operation would require only 10^-10 times the matter used by the biomass currently existing on Earth. Relevant numbers (10^20 to 10^21) for the annual dissipative cycles corresponding to high-energy photochemical events, which in principle allow the assembly of more complex polymers, were obtained. These figures are compatible with some results obtained by followers of the primordial soup theory, who, under certain suppositions about Archean chemical kinetic changes in the precursors of RNA and DNA, try to justify the formation rate of RNA and DNA components and the emergence of life within a 10-million-year window, 3.5 billion years ago.
The physical foundation and the simplicity of the proposed approach suggest that it can serve as a possible template both for the development of new kinds of experiments and for prebiotic theories that address self-organization occurring inside such vacuoles. Our model provides a new way to conceptualize the self-production of simple cyclic dissipative molecular structures in the Archean period of planet Earth. © 2017 Elsevier Inc. All rights reserved.
Affiliation(s)
- Jorge A Montemayor-Aldrete
- Departamento de Estado Sólido, Instituto de Física, Universidad Nacional Autónoma de México, Circuito de la Investigación Científica, Ciudad Universitaria, Ciudad de México, 04510, Mexico.
- José Manuel Nieto-Villar
- Department of Chemical-Physics, A. Alzola Group of Thermodynamics of Complex Systems of M.V. Lomonosov Chair, Faculty of Chemistry, University of Havana, Cuba
- Carlos J Villagómez
- Departamento de Estado Sólido, Instituto de Física, Universidad Nacional Autónoma de México, Circuito de la Investigación Científica, Ciudad Universitaria, Ciudad de México, 04510, Mexico
- Rafael F Márquez-Caballé
- Departamento de Estado Sólido, Instituto de Física, Universidad Nacional Autónoma de México, Circuito de la Investigación Científica, Ciudad Universitaria, Ciudad de México, 04510, Mexico
2
Michel CJ, Sereni JS. Genome Galaxy Identified by the Circular Code Theory. Bull Math Biol 2024; 87:5. [PMID: 39589676 DOI: 10.1007/s11538-024-01366-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 09/24/2024] [Indexed: 11/27/2024]
Abstract
The genome galaxy identified in bacteria is studied by expressing the reading frame retrieval (RFR) function according to the YZ-content (GC-, AG- and GT-content) of bacterial codons. We have developed a simple probabilistic model for ambiguous sequences in order to show that the RFR function is a measure of the gene reading frame retrieval. Indeed, the RFR function increases with the ratio of ambiguous sequences and the ratio of ambiguous sequences decreases when the codon usage dispersion increases. The classical GC-content is the best parameter for characterizing the upper arm, which is related to bacterial genes with a low GC-content, and the lower arm, which is related to bacterial genes with a high GC-content. The galaxy center has a GC-content around 0.5. Then, these results are confirmed by expressing the GC-content of bacterial codons as a function of the codon usage dispersion. Finally, the bacterial genome galaxy is better described with the GC3-content in the 3rd codon site compared to the GC1-content and GC2-content in the 1st and 2nd codons sites, respectively. Whereas the codon usage is used extensively by biologists, its dispersion, which is an important parameter to reveal this genome galaxy, is surprisingly little known and unused. Therefore, we have developed a mathematical theory of codon usage dispersion by deriving several formulæ. It shows three important parameters in codon usage: the minimum and maximum codon probabilities and the number of codons with high frequency, i.e. with a probability at least 1/64. By applying this theory to the evolution of the genetic code, we see that bacteria have optimised the number of codons with high frequency to maximise the codon dispersion, thus maximising the capacity to retrieve the reading frame in genes. The derived formulæ of dispersion can be easily extended to any weighted code over a finite alphabet.
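The quantities named in this abstract (codon usage, GC-content per codon site, and the number of codons with probability at least 1/64) can be illustrated with a minimal sketch. The function names and the toy sequence are assumptions made for illustration; the paper's dispersion formulae are not reproduced here.

```python
from collections import Counter

def codon_usage(seq):
    """Return {codon: probability} for a non-empty in-frame DNA sequence."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def gc_content_by_site(usage):
    """GC-content at codon sites 1, 2 and 3, weighted by codon probability."""
    return tuple(
        sum(p for codon, p in usage.items() if codon[site] in "GC")
        for site in range(3)
    )

def high_frequency_codons(usage, threshold=1 / 64):
    """Codons whose probability is at least the uniform value 1/64."""
    return sorted(c for c, p in usage.items() if p >= threshold)

# Toy sequence (hypothetical): codons ATG, GCG, GCC, AAA, GCG.
usage = codon_usage("ATGGCGGCCAAAGCG")
gc1, gc2, gc3 = gc_content_by_site(usage)
p_min, p_max = min(usage.values()), max(usage.values())
n_high = len(high_frequency_codons(usage))
```

On a real genome the usage table would be pooled over all genes before computing these values; a five-codon toy sequence only shows the mechanics.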
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400, Illkirch, France.
- Jean-Sébastien Sereni
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400, Illkirch, France
3
Fimmel E, Michel CJ, Strüngmann L. Circular cut codes in genetic information. Biosystems 2024; 243:105263. [PMID: 38971553 DOI: 10.1016/j.biosystems.2024.105263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 06/30/2024] [Accepted: 06/30/2024] [Indexed: 07/08/2024]
Abstract
In this work we present an analysis of the dinucleotide occurrences in the three codon sites 1-2, 2-3 and 1-3, based on a computation of the codon usage of three large sets of bacterial, archaeal and eukaryotic genes using the same method that identified a maximal C3 self-complementary trinucleotide circular code X in genes of bacteria and eukaryotes in 1996 (Arquès and Michel, 1996). Surprisingly, two dinucleotide circular codes are identified in the codon sites 1-2 and 2-3. Furthermore, these two codes are shifted versions of each other. Moreover, the dinucleotide code in the codon site 1-3 is circular, self-complementary and contained in the projection of X onto the 1st and 3rd bases, i.e. by cutting the middle base in each codon of X. We prove several results showing that the circularity and the self-complementarity of trinucleotide codes is induced by the circularity and the self-complementarity of its dinucleotide cut codes. Finally, we present several evolutionary approaches for an emergence of trinucleotide codes from dinucleotide codes.
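The projection of a trinucleotide code onto two codon sites can be sketched as follows. The 20 trinucleotides assigned to X below are the commonly quoted form of the Arquès and Michel (1996) code and should be checked against the original paper; the helper names are assumptions. These are plain projections of X, not the dinucleotide codes the authors derive from codon-usage statistics.

```python
# Commonly quoted listing of the circular code X (assumption: verify against the source).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(word):
    """Complement each base, then reverse the word."""
    return word.translate(COMPLEMENT)[::-1]

def is_self_complementary(code):
    """True if the code contains the reverse complement of each of its words."""
    return all(reverse_complement(w) in code for w in code)

def cut(code, sites):
    """Project every codon onto two 0-based sites, e.g. (0, 2) for sites 1-3."""
    i, j = sites
    return frozenset(w[i] + w[j] for w in code)

cut_12 = cut(X, (0, 1))
cut_23 = cut(X, (1, 2))
cut_13 = cut(X, (0, 2))  # "cutting the middle base" of each codon
```

The 1-3 projection of any self-complementary trinucleotide code is itself self-complementary, because the reverse complement of b1b2b3 projects onto the reverse complement of b1b3. The projection here contains CC and GG, so it is not circular as a whole, which is consistent with the abstract's statement that the circular 1-3 code is contained in the projection.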
Affiliation(s)
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, C.N.R.S., 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Lutz Strüngmann
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
4
Girard C. The tri-flow adaptiveness of codes in major evolutionary transitions. Biosystems 2024; 237:105133. [PMID: 38336225 DOI: 10.1016/j.biosystems.2024.105133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 01/26/2024] [Accepted: 01/27/2024] [Indexed: 02/12/2024]
Abstract
Life codes increase in both number and variety with biological complexity. Although our knowledge of codes is constantly expanding, the evolutionary progression of organic, neural, and cultural codes in response to selection pressure remains poorly understood. Greater clarification of the selective mechanisms is achieved by investigating how major evolutionary transitions reduce spatiotemporal and energetic constraints on transmitting heritable code to offspring. Evolution toward less constrained flows is integral to enduring flow architecture everywhere, in both engineered and natural flow systems. Beginning approximately 4 billion years ago, the most basic level for transmitting genetic material to offspring was initiated by protocell division. Evidence from ribosomes suggests that protocells transmitted comma-free or circular codes, preceding the evolution of standard genetic code. This rudimentary information flow within protocells is likely to have first emerged within the geo-energetic and geospatial constraints of hydrothermal vents. A broad-gauged hypothesis is that major evolutionary transitions overcame such constraints with tri-flow adaptations. The interconnected triple flows incorporated energy-converting, spatiotemporal, and code-based informational dynamics. Such tri-flow adaptations stacked sequence splicing code on top of protein-DNA recognition code in eukaryotes, prefiguring the transition to sexual reproduction. Sex overcame the spatiotemporal-energetic constraints of binary fission with further code stacking. Examples are tubulin code and transcription initiation code in vertebrates. In a later evolutionary transition, language reduced metabolic-spatiotemporal constraints on inheritance by stacking phonetic, phonological, and orthographic codes. In organisms that reproduce sexually, each major evolutionary transition is shown to be a tri-flow adaptation that adds new levels of code-based informational exchange. 
Evolving biological complexity is also shown to increase the nongenetic transmissibility of code.
Affiliation(s)
- Chris Girard
- Department of Global and Sociocultural Studies, Florida International University, Miami, FL 33199, United States.
5
Paredes O, Farfán-Ugalde E, Gómez-Márquez C, Borrayo E, Mendizabal AP, Morales JA. The calculus of codes - From entropy, complexity, and information to life. Biosystems 2024; 236:105099. [PMID: 38101727 DOI: 10.1016/j.biosystems.2023.105099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 12/05/2023] [Accepted: 12/05/2023] [Indexed: 12/17/2023]
Abstract
Exploring the core components that define living systems and their operational mechanisms within emerging biological entities is a complex endeavor. In the realm of biological systems literature, the terms matter, energy, information, complexity, and entropy are frequently referenced. However, possessing these concepts alone does not guarantee a comprehensive understanding or the ability to reconstruct the intricate nature of life. This study aims to illuminate the trajectory of these organic attributes, presenting a theoretical framework that delves into the integrated role of these concepts in biology. We assert that Code Biology serves as a pivotal steppingstone for unraveling the mechanisms underlying life. Biological codes (BCs) emerge not only from the interplay of matter and energy but also from Information. Contrary to deriving information from the former elements, we propose that information holds its place as a fundamental physical aspect. Consequently, we propose a continuum perspective called Calculus of Fundamentals involving three fundamentals: Matter, Energy, and Information, to depict the dynamics of BCs. To achieve this, we emphasize the necessity of studying Entropy and Complexity as integral organic descriptors. This perspective also facilitates the introduction of a mathematical theoretical framework that aids in comprehending continuous changes, the driving dynamics of biological fundamentals. We posit that Energy, Matter, and Information constitute the essential building blocks of living systems, and their interactions are governed by Entropy and Complexity analyses, redefined as biological descriptors. This interdisciplinary perspective of Code Biology sheds light on the intricate interplay between the controversial phenomenon of life and advances the idea of constructing a theory rooted in information as an organic fundamental.
Affiliation(s)
- Omar Paredes
- Biodigital Innovation Lab, Translational Bioengineering Department, CUCEI, UDG, México
- Enrique Farfán-Ugalde
- Biodigital Innovation Lab, Translational Bioengineering Department, CUCEI, UDG, México
- Ernesto Borrayo
- Biodigital Innovation Lab, Translational Bioengineering Department, CUCEI, UDG, México
- J Alejandro Morales
- Biodigital Innovation Lab, Translational Bioengineering Department, CUCEI, UDG, México.
6
Fimmel E, Strüngmann L. The spiderweb of error-detecting codes in the genetic information. Biosystems 2023; 233:105009. [PMID: 37640191 DOI: 10.1016/j.biosystems.2023.105009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 08/21/2023] [Accepted: 08/21/2023] [Indexed: 08/31/2023]
Abstract
Nature possesses inherent mechanisms for error detection and correction during the translation of genetic information, as demonstrated by the discovery of a self-complementary circular C3-code called X0 in various organisms such as bacteria, eukaryotes, plasmids, and viruses (Arquès and Michel, 1996; Michel, 2015, 2017). Since then, extensive research has focused on circular codes, which are believed to be remnants of ancient comma-free codes. These codes can be regarded as an additional genetic code specifically optimized for detecting and preserving the proper reading frame in protein-coding sequences. A study by Fimmel et al. in 2014 identified that a total of 216 maximal self-complementary C3-codes can be grouped into 27 equivalence classes with eight codes in each class. In this work, we study how the 27 equivalence classes are related to each other. While the codes in each equivalence class obtained by Fimmel et al. in 2014 are permutations of each other, i.e. one code can be obtained from the other by applying a permutation of the bases, it has not been clear how the equivalence classes are connected. We show that there is an ordering of the equivalence classes such that one gets from one class to the next one by substituting only one pair of codon/anticodon in the corresponding codes, i.e. the corresponding codes have a maximal intersection of 18 codons. To perform this analysis, we define two graphs, G216 and G27, whose vertices are, respectively, all 216 maximal self-complementary C3-codes and 27 equivalence classes. Several properties of the graphs are obtained. Most surprisingly, it turns out that G27 contains Hamiltonian paths of length 27. This fact ultimately leads to a representation of the set of all 216 maximal self-complementary C3-codes as a kind of spider web.
Finally, we define dinucleotide cuts of such codes by projecting each codon to its first two bases and show that the paths of lengths 27 in G216 can even be chosen so that all the codes contain a special subset of dinucleotides defined by Rumer's roots. These observations raise a lot of new questions about the biological function of such structures.
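The statement that each equivalence class holds eight codes related by base permutations can be illustrated with a small sketch. It assumes the relevant permutations are those that commute with base complementarity (so that self-complementarity is preserved) and uses the commonly quoted listing of the Arquès and Michel code as the starting point; both are assumptions for illustration, and the names are hypothetical.

```python
from itertools import permutations

BASES = "ACGT"
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Commonly quoted listing of the circular code X (assumption: verify against the source).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

def complement_preserving_permutations():
    """All base permutations pi with pi(comp(b)) == comp(pi(b)) for every base b."""
    maps = []
    for image in permutations(BASES):
        pi = dict(zip(BASES, image))
        if all(pi[COMP[b]] == COMP[pi[b]] for b in BASES):
            maps.append(pi)
    return maps

def permute_code(pi, code):
    """Apply a base permutation letter by letter to every codon of a code."""
    return frozenset("".join(pi[b] for b in w) for w in code)

perms = complement_preserving_permutations()
orbit = {permute_code(pi, X) for pi in perms}
```

Under these assumptions `orbit` collects the codes reachable from X by such permutations; the size of the intersection between two codes (`len(a & b)`) is the quantity the abstract uses to link neighbouring classes.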
Affiliation(s)
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Lutz Strüngmann
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
7
Fimmel E, Michel CJ, Strüngmann L. Circular mixed sets. Biosystems 2023; 229:104906. [PMID: 37196893 DOI: 10.1016/j.biosystems.2023.104906] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/22/2023] [Accepted: 04/29/2023] [Indexed: 05/19/2023]
Abstract
In this article, we introduce the new mathematical concept of circular mixed sets of words over an arbitrary finite alphabet. These circular mixed sets may not be codes in the classical sense and hence allow a higher amount of information to be encoded. After describing their basic properties, we generalize a recent graph theoretical approach for circularity and apply it to distinguish codes from sets (i.e. non-codes). Moreover, several methods are given to construct circular mixed sets. Finally, this approach allows us to propose a new evolution model of the present genetic code that could have evolved from a dinucleotide world to a trinucleotide world via circular mixed sets of dinucleotides and trinucleotides.
Affiliation(s)
- Elena Fimmel
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
- Christian J Michel
- Theoretical bioinformatics, ICube, University of Strasbourg, C.N.R.S., 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
- Lutz Strüngmann
- Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
8
Michel CJ, Sereni JS. Reading Frame Retrieval of Genes: A New Parameter of Codon Usage Based on the Circular Code Theory. Bull Math Biol 2023; 85:24. [PMID: 36826719 PMCID: PMC9950712 DOI: 10.1007/s11538-023-01129-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/25/2022] [Accepted: 01/26/2023] [Indexed: 02/25/2023]
Abstract
Based on the circular code theory, we define a new function f that quantifies the property of reading frame retrieval (RFR) of genes from their codon usage. This RFR function f is computed on a massive scale in genes of genomes of bacteria, eukaryotes and archaea. By expressing f as a function of the mean number [Formula: see text] of codons per gene, a "universal" property is identified, whatever the kingdom: the reading frame retrieval is enhanced in large genes. By investigating this property according to the theory developed, a Spearman's rank correlation with a strong negative coefficient is observed between the codon usage dispersion d (from the uniform codon distribution [Formula: see text]) and the RFR function f, whatever the kingdom (p-values [Formula: see text] in bacteria, [Formula: see text] in eukaryotes and [Formula: see text] in archaea). Thus, the reading frame retrieval is enhanced with the codon usage dispersion. Furthermore, this approach identifies a "genome centre" from which emerge two distinct "genome arms": an upper arm and a lower arm, respectively, above and below the linear regression. The RFR function by itself or combined with classical methods (alignment, phylogeny) could also be a new approach to classify the genomes in the future.
Affiliation(s)
- Christian J. Michel
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
- Jean-Sébastien Sereni
- Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
9
Borah C, Ali T. Genetic code noise immunity features: Degeneracy and frameshift correction. GENE REPORTS 2022. [DOI: 10.1016/j.genrep.2022.101707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
10
Property based analysis: Optimality of RNY comma-free code versus circular code (X) after frameshift errors. GENE REPORTS 2022. [DOI: 10.1016/j.genrep.2022.101652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
11
Abstract
A message such as mRNA, which consists of continuous characters without separators (such as commas or spaces), can easily be decoded incorrectly if it is read in the wrong reading frame. One construct to theoretically avoid these reading frame errors is the class of block codes. However, the first hypothesis of Watson and Crick (1953) that block codes are used as a tool to avoid reading frame errors in coding sequences already failed because the four periodical codons AAA, CCC, GGG and UUU seem to play an important role in protein coding sequences. Even the class of circular codes later discovered by Arquès and Michel (1996) in coding sequences cannot contain a periodic codon. However, by incorporating the interpretation of the message into the robustness of the reading frame, the extension of circular codes to include periodic codons is theoretically possible. In this work, we introduce the new class of I-circular codes. Unlike circular codes, these codes allow frame shifts, but only if the decoded interpretation of the message is identical to the intended interpretation. In the following, the formal definition of I-circular codes is introduced and the maximum and the maximal size of I-circular codes are given based on the standard genetic code table. These numbers are calculated using a new graph-theoretic approach derived from the classical one for the class of circular codes. Furthermore, we show that all 216 maximum self-complementary C3-codes (see Fimmel et al., 2015) can be extended to larger I-circular codes. We present the increased code coverage of the 216 newly constructed I-circular codes based on the human coding sequences in chromosome 1. In the last section of this paper, we use the polarity of amino acids as an interpretation table to construct I-circular codes. In an optimization process, two maximum I-circular codes of length 30 are found.
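The classical graph test for ordinary circular codes, on which the abstract says the new approach builds, can be sketched as follows. It assumes the known criterion that a code is circular exactly when its associated graph (one edge from each proper prefix of a word to the matching suffix) is acyclic. This sketch does not implement the I-circular variant, which also needs an interpretation table. The listing of X is the commonly quoted one and the function names are hypothetical.

```python
# Commonly quoted listing of the circular code X (assumption: verify against the source).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

def code_graph(code):
    """Directed graph with an edge prefix -> suffix for every split of every word."""
    graph = {}
    for w in code:
        for i in range(1, len(w)):
            graph.setdefault(w[:i], set()).add(w[i:])
            graph.setdefault(w[i:], set())
    return graph

def is_acyclic(graph):
    """Depth-first search with three colours; a grey successor means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {v: WHITE for v in graph}

    def visit(v):
        state[v] = GREY
        for nxt in graph[v]:
            if state[nxt] == GREY:
                return False
            if state[nxt] == WHITE and not visit(nxt):
                return False
        state[v] = BLACK
        return True

    return all(state[v] != WHITE or visit(v) for v in graph)

def is_circular(code):
    """Classical criterion (assumed): circular if and only if the graph is acyclic."""
    return is_acyclic(code_graph(code))

x_is_circular = is_circular(X)
with_periodic = is_circular(X | {"AAA"})  # AAA adds the cycle A -> AA -> A
```

The second call shows why a periodic codon such as AAA cannot belong to an ordinary circular code: its two splits form a cycle on their own.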
12
Wang X, Dong Q, Chen G, Zhang J, Liu Y, Cai Y. Frameshift and wild-type proteins are often highly similar because the genetic code and genomes were optimized for frameshift tolerance. BMC Genomics 2022; 23:416. [PMID: 35655139 PMCID: PMC9164415 DOI: 10.1186/s12864-022-08435-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/01/2021] [Accepted: 03/02/2022] [Indexed: 11/10/2022] Open
Abstract
Frameshift mutations have been considered of significant importance for the molecular evolution of proteins and their coding genes, while frameshift protein sequences encoded in the alternative reading frames of coding genes have been considered to be meaningless. However, functional frameshifts have been found to be widespread. It has been puzzling how a frameshift protein keeps its structure and functionality while substantial changes occur in its primary amino-acid sequence. This study shows that the similarities among frameshifts and wild types are higher than random similarities and are determined at different levels. Frameshift substitutions are more conservative than random substitutions in the standard genetic code (SGC). The frameshift substitutions score of SGC ranks in the top 2.0-3.5% of alternative genetic codes, showing that SGC is nearly optimal for frameshift tolerance. In many genes and certain genomes, frameshift-resistant codons and codon pairs appear more frequently than expected, suggesting that frameshift tolerance is achieved through not only the optimality of the genetic code but, more importantly, the further optimization of a specific gene or genome through the usages of codons/codon pairs, which sheds light on the role of frameshift mutations in molecular and genomic evolution.
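The comparison of a wild-type protein with its frameshifted counterpart can be sketched with the standard genetic code. Plain percent identity is used here as a deliberately simple stand-in for the substitution scores the study relies on; the toy gene and the function names are assumptions.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq, shift=0):
    """Translate DNA after dropping `shift` leading bases; trailing partial codons are ignored."""
    seq = seq.upper()[shift:]
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def identity(a, b):
    """Fraction of identical residues over the shorter of two ungapped sequences."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0

gene = "ATGGCCAAAGGTTAA"            # hypothetical toy gene
wild_type = translate(gene)          # reading frame 0
frameshift = translate(gene, 1)      # reading frame shifted by one base
similarity = identity(wild_type, frameshift)
```

A faithful reproduction of the study would replace `identity` with a scoring matrix and compare the result against randomized or alternative codes, which is beyond this sketch.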
Affiliation(s)
- Xiaolong Wang
- Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China.
- Quanjiang Dong
- Qingdao Municipal Hospital, Qingdao, Shandong, 266003, P. R. China
- Gang Chen
- Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
- Jianye Zhang
- Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
- Yongqiang Liu
- Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
- Yujia Cai
- Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
13
Trinucleotide k-circular codes II: Biology. Biosystems 2022; 217:104668. [DOI: 10.1016/j.biosystems.2022.104668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2021] [Revised: 02/27/2022] [Accepted: 03/16/2022] [Indexed: 11/22/2022]
14
Giannerini S, Gonzalez DL, Goracci G, Danielli A. A role for circular code properties in translation. Sci Rep 2021; 11:9218. [PMID: 33911089 PMCID: PMC8080828 DOI: 10.1038/s41598-021-87534-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/30/2020] [Accepted: 03/23/2021] [Indexed: 11/19/2022] Open
Abstract
Circular codes represent a form of coding allowing detection/correction of frame-shift errors. Building on recent theoretical advances on circular codes, we provide evidence that protein coding sequences exhibit in-frame circular code marks, that are absent in introns and are intimately linked to the keto-amino transformation of codon bases. These properties strongly correlate with translation speed, codon influence and protein synthesis levels. Strikingly, circular code marks are absent at the beginning of coding sequences, but stably occur 40 codons after the initiator codon, hinting at the translation elongation process. Finally, we use the lens of circular codes to show that codon influence on translation correlates with the strong-weak dichotomy of the first two bases of the codon. The results can lead to defining new universal tools for sequence indicators and sequence optimization for bioinformatics and biotechnological applications, and can shed light on the molecular mechanisms behind the decoding process.
Affiliation(s)
- Simone Giannerini
- Department of Statistical Sciences, University of Bologna, Bologna, 40126, Italy.
- Diego Luis Gonzalez
- Department of Statistical Sciences, University of Bologna, Bologna, 40126, Italy; Institute for Microelectronics and Microsystems - Bologna Unit, CNR, Bologna, 40129, Italy
- Greta Goracci
- Department of Statistical Sciences, University of Bologna, Bologna, 40126, Italy
- Alberto Danielli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
15
Michel CJ. Genes on the circular code alphabet. Biosystems 2021; 206:104431. [PMID: 33894288 DOI: 10.1016/j.biosystems.2021.104431] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/09/2021] [Revised: 04/15/2021] [Accepted: 04/15/2021] [Indexed: 02/07/2023]
Abstract
The X motifs, motifs from the circular code X, are enriched in the (protein coding) genes of bacteria, archaea, eukaryotes, plasmids and viruses, moreover, in the minimal gene set belonging to the three domains of life, as well as in tRNA and rRNA sequences. They allow to retrieve, maintain and synchronize the reading frame in genes, and contribute to the regulation of gene expression. These results lead here to a theoretical study of genes based on the circular code alphabet. A new occurrence relation of the circular code X under the hypothesis of an equiprobable (balanced) strand pairing is given. Surprisingly, a statistical analysis of a large set of bacterial genes retrieves this relation on the circular code alphabet, but not on the DNA alphabet. Furthermore, the circular code X has the strongest balanced circular code pairing among 216 maximal C3 self-complementary trinucleotide circular codes, a new property of this circular code X. As an application of this theory, different tRNAs studied on the circular code alphabet reveal an unexpected stem structure. Thus, the circular code X would have constructed a coding stem in tRNAs as an outline of the future gene structure and the future DNA double helix.
Affiliation(s)
- Christian J Michel
- Theoretical Bioinformatics, ICube, CNRS, University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.
16
Thompson JD, Ripp R, Mayer C, Poch O, Michel CJ. Potential role of the X circular code in the regulation of gene expression. Biosystems 2021; 203:104368. [PMID: 33567309 DOI: 10.1016/j.biosystems.2021.104368] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/09/2020] [Revised: 01/18/2021] [Accepted: 01/20/2021] [Indexed: 02/06/2023]
Abstract
The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. a series of codons belonging to X and called X motifs, allow identification and maintenance of the reading frame in genes. X motifs are significantly enriched in protein-coding genes, but have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements of protein-coding genes. First, we identify the codons of the X circular code which are frequent or rare in each domain of life (archaea, bacteria, eukaryota) and show that, for the amino acids with the highest codon bias, the preferred codon is often an X codon. We also observe a correlation between the 20 X codons and the optimal codons/dicodons that have been shown to influence translation efficiency. Then, we examined recently published experimental results concerning gene expression levels in diverse organisms. The approach used is the analysis of X motifs according to their density ds(X), i.e. the number of X motifs per kilobase in a gene sequence s. Surprisingly, this simple parameter identifies several unexpected relations between the X circular code and gene expression. For example, the X motifs are significantly enriched in the minimal gene set belonging to the three domains of life, and in codon-optimized genes. Furthermore, the density of X motifs generally correlates with experimental measures of translation efficiency and mRNA stability. 
Taken together, these results lead us to propose that the X motifs may represent a genetic signal contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.
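The density ds(X) described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 20 trinucleotides are the X circular code as it is usually listed in the circular-code literature, and an "X motif" is taken here to be a run of at least `min_codons` consecutive X codons read in frame 0. The names `count_x_motifs` and `x_motif_density` and the `min_codons` threshold are hypothetical and are not taken from the paper.

```python
# The 20 trinucleotides of the X circular code (DNA alphabet).
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def count_x_motifs(seq, min_codons=3):
    """Count runs of at least min_codons consecutive X codons in frame 0.

    Assumption: this run-length definition of a motif is illustrative only.
    """
    seq = seq.upper()
    motifs = 0
    run = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in X_CODE:
            run += 1
        else:
            if run >= min_codons:
                motifs += 1
            run = 0
    if run >= min_codons:  # close a run that reaches the end of the sequence
        motifs += 1
    return motifs


def x_motif_density(seq, min_codons=3):
    """Number of X motifs per kilobase of sequence."""
    if not seq:
        return 0.0
    return 1000.0 * count_x_motifs(seq, min_codons) / len(seq)


# Toy sequence: two runs of X codons separated by the non-X codon AAA.
gene = "GGT" * 4 + "AAA" + "GAA" * 3
print(count_x_motifs(gene), round(x_motif_density(gene), 1))
```

A real analysis would likely need the motif length and cardinality thresholds used by the authors, which the abstract does not state.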
Affiliation(s)
- Julie D Thompson
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Raymond Ripp
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Claudine Mayer
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France; Unité de Microbiologie Structurale, Institut Pasteur, CNRS, 75724, Paris Cedex 15, France; Université Paris Diderot, Sorbonne Paris Cité, 75724, Paris Cedex 15, France.
- Olivier Poch
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Christian J Michel
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
|
17
|
Nesterov-Mueller A, Popov R, Seligmann H. Combinatorial Fusion Rules to Describe Codon Assignment in the Standard Genetic Code. Life (Basel) 2020; 11:life11010004. [PMID: 33374866 PMCID: PMC7824455 DOI: 10.3390/life11010004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/01/2020] [Revised: 12/15/2020] [Accepted: 12/21/2020] [Indexed: 11/16/2022] Open
Abstract
We propose combinatorial fusion rules that describe the codon assignment in the standard genetic code simply and uniformly for all canonical amino acids. These rules become obvious if the origin of the standard genetic code is considered as the result of a fusion of four protocodes: two dominant AU and GC protocodes and two recessive AU and GC protocodes. The biochemical meaning of the fusion rules consists of retaining the complementarity between cognate codons of the small hydrophobic amino acids and large charged or polar amino acids within the protocodes. The proto-tRNAs were assembled in the form of two kissing hairpins with 9-base and 10-base loops in the case of dominant protocodes and two 9-base loops in the case of recessive protocodes. The fusion rules reveal the connection between the stop codons, the non-canonical amino acids, pyrrolysine and selenocysteine, and deviations in the translation of mitochondria. Using fusion rules, we predicted the existence of additional amino acids that are essential for the development of the standard genetic code. The validity of the proposed partition of the genetic code into dominant and recessive protocodes is considered with reference to state-of-the-art hypotheses. The formation of two aminoacyl-tRNA synthetase classes is compatible with the four-protocode partition.
Affiliation(s)
- Alexander Nesterov-Mueller
  - Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Roman Popov
  - Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Hervé Seligmann
  - Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
  - The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
  - Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecoms4Health, Faculty of Medicine, Université Grenoble Alpes, F-38700 La Tronche, France
|
18
|
Demongeot J, Moreira A, Seligmann H. Negative CG dinucleotide bias: An explanation based on feedback loops between Arginine codon assignments and theoretical minimal RNA rings. Bioessays 2020; 43:e2000071. [PMID: 33319381 DOI: 10.1002/bies.202000071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/02/2020] [Revised: 11/23/2020] [Accepted: 11/26/2020] [Indexed: 01/05/2023]
Abstract
Theoretical minimal RNA rings are candidate primordial genes evolved for non-redundant coding of the genetic code's 22 coding signals (one codon per biogenic amino acid, a start and a stop codon) over the shortest possible length: 29520 22-nucleotide-long RNA rings solve this min-max constraint. Numerous RNA ring properties are reminiscent of natural genes. Here we present analyses showing that all RNA rings lack the dinucleotide CG (a mutable, chemically unstable dinucleotide coding for Arginine), bearing a resemblance to known CG-depleted genomes. CG in "incomplete" RNA rings (not coding for all coding signals, with only 3-12 nucleotides) gradually decreases towards CG absence in complete, 22-nucleotide-long RNA rings. Presumably, feedback loops acting as RNA rings grew in the course of evolution (when amino acid assignment fixed the genetic code) assigned Arg to codons lacking CG (AGR) to avoid CG. Hence, as a chemical property of base pairs, CG mutability restructured the genetic code, thereby establishing itself as genetically encoded biological information.
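The property discussed in this abstract, absence of the dinucleotide CG on a circular sequence, can be checked as in the following sketch. The only subtlety is that on a ring the last nucleotide is adjacent to the first. The function names are hypothetical and the test sequences are toy strings, not actual RNA rings from the paper.

```python
def ring_has_cg(ring):
    """True if the circular sequence contains the dinucleotide CG,
    including the junction between the last and the first nucleotide."""
    ring = ring.upper()
    if not ring:
        return False
    return "CG" in ring + ring[0]


def cg_count(ring):
    """Number of CG dinucleotides when the sequence is read on a circle."""
    ring = ring.upper()
    n = len(ring)
    return sum(ring[i] == "C" and ring[(i + 1) % n] == "G" for i in range(n))


# Toy examples (not real RNA rings): the second one only has CG at the junction.
for toy in ("AGCU", "GAUC", "CGCG"):
    print(toy, ring_has_cg(toy), cg_count(toy))
```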
Affiliation(s)
- Jacques Demongeot
  - Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, France
- Andrés Moreira
  - Departamento de Informática, Universidad Técnica Federico Santa María, Santiago, Chile
- Hervé Seligmann
  - Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, France
  - The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), Eggenstein-Leopoldshafen, Germany
|
19
|
Kunnev D. Origin of Life: The Point of No Return. Life (Basel) 2020; 10:life10110269. [PMID: 33153087 PMCID: PMC7693465 DOI: 10.3390/life10110269] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/19/2020] [Revised: 11/01/2020] [Accepted: 11/01/2020] [Indexed: 12/13/2022] Open
Abstract
Origin of life research is one of the greatest scientific frontiers of mankind. Many hypotheses have been proposed to explain how life began. Although different hypotheses emphasize different initial phenomena, all of them agree on one important concept: at some point along the chain of events toward life, Darwinian evolution emerged. There is no consensus, however, on how this occurred. Frequently, the mechanism leading to Darwinian evolution is not addressed and it is assumed that this problem could be solved later, with experimental proof of the hypothesis. Here, the author first defines the minimum components required for Darwinian evolution and then, from this standpoint, analyzes some of the hypotheses for the origin of life. Distinctive features of Darwinian evolution and life rooted in the interaction between information and its corresponding structure/function are then reviewed. Due to the obligatory dependency of the information and structure subject to Darwinian evolution, these components must be locked in their origin. One of the most distinctive characteristics of Darwinian evolution in comparison with all other processes is the establishment of a fundamentally new level of matter capable of evolving and adapting. Therefore, the initiation of Darwinian evolution is the "point of no return" after which life begins. In summary: a definition and a mechanism for Darwinian evolution are provided together with a critical analysis of some of the hypotheses for the origin of life.
Affiliation(s)
- Dimiter Kunnev
  - Department of Oral Biology, University at Buffalo, Buffalo, NY 14263, USA
|
20
|
The maximality of circular codes in genes statistically verified. Biosystems 2020; 197:104201. [DOI: 10.1016/j.biosystems.2020.104201] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/10/2020] [Revised: 06/22/2020] [Accepted: 06/22/2020] [Indexed: 11/18/2022]
|
21
|
Artemaki PI, Scorilas A, Kontos CK. Circular RNAs: A New Piece in the Colorectal Cancer Puzzle. Cancers (Basel) 2020; 12:cancers12092464. [PMID: 32878117 PMCID: PMC7564116 DOI: 10.3390/cancers12092464] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 08/09/2020] [Revised: 08/24/2020] [Accepted: 08/25/2020] [Indexed: 12/16/2022] Open
Abstract
Colorectal cancer (CRC) is the third most fatal type of malignancy worldwide. Despite the advances accomplished in the elucidation of its molecular basis and the existing CRC biomarkers introduced into clinical practice, additional research is required. Circular RNAs (circRNAs) constitute a new RNA type, formed by back-splicing of primary transcripts. They were discovered in the 1970s but were characterized as by-products of aberrant splicing. However, modern high-throughput approaches uncovered their widespread expression; therefore, several questions were raised regarding their potential biological roles. In recent years, great progress has been achieved in the elucidation of their functions: circRNAs can act as microRNA sponges and transcription regulators, and can also interfere with splicing. Furthermore, they are heavily involved in various human pathological states, including cancer, and could serve as diagnostic and prognostic biomarkers in several diseases. Particularly in CRC, aberrant expression of circRNAs has been observed. More specifically, these molecules either inhibit or promote colorectal carcinogenesis by regulating different molecules and signaling pathways. The present review discusses the characteristics and functions of circRNAs, before analyzing the multifaceted role of these molecules in CRC and their potential value as biomarkers and therapeutic targets.
Affiliation(s)
- Pinelopi I Artemaki
  - Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, GR-15701 Athens, Greece
- Andreas Scorilas
  - Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, GR-15701 Athens, Greece
- Christos K Kontos
  - Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, GR-15701 Athens, Greece
|
22
|
Michel CJ, Mayer C, Poch O, Thompson JD. Characterization of accessory genes in coronavirus genomes. Virol J 2020; 17:131. [PMID: 32854725 PMCID: PMC7450977 DOI: 10.1186/s12985-020-01402-1] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Received: 06/03/2020] [Accepted: 08/16/2020] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND: COVID-19 is caused by the SARS-CoV-2 virus, a novel member of the coronavirus (CoV) family. CoV genomes code for an ORF1a/ORF1ab polyprotein and four structural proteins widely studied as major drug targets. The genomes also contain a variable number of open reading frames (ORFs) coding for accessory proteins that are not essential for virus replication, but appear to have a role in pathogenesis. The accessory proteins have been less well characterized and are difficult to predict by classical bioinformatics methods. METHODS: We propose a computational tool, GOFIX, to characterize potential ORFs in virus genomes. In particular, ORF coding potential is estimated by searching for enrichment in motifs of the X circular code, which is known to be over-represented in the reading frames of viral genes. RESULTS: We applied GOFIX to study the SARS-CoV-2 and related genomes including SARS-CoV and SARS-like viruses from bat, civet and pangolin hosts, focusing on the accessory proteins. Our analysis provides evidence supporting the presence of overlapping ORFs 7b, 9b and 9c in all the genomes and thus helps to resolve some differences in current genome annotations. In contrast, we predict that ORF3b is not functional in all genomes. Novel putative ORFs were also predicted, including a truncated form of the ORF10 previously identified in SARS-CoV-2 and a little-known ORF overlapping the Spike protein in Civet-CoV and SARS-CoV. CONCLUSIONS: Our findings contribute to characterizing sequence properties of accessory genes of SARS coronaviruses, and especially the newly acquired genes making use of overlapping reading frames.
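The idea of estimating coding potential from X circular code enrichment in a reading frame can be illustrated with a very small sketch. This is not the GOFIX method, whose scoring is not described in the abstract. It is a simplified stand-in that assumes coding potential can be approximated by the fraction of X codons in each of the three forward frames. The function names are hypothetical.

```python
# The 20 trinucleotides of the X circular code (DNA alphabet).
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def x_codon_fraction(seq, frame):
    """Fraction of complete codons in the given frame (0, 1 or 2) that belong to X."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return sum(codon in X_CODE for codon in codons) / len(codons)


def best_frame(seq):
    """Return (frame, scores): the frame with the highest X codon fraction."""
    scores = [x_codon_fraction(seq, frame) for frame in range(3)]
    frame = max(range(3), key=scores.__getitem__)
    return frame, scores


# Toy sequence: one leading base shifts a run of X codons into frame 1.
toy = "A" + "GGT" * 5
print(best_frame(toy))
```

A realistic tool would also need a statistical baseline, for example a comparison against shuffled sequences, before calling a frame enriched.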
Affiliation(s)
- Christian Jean Michel
  - Laboratoire ICube, Department of Computer Science, CNRS, University of Strasbourg, F-67412 Strasbourg, France
- Claudine Mayer
  - Laboratoire ICube, Department of Computer Science, CNRS, University of Strasbourg, F-67412 Strasbourg, France
  - Unité de Microbiologie Structurale, Institut Pasteur, CNRS UMR 3528, 75724 Paris Cedex 15, France
  - Université Paris Diderot, Sorbonne Paris Cité, 75724 Paris Cedex 15, France
- Olivier Poch
  - Laboratoire ICube, Department of Computer Science, CNRS, University of Strasbourg, F-67412 Strasbourg, France
- Julie Dawn Thompson
  - Laboratoire ICube, Department of Computer Science, CNRS, University of Strasbourg, F-67412 Strasbourg, France
|
23
|
Fimmel E, Michel CJ, Pirot F, Sereni JS, Starman M, Strüngmann L. The Relation Between k-Circularity and Circularity of Codes. Bull Math Biol 2020; 82:105. [PMID: 32754878 PMCID: PMC7402406 DOI: 10.1007/s11538-020-00770-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/04/2020] [Accepted: 06/24/2020] [Indexed: 12/15/2022]
Abstract
A code X is k-circular if any concatenation of at most k words from X, when read on a circle, admits exactly one partition into words from X. It is circular if it is k-circular for every integer k. While it is not a priori clear from the definition, there exists, for every pair (n, ℓ), an integer k such that every k-circular ℓ-letter code over an alphabet of cardinality n is circular, and we determine the least such integer k for all values of n and ℓ. The k-circular codes may represent an important evolutionary step between the circular codes, such as the comma-free codes, and the genetic code.
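The definition of k-circularity given in this abstract can be tested by brute force for small codes whose words all have the same length. The sketch below relies on one observation, stated here as an assumption of the sketch: when all words have length ℓ, a partition of the circular word is fixed by its cut positions modulo ℓ, so a second partition exists exactly when some nonzero shift also decomposes into words of the code. The function name and the toy codes are hypothetical, and the search is exponential in k, so it is only suitable for small examples.

```python
from itertools import product


def is_k_circular(code, k):
    """Brute-force test of k-circularity for a code with words of equal length.

    For every concatenation of m <= k words, read on a circle, the only
    decomposition into words of the code must be the one at shift 0.
    """
    words = sorted(set(code))
    if not words:
        return True
    length = len(words[0])
    if any(len(word) != length for word in words):
        raise ValueError("all words must have the same length")
    word_set = set(words)
    for m in range(1, k + 1):
        for combo in product(words, repeat=m):
            circle = "".join(combo)
            n = len(circle)
            doubled = circle + circle  # allows reading across the junction
            for shift in range(1, length):
                blocks = (
                    doubled[shift + j:shift + j + length]
                    for j in range(0, n, length)
                )
                if all(block in word_set for block in blocks):
                    return False  # a second partition exists
    return True


# Toy code built for illustration: AAC.GTT read one letter later is ACG.TTA.
toy_code = {"AAC", "GTT", "ACG", "TTA"}
print(is_k_circular(toy_code, 1), is_k_circular(toy_code, 2))
```

The toy code is 1-circular, because no rotation of a single word is again a word, but it is not 2-circular, which shows why the least sufficient k studied in the paper is a meaningful quantity.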
Affiliation(s)
- Elena Fimmel
  - Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany
- Christian J. Michel
  - Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
- François Pirot
  - Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
  - LORIA (Orpailleur), C.N.R.S., University of Lorraine, INRIA, Campus scientifique, 54506 Vandœuvre-lès-Nancy Cedex, France
- Jean-Sébastien Sereni
  - Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
- Martin Starman
  - Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany
  - Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
- Lutz Strüngmann
  - Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany
|
24
|
Dila G, Michel CJ, Thompson JD. Optimality of circular codes versus the genetic code after frameshift errors. Biosystems 2020; 195:104134. [DOI: 10.1016/j.biosystems.2020.104134] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/07/2020] [Revised: 03/23/2020] [Accepted: 03/25/2020] [Indexed: 12/24/2022]
|
25
|
Gospodinov A, Kunnev D. Universal Codons with Enrichment from GC to AU Nucleotide Composition Reveal a Chronological Assignment from Early to Late Along with LUCA Formation. Life (Basel) 2020; 10:life10060081. [PMID: 32516985 PMCID: PMC7345086 DOI: 10.3390/life10060081] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/03/2020] [Revised: 05/30/2020] [Accepted: 06/03/2020] [Indexed: 12/14/2022] Open
Abstract
The emergence of a primitive genetic code should be considered the most essential event during the origin of life. An almost complete set of codons (as we know them) should have been established relatively early during the evolution of the last universal common ancestor (LUCA) from which all known organisms descended. Many hypotheses have been proposed to explain the driving forces and chronology of the evolution of the genetic code; however, none is commonly accepted. In the current paper, we explore the features of the genetic code that, in our view, reflect the mechanism and the chronological order of the origin of the genetic code. Our hypothesis postulates that the primordial RNA was mostly GC-rich, and this bias was reflected in the order of amino acid codon assignment. If we arrange the codons and their corresponding amino acids from GC-rich to AU-rich, we find that: (1) the amino acids encoded by GC-rich codons (Ala, Gly, Arg, and Pro) are those that contribute the most to the interactions with RNA (if incorporated into short peptides); (2) this order correlates with the addition of novel functions necessary for the evolution from simple to longer folded peptides; (3) the overlay of aminoacyl-tRNA synthetases (aaRS) onto the amino acid order produces a distinctive zonal distribution for class I and class II, suggesting an interdependent origin. These correlations could be explained by the active role of the bridge peptide (BP), which we proposed earlier in the evolution of the genetic code.
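The arrangement of codons from GC-rich to AU-rich mentioned in this abstract can be reproduced with the standard genetic code. The sketch below builds the standard codon table (written in the DNA alphabet, T in place of U), groups amino acids by the number of G or C bases in their codons, and prints the groups from GC-rich to AT-rich. The function names are hypothetical, and the grouping by raw GC count is an assumption made here for illustration, not necessarily the ordering used by the authors.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def gc_content(codon):
    """Number of G or C bases in a codon (0 to 3)."""
    return sum(base in "GC" for base in codon)


def amino_acids_by_gc():
    """Map each GC count (0..3) to the set of amino acids with such a codon."""
    groups = {n: set() for n in range(4)}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # skip stop codons
            groups[gc_content(codon)].add(aa)
    return groups


groups = amino_acids_by_gc()
for n in (3, 2, 1, 0):
    print(n, "".join(sorted(groups[n])))
```

The group with three G or C bases contains exactly A, G, P and R, that is Ala, Gly, Pro and Arg, which matches the four amino acids named in the abstract.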
Affiliation(s)
- Anastas Gospodinov
  - Roumen Tsanev Institute of Molecular Biology, Bulgarian Academy of Sciences, Acad. G. Bonchev Str. 21, Sofia 1113, Bulgaria
- Dimiter Kunnev
  - Department of Molecular & Cellular Biology, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
|
26
|
Footprints of a Singular 22-Nucleotide RNA Ring at the Origin of Life. BIOLOGY 2020; 9:biology9050088. [PMID: 32344921 PMCID: PMC7285048 DOI: 10.3390/biology9050088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/17/2020] [Revised: 04/06/2020] [Accepted: 04/19/2020] [Indexed: 11/17/2022]
Abstract
(1) Background: Previous experimental observations and theoretical hypotheses have provided insight into a hypothetical world where an RNA hairpin or ring may have debuted as the primary informational and functional molecule. We propose a model revisiting the architecture of RNA-peptide interactions at the origin of life through the evolutionary dynamics of RNA populations. (2) Methods: By performing a step-by-step computation of the smallest possible hairpin/ring RNA sequences compatible with building up a variety of peptides of the primitive network, we inferred the sequence of a singular docosameric RNA molecule, which we call the ALPHA sequence. Then, we searched for any relics of the peptides made from ALPHA in sequences deposited in the different public databases. (3) Results: Sequence matches between ALPHA and sequences from organisms among the earliest forms of life on Earth were found at high statistical relevance. We hypothesize that the frequency of appearance of relics from the ALPHA sequence in present genomes has a functional necessity. (4) Conclusions: Given the fitness of ALPHA as a supportive sequence of the framework of all existing theories, and the evolution of Archaea and giant viruses, it is anticipated that the unique properties of this singular archetypal ALPHA sequence should prove useful as a model matrix for future applications, ranging from synthetic biology to DNA computing.
|
27
|
Michel CJ, Thompson JD. Identification of a circular code periodicity in the bacterial ribosome: origin of codon periodicity in genes? RNA Biol 2020; 17:571-583. [PMID: 31960748 PMCID: PMC8647727 DOI: 10.1080/15476286.2020.1719311] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 12/13/2019] [Revised: 01/10/2020] [Accepted: 01/14/2020] [Indexed: 02/09/2023] Open
Abstract
Three-base periodicity (TBP), where nucleotides and higher order n-tuples are preferentially spaced by 3, 6, 9, etc. bases, is a well-known intrinsic property of protein-coding DNA sequences. However, its origins are still not fully understood. One hypothesis is that the periodicity reflects a primordial coding system that was used before the emergence of the modern standard genetic code (SGC). Recent evidence suggests that the X circular code, a set of 20 trinucleotides allowing the reading frames in genes to be retrieved locally, represents a possible ancestor of the SGC. Motifs from the X circular code have been found in the reading frame of protein-coding regions in extant organisms from bacteria to eukaryotes, in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase centre and the decoding centre. Here, we have used a powerful correlation function to search for periodicity patterns involving the 20 trinucleotides of the X circular code in a large set of bacterial protein-coding genes, as well as in the translation machinery, including rRNA and tRNA sequences. As might be expected, we found a strong circular code periodicity 0 modulo 3 in the protein-coding genes. More surprisingly, we also identified a similar circular code periodicity in a large region of the 16S rRNA. This region includes the 3' major domain corresponding to the primordial proto-ribosome decoding centre and containing numerous sites that interact with the tRNA and messenger RNA (mRNA) during translation. Furthermore, 3D structural analysis shows that the periodicity region surrounds the mRNA channel that lies between the head and the body of the SSU. Our results support the hypothesis that the X circular code may constitute an ancestral translation code involved in reading frame retrieval and maintenance, traces of which persist in modern mRNA, tRNA and rRNA despite their long evolution and adaptation to the SGC.
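A periodicity "0 modulo 3" of X trinucleotides, as described in this abstract, can be illustrated by counting how often two X trinucleotides start d bases apart and then pooling the counts by d modulo 3. This is a simplified stand-in for the correlation function used by the authors, which the abstract does not specify. The function names are hypothetical.

```python
# The 20 trinucleotides of the X circular code (DNA alphabet).
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def x_positions(seq):
    """All start positions (any frame) where a trinucleotide of X occurs."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] in X_CODE]


def x_pair_counts(seq, max_distance):
    """counts[d] = number of pairs of X trinucleotides starting d bases apart."""
    starts = set(x_positions(seq))
    return {
        d: sum((i + d) in starts for i in starts)
        for d in range(1, max_distance + 1)
    }


def counts_by_residue(counts):
    """Pool pair counts by distance modulo 3: [d%3==0, d%3==1, d%3==2]."""
    totals = [0, 0, 0]
    for d, count in counts.items():
        totals[d % 3] += count
    return totals


# Toy sequence in which X trinucleotides occur only at positions 0, 3, 6, 9.
toy = "GGT" * 4
counts = x_pair_counts(toy, 6)
print(counts, counts_by_residue(counts))
```

On real gene sets the counts would need normalisation, for example by the expected number of pairs in shuffled sequences, before any periodicity could be claimed.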
Affiliation(s)
- Christian J. Michel
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Julie D. Thompson
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
|
28
|
Zeng K, Wang S. Circular RNAs: The crucial regulatory molecules in colorectal cancer. Pathol Res Pract 2020; 216:152861. [PMID: 32061452 DOI: 10.1016/j.prp.2020.152861] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 12/22/2019] [Revised: 01/20/2020] [Accepted: 02/10/2020] [Indexed: 02/06/2023]
Abstract
Colorectal cancer (CRC) is one of the most common malignancies worldwide. Recent studies have shown that circular RNAs (circRNAs) play critical roles in the pathogenesis and progression of CRC. CircRNAs are a special class of endogenous non-coding RNAs (ncRNAs) that harbor a covalently closed ring structure with high conservation and stability, and that are expressed in a tissue- and developmental-stage-specific manner. A growing body of evidence suggests that circRNAs are abnormally expressed in CRC tissues, cell lines and plasma, and are closely linked with the clinical malignant features of CRC. CircRNAs participate in various biological processes of CRC cells, including cell proliferation, apoptosis, senescence, migration and invasion, by acting as "microRNA (miRNA) sponges", binding to proteins and even being translated into proteins. In the present review, we systematically introduce the CRC-related circRNAs and their functional mechanisms, as well as the potential applications for CRC diagnosis and prognosis.
Affiliation(s)
- Kaixuan Zeng
  - School of Medicine, Southeast University, Nanjing, 210009, China; General Clinical Research Center, Nanjing First Hospital, Nanjing Medical University, Nanjing, 210006, China
- Shukui Wang
  - School of Medicine, Southeast University, Nanjing, 210009, China; General Clinical Research Center, Nanjing First Hospital, Nanjing Medical University, Nanjing, 210006, China.
|
29
|
Demongeot J, Seligmann H. Accretion history of large ribosomal subunits deduced from theoretical minimal RNA rings is congruent with histories derived from phylogenetic and structural methods. Gene 2020; 738:144436. [PMID: 32027954 DOI: 10.1016/j.gene.2020.144436] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/07/2019] [Revised: 01/24/2020] [Accepted: 02/01/2020] [Indexed: 12/17/2022]
Abstract
Accretions of tRNAs presumably formed the large complex ribosomal RNA structures. Similarities of tRNA secondary structures with rRNA secondary structures increase with the integration order of their cognate amino acid in the genetic code, indicating tRNA evolution towards rRNA-like structures. Here analyses rank secondary structure subelements of three large ribosomal RNAs (Prokaryota: Archaea: Thermus thermophilus; Bacteria: Escherichia coli; Eukaryota: Saccharomyces cerevisiae) in relation to their similarities with secondary structures formed by presumed proto-tRNAs, represented by 25 theoretical minimal RNA rings. These ranks are compared to those derived from two independent methods (ranks provide a relative evolutionary age to the rRNA substructure), (a) cladistic phylogenetic analyses and (b) 3D-crystallography where core subelements are presumed ancient and peripheral ones recent. Comparisons of rRNA secondary structure subelements with RNA ring secondary structures show congruence between ranks deduced by this method and both (a) and (b) (more with (a) than (b)), especially for RNA rings with predicted ancient cognate amino acid. Reconstruction of accretion histories of large rRNAs will gain from adequately integrating information from independent methods. Theoretical minimal RNA rings, sequences deterministically designed in silico according to specific coding constraints, might produce adequate scales for prebiotic and early life molecular evolution.
Affiliation(s)
- Jacques Demongeot
  - Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecoms4Health, F-38700 La Tronche, France.
- Hervé Seligmann
  - Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecoms4Health, F-38700 La Tronche, France; The National Natural History Collections, The Hebrew University of Jerusalem, 91404 Jerusalem, Israel.
|
30
|
Demongeot J, Seligmann H. The primordial tRNA acceptor stem code from theoretical minimal RNA ring clusters. BMC Genet 2020; 21:7. [PMID: 31973715 PMCID: PMC6979358 DOI: 10.1186/s12863-020-0812-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/24/2019] [Accepted: 01/13/2020] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND: Theoretical minimal RNA rings code by design over the shortest length once for each of the 20 amino acids, a start and a stop codon, and form stem-loop hairpins. This defines at most 25 RNA rings of 22 nucleotides. As a group, RNA rings mimic numerous prebiotic and early life biomolecular properties: tRNAs, deamination gradients and replication origins, emergence of codon preferences for the natural circular code, and contents of early protein coding genes. These properties result from the RNA ring's in silico design, based mainly on coding nonredundancy among overlapping translation frames, as the genetic code's codon-amino acid assignments determine. RNA rings resemble ancestral tRNAs, defining RNA ring anticodons and corresponding cognate amino acids. Surprisingly, all examined RNA ring properties coevolve with genetic code integration ranks of RNA ring cognates, as if RNA rings mimic prebiotic and early life evolution. METHODS: Distances between RNA rings were calculated using different evolutionary models. Associations between these distances and genetic code evolutionary hypotheses detect evolutionary models best describing RNA ring diversification. RESULTS: Here pseudo-phylogenetic analyses of RNA rings produce clusters corresponding to the primordial code in tRNA acceptor stems, more so when substitution matrices from neutrally evolving pseudogenes are used rather than from functional protein coding genes reflecting selection for conserving amino acid properties. CONCLUSIONS: Results indicate RNA rings with recent cognates evolved from those with early cognates. Hence RNA rings, as designed by the genetic code's structure, simulate tRNA stem evolution and prebiotic history along neutral chemistry-driven mutation regimes.
Affiliation(s)
- Jacques Demongeot
  - Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecoms4Health, Université Grenoble Alpes, F-38700 La Tronche, France
- Hervé Seligmann
  - Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecoms4Health, Université Grenoble Alpes, F-38700 La Tronche, France
  - The National Natural History Collections, The Hebrew University of Jerusalem, 91404 Jerusalem, Israel
|