1
Rosandić M, Vlahović I, Pilaš I, Glunčić M, Paar V. An Explanation of Exceptions from Chargaff's Second Parity Rule/Strand Symmetry of DNA Molecules. Genes (Basel) 2022; 13:1929. [PMID: 36360166 PMCID: PMC9689577 DOI: 10.3390/genes13111929] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/18/2022] [Revised: 10/12/2022] [Accepted: 10/17/2022] [Indexed: 11/04/2022] Open
Abstract
In this article, we show that mono/oligonucleotide quadruplets, as basic structures of DNA, along with our classification of trinucleotides, disclose an organization of genomes based on purine-pyrimidine symmetry. Moreover, the structure and stability of DNA are influenced by the Watson-Crick pairing and the natural law of DNA creation and conservation, according to which the same mono- or oligonucleotide insertion must be inserted simultaneously into both strands of DNA. Taken together, they lead to quadruplets with central mirror symmetry and bidirectional DNA strand orientation and are incorporated into Chargaff's second parity rule (CSPR). Performing our quadruplet frequency analysis of all human chromosomes and of Neuroblastoma BreakPoint Family (NBPF) genes, which code Olduvai protein domains in the human genome, we show that the coding part of DNA violates CSPR. This may shed new light and give rise to a novel hypothesis on DNA creation and its evolution. In this framework, the logarithmic relationship between oligonucleotide order and minimal DNA sequence length, to establish the validity of CSPR, automatically follows from the quadruplet structure of the genomic sequence. The problem of the violation of CSPR in rare symbionts is discussed.
Affiliation(s)
- Marija Rosandić
  - University Hospital Centre Zagreb (Ret.), 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Ines Vlahović
  - Faculty of Science, Algebra University College, 10000 Zagreb, Croatia
- Ivan Pilaš
  - Forest Research Institute, 10450 Jastrebarsko, Croatia
- Matko Glunčić
  - Physics Department, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Vladimir Paar
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
  - Physics Department, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
2
Malhotra N, Seshasayee ASN. Replication-Dependent Organization Constrains Positioning of Long DNA Repeats in Bacterial Genomes. Genome Biol Evol 2022; 14:6625829. [PMID: 35776426 PMCID: PMC9297083 DOI: 10.1093/gbe/evac102] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 06/27/2022] [Indexed: 01/29/2023] Open
Abstract
Bacterial genome organization is primarily driven by chromosomal replication from a single origin of replication. However, chromosomal rearrangements, which can disrupt such organization, are inevitable in nature. Long DNA repeats are major players mediating rearrangements, large and small, via homologous recombination. Since changes to genome organization affect bacterial fitness (more so in fast-growing than in slow-growing bacteria) and are under selection, it is reasonable to expect that genomic positioning of long DNA repeats is also under selection. To test this, we identified identical DNA repeats of at least 100 base pairs across ∼6,000 bacterial genomes and compared their distribution in fast- and slow-growing bacteria. We found that long identical DNA repeats are distributed in a non-random manner across bacterial genomes. Their distribution differs in the overall number, orientation, and proximity to the origin of replication, between fast- and slow-growing bacteria. We show that their positioning, which might arise from a combination of the processes that produce repeats and selection on rearrangements that recombination between repeat elements might cause, permits less disruption to the replication-dependent genome organization of bacteria than random positioning would, suggesting that this organization is a major constraint on the positioning of long DNA repeats.
3
Affinity and Correlation in DNA. J 2022. [DOI: 10.3390/j5020016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
A statistical analysis of important DNA sequences and related proteins has been performed to study the relationships between monomers, and some general considerations about these macromolecules can be provided from the results. First, the most important relationship between sites in all the DNA sequences examined is that between two consecutive base pairs. This is an indication of an energetic stabilization due to the stacking interaction of these couples of base pairs. Secondly, the difference between human chromosome sequences and their coding parts is relevant both in the relationships between sites and in some specific compositional rules, such as the second Chargaff rule. Third, the evidence of the relationship in two successive triplets of DNA coding sequences generates a relationship between two successive amino acids in the proteins. This is obviously impossible if all the relationships between the sites are statistical evidence and do not involve causes; therefore, in this article, due to stacking interactions and this relationship in coding sequences, we will divide the concept of the relationship between sites into two concepts: affinity and correlation, the first with physical causes and the second without. Finally, from the statistical analyses carried out, it will emerge that the human genome is uniform, with the only significant exception being the Y chromosome.
4
Rosandić M, Vlahović I, Paar V. Novel look at DNA and life-Symmetry as evolutionary forcing. J Theor Biol 2019; 483:109985. [PMID: 31469987 DOI: 10.1016/j.jtbi.2019.08.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/08/2018] [Revised: 06/21/2018] [Accepted: 08/22/2019] [Indexed: 11/20/2022]
Abstract
Although Chargaff's first parity rule is explained by Watson-Crick base-pairing between the two DNA strands, Chargaff's second parity rule for each strand of DNA (also named strand symmetry), which cannot be explained by Watson-Crick base-pairing alone, has remained a challenging issue for fifty years. We show that during evolution DNA preserves its identity in the form of quadruplet A+T and C+G rich matrices based on purine-pyrimidine mirror symmetries of trinucleotides. Identical symmetries are present in our classification of trinucleotides and the genetic code table. All eukaryotes and almost all prokaryotes (bacteria and archaea) have quadruplet mirror symmetries in structural form and frequencies following the principle of Chargaff's second parity rule and the Natural symmetry law of DNA creation and conservation. Some rare symbionts have mirror symmetry only in their structural form within each DNA strand. Based on our matrix analysis of closely related species, humans and Neanderthals, we find that the circular cycle of inverse proportionality between trinucleotides preserves identical relative frequencies of trinucleotides in each quadruplet and in the whole genome. According to our calculations, a change in frequencies in quadruplet matrices could lead to the creation of new species. Violation of quadruplet symmetries is practically inconsistent with life. DNA symmetries provide a key for understanding the restriction of disorder (entropy) due to mutations in the evolution of DNA.
Affiliation(s)
- Marija Rosandić
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
  - University Hospital Centre Zagreb (ret.), Zagreb, Croatia
- Ines Vlahović
  - Department of Physics, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
  - Algebra University College, 10000 Zagreb, Croatia
- Vladimir Paar
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
  - Department of Physics, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
5
Cristadoro G, Degli Esposti M, Altmann EG. The common origin of symmetry and structure in genetic sequences. Sci Rep 2018; 8:15817. [PMID: 30361485 PMCID: PMC6202410 DOI: 10.1038/s41598-018-34136-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/17/2018] [Accepted: 10/09/2018] [Indexed: 12/20/2022] Open
Abstract
Biologists have long sought a way to explain how statistical properties of genetic sequences emerged and are maintained through evolution. On the one hand, non-random structures at different scales indicate a complex genome organisation. On the other hand, single-strand symmetry has been scrutinised using neutral models in which correlations are not considered or irrelevant, contrary to empirical evidence. Different studies investigated these two statistical features separately, reaching minimal consensus despite sustained efforts. Here we unravel previously unknown symmetries in genetic sequences, which are organized hierarchically through scales in which non-random structures are known to be present. These observations are confirmed through the statistical analysis of the human genome and explained through a simple domain model. These results suggest that domain models which account for the cumulative action of mobile elements can explain simultaneously non-random structures and symmetries in genetic sequences.
Affiliation(s)
- Giampaolo Cristadoro
  - Dipartimento di Matematica e Applicazioni, Università di Milano-Bicocca, 20125, Milano, Italy
- Eduardo G Altmann
  - School of Mathematics and Statistics, University of Sydney, Sydney, 2006, NSW, Australia
6
Akhter S, Aziz RK, Kashef MT, Ibrahim ES, Bailey B, Edwards RA. Kullback Leibler divergence in complete bacterial and phage genomes. PeerJ 2017; 5:e4026. [PMID: 29204318 PMCID: PMC5712468 DOI: 10.7717/peerj.4026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/30/2017] [Accepted: 10/22/2017] [Indexed: 12/11/2022] Open
Abstract
The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses.
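The metric described in this abstract, the Kullback–Leibler divergence of one genome's amino acid composition from the mean composition, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the toy protein strings, the base-2 logarithm, and the unweighted mean over genomes are all assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(proteins):
    """Relative frequency of each standard amino acid across a genome's proteins."""
    counts = Counter()
    for seq in proteins:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def mean_composition(compositions):
    """Unweighted mean composition over genomes (an assumption of this sketch)."""
    n = len(compositions)
    return {aa: sum(c[aa] for c in compositions) / n for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """D_KL(p || q) in bits; residues with p[aa] == 0 contribute nothing.

    q must be non-zero wherever p is non-zero, which holds when q is a
    mean that includes p itself.
    """
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS if p[aa] > 0)


# Hypothetical toy "proteomes", for illustration only.
genomes = {
    "toy_genome_1": ["MKKLLA", "AAGG"],
    "toy_genome_2": ["MWWYYC"],
}
compositions = {name: aa_composition(seqs) for name, seqs in genomes.items()}
mean = mean_composition(list(compositions.values()))
for name, comp in compositions.items():
    print(name, kl_divergence(comp, mean))
```

A larger divergence marks a genome whose amino acid usage is more skewed relative to the set, which is the sense in which the abstract speaks of "skewed amino acid utilization profiles".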
Affiliation(s)
- Sajia Akhter
  - Computational Science Research Center, San Diego State University, San Diego, CA, USA
- Ramy K Aziz
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, Egypt
  - Department of Computer Science, San Diego State University, San Diego, CA, United States of America
- Mona T Kashef
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, Egypt
- Eslam S Ibrahim
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, Egypt
- Barbara Bailey
  - Department of Mathematics & Statistics, San Diego State University, San Diego, CA, USA
- Robert A Edwards
  - Computational Science Research Center, San Diego State University, San Diego, CA, USA
  - Department of Computer Science, San Diego State University, San Diego, CA, United States of America
  - Department of Mathematics & Statistics, San Diego State University, San Diego, CA, USA
  - Department of Biology, San Diego State University, San Diego, CA, USA
7
Afreixo V, Rodrigues JMOS, Bastos CAC, Tavares AHMP, Silva RM. Exceptional Symmetry by Genomic Word: A Statistical Analysis. Interdiscip Sci 2016; 9:14-23. [PMID: 27866321 DOI: 10.1007/s12539-016-0200-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/20/2016] [Revised: 11/02/2016] [Accepted: 11/04/2016] [Indexed: 01/12/2023]
Abstract
Single-strand DNA symmetry has been described as a universal law observed in the genomes of all living organisms. It is a somewhat broadly defined concept, which has been refined into some more specific measurable effects. Here we discuss the exceptional symmetry effect. Exceptional symmetry is the symmetry effect beyond that expected in independence contexts, and it can be measured for each word, for each equivalent composition group, or globally, combining the effects of all possible words of a given length. Global exceptional symmetry was found in several species, but there are genomic words with no exceptional symmetry effect, whereas others show a very high exceptional symmetry effect. In this work, we discuss a measure to evaluate the exceptional symmetry effect by symmetric word pair, and compare it with others. We present a detailed study of the exceptional symmetry by symmetric pairs and take the CG content into account. We also introduce and discuss the exceptional symmetry profile for the DNA of each organism, and we perform a multiple comparison for 31 genomes: 7 viruses; 5 archaea; 5 bacteria; 14 eukaryotes.
Affiliation(s)
- Vera Afreixo
  - iBiMED-Institute of Biomedicine, IEETA-Institute of Electronic Engineering and Informatics of Aveiro, CIDMA-Center for Research and Development in Mathematics and Applications, Department of Mathematics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
- João M O S Rodrigues
  - IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
- Carlos A C Bastos
  - IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
- Ana H M P Tavares
  - iBiMED-Institute of Biomedicine, Department of Mathematics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
- Raquel M Silva
  - iBiMED-Institute of Biomedicine, IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Department of Medical Sciences, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
8
Shporer S, Chor B, Rosset S, Horn D. Inversion symmetry of DNA k-mer counts: validity and deviations. BMC Genomics 2016; 17:696. [PMID: 27580854 PMCID: PMC5006273 DOI: 10.1186/s12864-016-3012-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/13/2016] [Accepted: 08/11/2016] [Indexed: 01/25/2023] Open
Abstract
Background: The generalization of the second Chargaff rule states that counts of any string of nucleotides of length k on a single chromosomal strand equal the counts of its inverse (reverse-complement) k-mer. This Inversion Symmetry (IS) holds for many species, both eukaryotes and prokaryotes, for ranges of k which may vary from 7 to 10 as chromosomal lengths vary from 2 Mbp to 200 Mbp. The existence of IS has been demonstrated in the literature, and other pair-wise candidate symmetries (e.g. reverse or complement) have been ruled out. Results: Studying IS in the human genome, we find that IS holds up to k = 10. It holds for complete chromosomes, also after applying the low complexity mask. We introduce a numerical IS criterion, and define the k-limit, KL, as the highest k for which this criterion is valid. We demonstrate that chromosomes of different species, as well as different human chromosomal sections, follow a universal logarithmic dependence of KL ~ 0.7 ln(L), where L is the length of the chromosome. We introduce a statistical IS-Poisson model that allows us to apply confidence measures to our numerical findings. We find good agreement for large k, where the variance of the Poisson distribution determines the outcome of the analysis. This model predicts the observed logarithmic increase of KL with length. The model allows us to conclude that for low k, e.g. k = 1 where IS becomes the 2nd Chargaff rule, IS violation, although extremely small, is significant. Studying this violation we come up with an unexpected observation for human chromosomes, finding a meaningful correlation with the excess of genes on particular strands. Conclusions: Our IS-Poisson model agrees well with genomic data, and accounts for the universal behavior of k-limits. For low k we point out minute, yet significant, deviations from the model, including excess of counts of nucleotides T vs A and G vs C on positive strands of human chromosomes. Interestingly, this correlates with a significant (but small) excess of genes on the same positive strands.
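The inversion symmetry described in this abstract compares the count of each k-mer on one strand with the count of its reverse complement. A minimal sketch of such a comparison is given below. The asymmetry score used here (summed absolute count differences over reverse-complement pairs, divided by the total k-mer count) is a hypothetical measure chosen for illustration; it is not the numerical IS criterion or the IS-Poisson model of the paper, neither of which is specified in the abstract.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def reverse_complement(kmer):
    """Complement every base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count overlapping k-mers on one strand, skipping words with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= _BASES:
            counts[word] += 1
    return counts


def inversion_asymmetry(seq, k):
    """Hypothetical score in [0, 1]: 0 means every k-mer count equals that of
    its reverse complement; 1 means no reverse-complement partner occurs."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    seen = set()
    diff = 0
    for word, n in counts.items():
        partner = reverse_complement(word)
        if word in seen or partner in seen:
            continue  # this reverse-complement pair was already handled
        seen.add(word)
        diff += abs(n - counts.get(partner, 0))
    return diff / total


# Toy sequences for illustration; real analyses use whole chromosomes.
print(inversion_asymmetry("AACCGGTT", 2))
print(inversion_asymmetry("AAAA", 1))
```

For k = 1 the score reduces to a check of Chargaff's second parity rule itself (A against T, C against G) on a single strand.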
Affiliation(s)
- Sagi Shporer
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Benny Chor
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Saharon Rosset
  - Sackler School of Mathematical Sciences, Tel Aviv University, Tel Aviv, 69978, Israel
- David Horn
  - Sackler School of Physics and Astronomy, Tel Aviv University, Tel Aviv, 69978, Israel
9
Rosandić M, Vlahović I, Glunčić M, Paar V. Trinucleotide's quadruplet symmetries and natural symmetry law of DNA creation ensuing Chargaff's second parity rule. J Biomol Struct Dyn 2016; 34:1383-94. [PMID: 26524490 DOI: 10.1080/07391102.2015.1080628] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 10/22/2022]
Abstract
For almost 50 years the conclusive explanation of Chargaff's second parity rule (CSPR), the equality of frequencies of nucleotides A=T and C=G or the equality of direct and reverse complement trinucleotides in the same DNA strand, has not been determined yet. Here, we relate CSPR to the interstrand mirror symmetry in 20 symbolic quadruplets of trinucleotides (direct, reverse complement, complement, and reverse) mapped to double-stranded genome. The symmetries of Q-box corresponding to quadruplets can be obtained as a consequence of Watson-Crick base pairing and CSPR together. Alternatively, assuming Natural symmetry law for DNA creation that each trinucleotide in one strand of DNA must simultaneously appear also in the opposite strand automatically leads to Q-box direct-reverse mirror symmetry which in conjunction with Watson-Crick base pairing generates CSPR. We demonstrate quadruplet's symmetries in chromosomes of wide range of organisms, from Escherichia coli to Neanderthal and human genomes, introducing novel quadruplet-frequency histograms and 3D-diagrams with combined interstrand frequencies. These "landscapes" are mutually similar in all mammals, including extinct Neanderthals, and somewhat different in most of older species. In human chromosomes 1-12, and X, Y the "landscapes" are almost identical and slightly different in the remaining smaller and telocentric chromosomes. Quadruplet frequencies could provide a new robust tool for characterization and classification of genomes and their evolutionary trajectories.
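The quadruplet of a trinucleotide named in this abstract (direct, reverse complement, complement, and reverse) can be generated mechanically. The sketch below is an illustration with hypothetical function names, not the authors' code. Grouping all 64 trinucleotides this way yields 20 classes, which agrees with the "20 symbolic quadruplets" stated in the abstract; how the paper labels or orders them is not reproduced here.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def quadruplet(trinucleotide):
    """Return (direct, reverse complement, complement, reverse) for one trinucleotide."""
    direct = trinucleotide.upper()
    complement = direct.translate(_COMPLEMENT)
    reverse = direct[::-1]
    reverse_complement = complement[::-1]
    return (direct, reverse_complement, complement, reverse)


def all_quadruplets():
    """Group the 64 trinucleotides into classes closed under the three operations."""
    groups = set()
    for bases in product("ACGT", repeat=3):
        groups.add(frozenset(quadruplet("".join(bases))))
    return groups


print(quadruplet("ATG"))
print(len(all_quadruplets()))
```

Some classes contain only two distinct members: for a trinucleotide that reads the same in both directions, such as AAA, the reverse coincides with the direct form and the reverse complement coincides with the complement.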
Affiliation(s)
- Marija Rosandić
  - Croatian Academy of Sciences and Arts, HAZU, Bioinformatics and Biological Physics, Zrinski trg 11, 10000 Zagreb, Croatia
- Ines Vlahović
  - Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
- Matko Glunčić
  - Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
- Vladimir Paar
  - Croatian Academy of Sciences and Arts, HAZU, Bioinformatics and Biological Physics, Zrinski trg 11, 10000 Zagreb, Croatia
  - Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
10
Afreixo V, Rodrigues JMOS, Bastos CAC, Silva RM. The exceptional genomic word symmetry along DNA sequences. BMC Bioinformatics 2016; 17:59. [PMID: 26842742 PMCID: PMC4738807 DOI: 10.1186/s12859-016-0905-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 09/21/2015] [Accepted: 01/19/2016] [Indexed: 01/05/2023] Open
Abstract
Background: Chargaff's second parity rule and its extensions are recognized as universal phenomena in DNA sequences. However, parity of the frequencies of reverse complementary oligonucleotides could be a mere consequence of the single nucleotide parity rule, if nucleotide independence is assumed. Exceptional symmetry (symmetry beyond that expected under an independent nucleotide assumption) was proposed previously as a meaningful measure of the extension of the second parity rule to oligonucleotides. The global exceptional symmetry was detected in long and short genomes. Results: To explore the exceptional genomic word symmetry along the genome sequences, we propose a sliding window method to extract the values of exceptional symmetry (for all words or by word groups). We compare the exceptional symmetry effect size distribution in all human chromosomes against control scenarios (positive and negative controls), testing the differences and performing a residual analysis. We explore local exceptional symmetry in equivalent composition word groups, and find that the behaviour of the local exceptional symmetry depends on the word group. Conclusions: We conclude that the exceptional symmetry is a local phenomenon in genome sequences, with distinct characteristics along the sequence of each chromosome. The local exceptional symmetry along the genomic sequences shows outlying segments, and those segments have high biological annotation density.
Affiliation(s)
- Vera Afreixo
  - Department of Mathematics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
  - Department of Medical Sciences and Institute of Biomedicine - iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal, Campus Universitário de Santiago, Aveiro, Portugal
  - IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
- João M O S Rodrigues
  - Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
  - IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
- Carlos A C Bastos
  - Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
  - IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
- Raquel M Silva
  - Department of Medical Sciences and Institute of Biomedicine - iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal, Campus Universitário de Santiago, Aveiro, Portugal
  - IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
11
Stems and Loops. Evol Bioinform Online 2016. [DOI: 10.1007/978-3-319-28755-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022] Open
12
Srinivasan R, Scolari VF, Lagomarsino MC, Seshasayee ASN. The genome-scale interplay amongst xenogene silencing, stress response and chromosome architecture in Escherichia coli. Nucleic Acids Res 2014; 43:295-308. [PMID: 25429971 PMCID: PMC4288151 DOI: 10.1093/nar/gku1229] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022] Open
Abstract
The gene expression state of exponentially growing Escherichia coli cells is manifested by high expression of essential and growth-associated genes and low levels of stress-related and horizontally acquired genes. An important player in maintaining this homeostasis is the H-NS-StpA gene silencing system. A Δhns-stpA deletion mutant results in high expression of otherwise-silent horizontally acquired genes, many located in the terminus-half of the chromosome, and an indirect downregulation of many highly expressed genes. The Δhns-stpA double mutant displays slow growth. Using laboratory evolution we address the evolutionary strategies that E. coli would adopt to redress this gene expression imbalance. We show that two global gene regulatory mutations, namely (i) point mutations inactivating the stress-responsive sigma factor RpoS (σ38) and (ii) an amplification of ∼40% of the chromosome centred around the origin of replication, converge in partially reversing the global gene expression imbalance caused by Δhns-stpA. Transcriptome data of these mutants further show a three-way link amongst the global gene regulatory networks of H-NS and σ38, as well as chromosome architecture. Increasing gene expression around the terminus of replication results in a decrease in the expression of genes around the origin and vice versa; this appears to be a persistent phenomenon observed as an association across ∼300 publicly available gene expression data sets for E. coli. These global suppressor effects are transient and rapidly give way to more specific mutations, whose roles in reversing the growth defect of H-NS mutations remain to be understood.
Affiliation(s)
- Rajalakshmi Srinivasan
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK, Bellary Road, Bangalore 560065, India
  - Manipal University, Manipal 576104, India
- Vittore Ferdinando Scolari
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK, Bellary Road, Bangalore 560065, India
  - Manipal University, Manipal 576104, India
  - Genomic Physics Group, UMR 7238 CNRS Microorganism Genomics, UPMC, Paris, France
- Marco Cosentino Lagomarsino
  - Genomic Physics Group, UMR 7238 CNRS Microorganism Genomics, UPMC, Paris, France
  - Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Computational and Quantitative Biology, 15 Rue de l'École de Médecine Paris, France
  - CNRS, UMR 7238, Paris, France
- Aswin Sai Narain Seshasayee
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK, Bellary Road, Bangalore 560065, India
13
Afreixo V, Rodrigues JMOS, Bastos CAC. Analysis of single-strand exceptional word symmetry in the human genome: new measures. Biostatistics 2014; 16:209-21. [PMID: 25190514 DOI: 10.1093/biostatistics/kxu041] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023] Open
Abstract
Some previous studies suggest the extension of Chargaff's second rule (the phenomenon of symmetry in a single DNA strand) to long DNA words. However, in random sequences generated under an independent symbol model where complementary nucleotides have equal occurrence probabilities, we expect the phenomenon of symmetry to hold for any word length. In this work, we develop new statistical methods to measure the exceptional symmetry. Exceptional symmetry is a refinement of Chargaff's second parity rule that highlights the words whose frequency of occurrence is similar to that of its reversed complement but dissimilar to the frequencies of occurrence of other words which contain the same number of nucleotides A or T. We analyze words of lengths up to 12 in the complete human genome and in each chromosome separately. We assess exceptional symmetry globally, by word group, and by word. We conclude that the global symmetry present in the human genome is clearly exceptional and significant. The chromosomes present distinct exceptional symmetry profiles. There are several exceptional word groups and exceptional words with a strong exceptional symmetry.
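The abstract's point that symmetry is expected at any word length under an independent symbol model can be checked with a short calculation: if P(A) = P(T) and P(C) = P(G), a word and its reverse complement are products of the same factors. The sketch below uses hypothetical base probabilities and function names chosen for illustration. It also shows why words with the same number of A or T share one probability under this model, which is the baseline that exceptional symmetry is measured against.

```python
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Complement every base, then reverse the string."""
    return word.translate(_COMPLEMENT)[::-1]


def word_probability(word, base_probs):
    """Probability of a word when every position is drawn independently."""
    return math.prod(base_probs[b] for b in word)


# Hypothetical probabilities with P(A) = P(T) and P(C) = P(G).
base_probs = {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2}

word = "AACG"
p_word = word_probability(word, base_probs)
p_revcomp = word_probability(reverse_complement(word), base_probs)
p_same_group = word_probability("TCAG", base_probs)  # also two A/T and two C/G

print(math.isclose(p_word, p_revcomp))
print(math.isclose(p_word, p_same_group))
```

Because the model predicts equal frequencies for every word in such a composition group, agreement between a word and its reverse complement is informative only when it is closer than agreement with the rest of the group.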
Affiliation(s)
- Vera Afreixo
  - Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal
  - CIDMA, University of Aveiro, 3810-193 Aveiro, Portugal
  - IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
- João M O S Rodrigues
  - Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal
  - IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
- Carlos A C Bastos
  - Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal
  - IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
14
Provata A, Nicolis C, Nicolis G. DNA viewed as an out-of-equilibrium structure. Phys Rev E Stat Nonlin Soft Matter Phys 2014; 89:052105. [PMID: 25353737 DOI: 10.1103/physreve.89.052105] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/14/2013] [Indexed: 05/02/2023]
Abstract
The complexity of the primary structure of human DNA is explored using methods from nonequilibrium statistical mechanics, dynamical systems theory, and information theory. A collection of statistical analyses is performed on the DNA data and the results are compared with sequences derived from different stochastic processes. The use of χ² tests shows that DNA cannot be described as a low-order Markov chain (of order up to r = 6). Although detailed balance seems to hold at the level of a binary alphabet, it fails when all four base pairs are considered, suggesting spatial asymmetry and irreversibility. Furthermore, the block entropy does not increase linearly with the block size, reflecting the long-range nature of the correlations in the human genomic sequences. To probe locally the spatial structure of the chain, we study the exit distances from a specific symbol, the distribution of recurrence distances, and the Hurst exponent, all of which show power law tails and long-range characteristics. These results suggest that human DNA can be viewed as a nonequilibrium structure maintained in its state through interactions with a constantly changing environment. Based solely on the exit distance distribution accounting for the nonequilibrium statistics and using the Monte Carlo rejection sampling method, we construct a model DNA sequence. This method allows us to keep both long- and short-range statistical characteristics of the native DNA data. The model sequence presents the same characteristic exponents as the natural DNA but fails to capture spatial correlations and point-to-point details.
Affiliation(s)
- A Provata
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", 15310 Athens, Greece; and Interdisciplinary Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Campus Plaine, CP. 231, 1050 Bruxelles, Belgium
- C Nicolis
- Institut Royal Météorologique de Belgique, 3 Avenue Circulaire, 1180 Bruxelles, Belgium
- G Nicolis
- Interdisciplinary Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Campus Plaine, CP. 231, 1050 Bruxelles, Belgium
|
15
|
Li S, Yang J. System analysis of synonymous codon usage biases in archaeal virus genomes. J Theor Biol 2014; 355:128-39. [PMID: 24685889 PMCID: PMC7094158 DOI: 10.1016/j.jtbi.2014.03.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/29/2013] [Revised: 03/11/2014] [Accepted: 03/12/2014] [Indexed: 12/30/2022]
Abstract
Recent studies of geothermally heated aquatic ecosystems have found widely divergent viruses with unusual morphotypes. Archaeal viruses isolated from these hot habitats usually have double-stranded DNA genomes, linear or circular, and can infect members of the Archaea domain. In this study, the synonymous codon usage bias (SCUB) and dinucleotide composition in the available complete archaeal virus genome sequences have been investigated. It was found that there is a significant variation in SCUB among different archaeal virus species, which is mainly determined by the base composition. The outcome of correspondence analysis (COA) and Spearman's rank correlation analysis shows that codon usage of selected archaeal virus genes depends mainly on GC richness of genome, and the gene's function, albeit with smaller effects, also contributes to codon usage in this virus. Furthermore, this investigation reveals that aromaticity of each protein is also critical in affecting SCUB of these viral genes, although it was less important than that of the mutational bias. In particular, mutational pressure may influence SCUB in SIRV1, SIRV2, ARV1, AFV1, and PhiCh1 viruses, whereas translational selection could play a leading role in HRPV1's SCUB. These conclusions not only can offer an insight into the codon usage biases of archaeal viruses and subsequently the possible relationship between archaeal viruses and their hosts, but also may help in understanding the evolution of archaeal viruses and their gene classification, and may be helpful in exploring the origin of life and the evolution of biology. The SCUB of archaeal virus genes depends mainly on GC richness of genome. The mutational pressure is the main factor that influences SCUB. The aromaticity of each protein is also critical in affecting SCUB. The translational selection could play a leading role in HRPV1's SCUB. The mode is helpful to explore the origin of life and the evolution of biology.
Affiliation(s)
- Sen Li
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Science, Nanjing University, Nanjing 210093, China
- Jie Yang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Science, Nanjing University, Nanjing 210093, China
|
16
|
Afreixo V, Bastos CA, Garcia SP, Rodrigues JM, Pinho AJ, Ferreira PJ. The breakdown of the word symmetry in the human genome. J Theor Biol 2013; 335:153-9. [DOI: 10.1016/j.jtbi.2013.06.032] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/22/2012] [Revised: 05/30/2013] [Accepted: 06/25/2013] [Indexed: 01/13/2023]
|
17
|
Patterns of nucleotide asymmetries in plant and animal genomes. Biosystems 2013; 111:181-9. [PMID: 23438636 DOI: 10.1016/j.biosystems.2013.02.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 09/28/2012] [Revised: 11/29/2012] [Accepted: 02/07/2013] [Indexed: 11/20/2022]
Abstract
Symmetry in biology provides many intriguing puzzles to the scientist's mind. Chargaff's second parity rule states a symmetric distribution of oligonucleotides within a single strand of double-stranded DNA. While this rule has been verified in a wide range of microbial genomes, it still awaits explanation. In our study, we inquired into patterns of mono- and trinucleotide intra-strand parity in complex plant genomic sequences that became available during the last few years, and compared these to equally complex animal genomes. The degree and patterns of deviation from Chargaff's second rule were different between plant and animal species. We observed a universal inter-chromosomal homogeneity of mononucleotide skews in coding sequences of plant chromosomes, while the base composition of animal coding sequences differed between chromosomes even within a single species. We also found differences in the base composition of dicot introns in comparison to those of monocots. These genome-wide patterns were limited to genic regions and were not encountered in inter-genic sequences. We discuss the implications of our findings in relation to hypotheses about functional correlations of intra-strand parity which have hitherto been put forward. Furthermore, we propose more recent polyploidization and subsequent homogenization of homoeologues as a possible reason for more homogeneous skew patterns in plants.
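The intra-strand parity described in this abstract can be illustrated with a minimal sketch. It assumes that parity is measured as a normalized skew between the count of each k-mer and the count of its reverse complement on the same strand. The function names and the skew formula are illustrative assumptions, not the method used in the cited study.

```python
from collections import Counter

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of a DNA word (A<->T, C<->G, reversed)."""
    return word.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count overlapping k-mers on one strand, skipping words with non-ACGT symbols."""
    seq = seq.upper()
    valid = set("ACGT")
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )


def parity_skews(seq, k):
    """Hypothetical skew per k-mer / reverse-complement pair.

    skew = (n(w) - n(rc(w))) / (n(w) + n(rc(w))), with the pair ordered
    alphabetically. Values near 0 are consistent with Chargaff's second
    parity rule; self-complementary words are skipped.
    """
    counts = kmer_counts(seq, k)
    skews = {}
    for word in counts:
        rc = reverse_complement(word)
        if word == rc:
            continue  # a palindromic word is trivially symmetric
        a, b = sorted((word, rc))
        skews[(a, b)] = (counts[a] - counts[b]) / (counts[a] + counts[b])
    return skews


if __name__ == "__main__":
    # Toy sequence, far too short for the rule to be expected to hold.
    print(parity_skews("AAATTTGCGCAT", 1))
```

With k=1 this reduces to the familiar A/T and C/G mononucleotide skews; k=3 gives the trinucleotide case mentioned in the abstract.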
|
18
|
Applying Shannon's information theory to bacterial and phage genomes and metagenomes. Sci Rep 2013; 3:1033. [PMID: 23301154 PMCID: PMC3539204 DOI: 10.1038/srep01033] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 09/11/2012] [Accepted: 11/20/2012] [Indexed: 01/12/2023] Open
Abstract
All sequence data contain inherent information that can be measured by Shannon's uncertainty theory. Such measurement is valuable in evaluating large data sets, such as metagenomic libraries, to prioritize their analysis and annotation, thus saving computational resources. Here, Shannon's index of complete phage and bacterial genomes was examined. The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database. Measuring uncertainty may be used for rapid screening for sequences with matches in available databases, prioritizing computational resources, and indicating which sequences with no known similarities are likely to be important for more detailed analysis.
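As a rough illustration of how a Shannon index can depend on word size, the sketch below computes the Shannon entropy (in bits) of the overlapping k-mer frequency distribution of a sequence. The function name and the use of overlapping words are assumptions made for this example. The cited paper's exact definition and normalization are not given in the abstract.

```python
import math
from collections import Counter


def shannon_index(seq, k):
    """Shannon entropy in bits of the overlapping k-mer distribution of seq.

    H = -sum(p_w * log2(p_w)) over all observed words w of length k.
    Returns 0 when the sequence is shorter than k.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    toy = "ACGTACGTTTGACCA"  # toy sequence for illustration only
    for k in (1, 2, 3):
        print(k, round(shannon_index(toy, k), 3))
```

For a word size k over a four-letter alphabet the entropy is bounded above by 2k bits, and short sequences cannot reach that bound, which is one simple way that genome length and word size interact.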
|
19
|
Comin M, Verzotto D. Alignment-free phylogeny of whole genomes using underlying subwords. Algorithms Mol Biol 2012; 7:34. [PMID: 23216990 PMCID: PMC3549825 DOI: 10.1186/1748-7188-7-34] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 09/19/2012] [Accepted: 11/29/2012] [Indexed: 11/24/2022] Open
Abstract
Background: With the progress of modern sequencing technologies a large number of complete genomes are now available. Traditionally the comparison of two related genomes is carried out by sequence alignment. There are cases where these techniques cannot be applied, for example if two genomes do not share the same set of genes, or if they are not alignable to each other due to low sequence similarity, rearrangements and inversions, or more specifically to their lengths when the organisms belong to different species. For these cases the comparison of complete genomes can be carried out only with ad hoc methods that are usually called alignment-free methods. Methods: In this paper we propose a distance function based on subword compositions called Underlying Approach (UA). We prove that the matching statistics, a popular concept in the field of string algorithms able to capture the statistics of common words between two sequences, can be derived from a small set of "independent" subwords, namely the irredundant common subwords. We define a distance-like measure based on these subwords, such that each region of genomes contributes only once, thus avoiding counting shared subwords multiple times. In a nutshell, this filter discards subwords occurring in regions covered by other more significant subwords. Results: The Underlying Approach (UA) builds a scoring function based on this set of patterns, called underlying. We prove that this set is by construction linear in the size of input, without overlaps, and can be efficiently constructed. Results show the validity of our method in the reconstruction of phylogenetic trees, where the Underlying Approach outperforms the current state-of-the-art methods. Moreover, we show that the accuracy of UA is achieved with a very small number of subwords, which in some cases carry meaningful biological information. Availability: http://www.dei.unipd.it/~ciompin/main/underlying.html
|
20
|
|