201
202
Zhang SH, Huang YZ. Limited contribution of stem-loop potential to symmetry of single-stranded genomic DNA. Bioinformatics 2009; 26:478-85. [PMID: 20031973 DOI: 10.1093/bioinformatics/btp703] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 01/09/2023]
Abstract
MOTIVATION: The phenomenon of strand symmetry, which may provide clues to genome evolution, exists in all prokaryotic and eukaryotic genomes studied. Several possible mechanisms for its origins have been proposed, including: no strand biases for mutation and selection, strand inversion and selection of stem-loop structures. However, the relative contributions of these mechanisms to strand symmetry are not clear. In this article, we studied specifically the role of stem-loop potential of single-stranded DNA in strand symmetry. RESULTS: We analyzed the complete genomes of 90 prokaryotes. We found that most oligonucleotides (pentanucleotides and higher) do not have a reverse complement in close proximity in the genomic sequences. Combined with further analysis, we conclude that the contribution of the widespread stem-loop potential of single-stranded genomic DNA to the formation and maintenance of strand symmetry would be very limited, at least for higher-order oligonucleotides. Therefore, other possible causes for strand symmetry must be taken into account to a deeper degree.
Affiliation(s)
- Shang-Hong Zhang
- The Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou 510275, China.
203
Powdel BR, Satapathy SS, Kumar A, Jha PK, Buragohain AK, Borah M, Ray SK. A study in entire chromosomes of violations of the intra-strand parity of complementary nucleotides (Chargaff's second parity rule). DNA Res 2009; 16:325-43. [PMID: 19861381 PMCID: PMC2780954 DOI: 10.1093/dnares/dsp021] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 01/05/2023]
Abstract
Chargaff's rule of intra-strand parity (ISP) between complementary mono/oligonucleotides in chromosomes is well established in the scientific literature. Although a large number of papers have been published citing works and discussions on ISP in the genomic era, scientists are yet to find all the factors responsible for such a universal phenomenon in the chromosomes. In the present work, we have tried to address the issue from a new perspective, which is a parallel feature to ISP. The compositional abundance values of mono/oligonucleotides were determined in all non-overlapping sub-chromosomal regions of specific size. Also the frequency distributions of the mono/oligonucleotides among the regions were compared using the Kolmogorov–Smirnov test. Interestingly, the frequency distributions between the complementary mono/oligonucleotides revealed statistical similarity, which we named intra-strand frequency distribution parity (ISFDP). ISFDP was observed as a general feature in chromosomes of bacteria, archaea and eukaryotes. Violation of ISFDP was also observed in several chromosomes. Chromosomes of different strains belonging to a species in bacteria/archaea (Haemophilus influenzae, Xylella fastidiosa etc.) and chromosomes of a eukaryote are found to differ from each other with respect to ISFDP violation. ISFDP correlates weakly with ISP in chromosomes, suggesting that the latter is not entirely responsible for the former. Asymmetry of replication topography and composition of forward-encoded sequences between the strands in chromosomes are found to be insufficient to explain the ISFDP feature in all chromosomes. This suggests that multiple factors in chromosomes are responsible for establishing ISFDP.
Affiliation(s)
- B R Powdel
- Department of Mathematical Sciences, Tezpur University, Tezpur, Assam 784 028, India
204
Poptsova MS, Larionov SA, Ryadchenko EV, Rybalko SD, Zakharov IA, Loskutov A. Hidden chromosome symmetry: in silico transformation reveals symmetry in 2D DNA walk trajectories of 671 chromosomes. PLoS One 2009; 4:e6396. [PMID: 19636424 PMCID: PMC2712679 DOI: 10.1371/journal.pone.0006396] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/07/2009] [Accepted: 06/23/2009] [Indexed: 11/18/2022]
Abstract
Maps of 2D DNA walk of 671 examined chromosomes show composition complexity change from symmetrical half-turn in bacteria to pseudo-random trajectories in archaea, fungi and humans. In silico transformation of gene order and strand position returns most of the analyzed chromosomes to a symmetrical bacterial-like state with one transition point. The transformed chromosomal sequences also reveal remarkable segmental compositional symmetry between regions from different strands located equidistantly from the transition point. Despite extensive chromosome rearrangement the relation of gene numbers on opposite strands for chromosomes of different taxa varies in narrow limits around unity with Pearson coefficient r = 0.98. Similar relation is observed for total genes' length (r = 0.86) and cumulative GC (r = 0.95) and AT (r = 0.97) skews. This is also true for human coding sequences (CDS), which comprise only several percent of the entire chromosome length. We found that frequency distributions of the length of gene clusters, continuously located on the same strand, have close values for both strands. Eukaryotic gene distribution is believed to be non-random. Contribution of different subsystems to the noted symmetries and distributions, and evolutionary aspects of symmetry are discussed.
Affiliation(s)
- Maria S Poptsova
- University of Connecticut, Storrs, Connecticut, United States of America.
205
Singh TR, Pardasani KR. Ambush hypothesis revisited: Evidences for phylogenetic trends. Comput Biol Chem 2009; 33:239-44. [PMID: 19473880 DOI: 10.1016/j.compbiolchem.2009.04.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 08/28/2008] [Revised: 04/15/2009] [Accepted: 04/23/2009] [Indexed: 10/20/2022]
Abstract
Recoding events occur in competition with standard readout of the transcript, and are site-specific. Recoding is the reprogramming of mRNA translation by localized alterations in the standard translational rules. Frame-shifting is one class of recoding and defined as protein translations that start not at the first, but either at the second (+1 frame-shift) or the third (-1 frame-shift) nucleotide of the codon. Coding sequences lack stop codons, but frame-shifted sequences contain many stop codons, termed off-frame stops or hidden stops. These hidden stops terminate frame-shifted translation, potentially decreasing energy, and resource waste on non-functional proteins. Our results support this putative ancient adaptive event for the selection of codons that can be part of hidden stop codons. All taxonomic groups represent positive correlation between codon usage frequencies and contribution of codons to hidden stops in off-frame context. Our analysis on nuclear and mitochondrial genomic data revealed phylogenomic selection of ambush mechanism. Strongest impact of this event was found in viruses and bacteria. It has been suggested that this mechanism has occurred and been utilized in the early stages of evolution.
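The notion of hidden stops described in this abstract can be illustrated with a minimal sketch. It assumes the standard stop codons TAA, TAG and TGA on a DNA coding sequence, and it labels the shifted frames simply by their offset (1 or 2 nucleotides from the annotated start). The function name `hidden_stops` is hypothetical and is not taken from the paper.

```python
# Standard stop codons (an assumption: the standard genetic code is used).
STOPS = {"TAA", "TAG", "TGA"}


def hidden_stops(cds: str) -> dict[int, int]:
    """Count stop codons seen when a coding sequence is read from offset 1 or 2.

    The in-frame reading (offset 0) is skipped, because a coding sequence
    normally has no internal stop codon there.
    """
    cds = cds.upper()
    counts = {}
    for offset in (1, 2):
        # Step through complete codons only, starting at the shifted position.
        counts[offset] = sum(
            cds[i:i + 3] in STOPS for i in range(offset, len(cds) - 2, 3)
        )
    return counts


example = "ATGATAGCTGAA"  # in frame: ATG ATA GCT GAA (no stop codon)
print(hidden_stops(example))
```

For the toy sequence, offset 1 reads TGA TAG CTG and offset 2 reads GAT AGC TGA, so both shifted frames hit a stop within a few codons.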
Affiliation(s)
- Tiratha Raj Singh
- Department of Zoology, Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel.
206
Abstract
In spite of the importance of point mutations for evolution and human diseases, their natural spectrum of incidence in different species is not known. Here I propose to determine these spectra by comparing consecutive sequence periods in stretches of repetitive DNA. The article presents the analysis of more than 51,000 such point mutations identified by this approach in the genomes of human, chimpanzee, rat, mouse, pufferfish, zebrafish, and sea squirt. I propose to explain the observed spectra by auto-mutagenic mechanisms of genome variation involving the inter-conversions of nucleotides, single base-pair inversions and their combinations.
Affiliation(s)
- Guenter Albrecht-Buehler
- Department of Cell and Molecular Biology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
207
Hamady M, Wilson SA, Zaneveld J, Sueoka N, Knight R. CodonExplorer: an online tool for analyzing codon usage and sequence composition, scaling from genes to genomes. Bioinformatics 2009; 25:1331-2. [PMID: 19279067 DOI: 10.1093/bioinformatics/btp141] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
DNA composition in general, and codon usage in particular, is crucial for understanding gene function and evolution. CodonExplorer, available online at http://bmf.colorado.edu/codonexplorer/, is an online tool and interactive database that contains millions of genes, allowing rapid exploration of the factors governing gene and genome compositional evolution and exploiting GC content and codon usage frequency to identify genes with composition suggesting high levels of expression or horizontal transfer.
Affiliation(s)
- Micah Hamady
- Department of Computer Science, University of Colorado, Boulder, CO 80309, USA
208
Valenzuela CY. Non-random pre-transcriptional evolution in HIV-1. A refutation of the foundational conditions for neutral evolution. Genet Mol Biol 2009; 32:159-69. [PMID: 21637663 PMCID: PMC3032973 DOI: 10.1590/s1415-47572009005000025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/19/2008] [Accepted: 06/03/2008] [Indexed: 12/02/2022]
Abstract
The complete base sequence of HIV-1 virus and GP120 ENV gene were analyzed to establish their distance to the expected neutral random sequence. An especial methodology was devised to achieve this aim. Analyses included: a) proportion of dinucleotides (signatures); b) homogeneity in the distribution of dinucleotides and bases (isochores) by dividing both segments in ten and three sub-segments, respectively; c) probability of runs of bases and No-bases according to the Bose-Einstein distribution. The analyses showed a huge deviation from the random distribution expected from neutral evolution and neutral-neighbor influence of nucleotide sites. The most significant result is the tremendous lack of CG dinucleotides (p < 10^-50), a selective trait of eukaryote and not of single stranded RNA virus genomes. Results not only refute neutral evolution and neutral neighbor influence, but also strongly indicate that any base at any nucleotide site correlates with all the viral genome or sub-segments. These results suggest that evolution of HIV-1 is pan-selective rather than neutral or nearly neutral.
Affiliation(s)
- Carlos Y Valenzuela
- Programa Genética Humana, Instituto de Ciencias Biomédicas, Facultad de Medicina, Universidad de Chile, Santiago Chile
209
Zaneveld J, Hamady M, Sueoka N, Knight R. CodonExplorer: an interactive online database for the analysis of codon usage and sequence composition. Methods Mol Biol 2009; 537:207-32. [PMID: 19378146 PMCID: PMC2953947 DOI: 10.1007/978-1-59745-251-9_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2025]
Abstract
The analysis of DNA composition and codon usage reveals many factors that influence the evolution of genes and genomes. In this chapter, we show how to use CodonExplorer, a web tool and interactive database that contains millions of genes, to better understand the principles governing evolution at the single gene and whole-genome level. We present principles and practical procedures for using analyses of GC content and codon usage frequency to identify highly expressed or horizontally transferred genes and to study the relative contribution of different types of mutation to gene and genome composition. CodonExplorer's combination of a user-friendly web interface and a comprehensive genomic database makes these diverse analyses fast and straightforward to perform. CodonExplorer is thus a powerful tool that facilitates and automates a wide range of compositional analyses.
Affiliation(s)
- Jesse Zaneveld
- Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
210
Pallejà A, Guzman E, Garcia-Vallvé S, Romeu A. In silico prediction of the origin of replication among bacteria: a case study of Bacteroides thetaiotaomicron. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2008; 12:201-10. [PMID: 18582175 DOI: 10.1089/omi.2008.0004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
The initiation of chromosomal replication occurs only once during the prokaryote cell cycle. Some origins of replication have been experimentally determined and have led to the development of in silico approaches to find the origin of replication among other prokaryotes. DNA base composition asymmetry is the basis of numerous in silico methods used to detect the origin and terminus of replication in prokaryotes. However, the composition asymmetry does not allow us to locate precisely the positions of the origin and terminus. Since DNA replication is a key step in the cell cycle it is important to determine properly the origin and terminus regions. Therefore, we have reviewed here the methods, tools, and databases for predicting the origins and terminuses of replication, and we have proposed some complementary analyses to reinforce these predictions. These analyses include finding the dnaA gene and its binding sites; making BLAST analyses of the intergenic sequences compared to related species; studying the gene order around the origin sequence; and studying the distribution of the genes encoded in the leading versus the lagging strand.
Affiliation(s)
- Albert Pallejà
- Department of Biochemistry and Biotechnology, Evolutionary Genomics Group, Rovira i Virgili University, Tarragona, Catalunya, Spain.
211
Mugal CF, von Grünberg HH, Peifer M. Transcription-induced mutational strand bias and its effect on substitution rates in human genes. Mol Biol Evol 2008; 26:131-42. [PMID: 18974087 DOI: 10.1093/molbev/msn245] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Indexed: 12/13/2022]
Abstract
If substitution rates are not the same on the two complementary DNA strands, a substitution is considered strand asymmetric. Such substitutional strand asymmetries are determined here for the three most frequent types of substitution on the human genome (C → T, A → G, and G → T). Substitution rate differences between both strands are estimated for 4,590 human genes by aligning all repeats occurring within the introns with their ancestral consensus sequences. For 1,630 of these genes, both coding strand and noncoding strand rates could be compared with rates in gene-flanking regions. All three rates considered are found to be on average higher on the coding strand and lower on the transcribed strand in comparison to their values in the gene-flanking regions. This finding points to the simultaneous action of rate-increasing effects on the coding strand (such as increased adenine and cytosine deamination) and transcription-coupled repair as a rate-reducing effect on the transcribed strand. The common behavior of the three rates leads to strong correlations of the rate asymmetries: Whenever one rate is strand biased, the other two rates are likely to show the same bias. Furthermore, we determine all three rate asymmetries as a function of time: the A → G and G → T rate asymmetries are both found to be constant in time, whereas the C → T rate asymmetry shows a pronounced time dependence, an observation that explains the difference between our results and those of an earlier work by Green et al. (2003. Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 33:514-517.). Finally, we show that in addition to transcription also the replication process biases the substitution rates in genes.
Affiliation(s)
- Carina F Mugal
- Institute of Chemistry, Karl-Franzens University Graz, Graz, Austria
212
Peifer M, Karro JE, von Grünberg HH. Is there an acceleration of the CpG transition rate during the mammalian radiation? Bioinformatics 2008; 24:2157-64. [PMID: 18662928 PMCID: PMC2553435 DOI: 10.1093/bioinformatics/btn391] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/11/2008] [Revised: 07/27/2008] [Accepted: 07/27/2008] [Indexed: 11/13/2022]
Abstract
MOTIVATION: In this article we build a model of the CpG dinucleotide substitution rate and use it to challenge the claim that this rate underwent a sudden mammalian-specific increase approximately 90 million years ago. The evidence supporting this hypothesis comes from the application of a model of neutral substitution rates able to account for elevated CpG dinucleotide substitution rates. With the initial goal of improving that model's accuracy, we introduced a modification enabling us to account for boundary effects arising from the truncation of the Markov field, as well as improving the optimization procedure required for estimating the substitution rates. RESULTS: When using this modified method to reproduce the supporting analysis, the evidence of the rate shift vanished. Our analysis suggests that the CpG-specific rate has been constant over the relevant time period and that the asserted acceleration of the CpG rate is likely an artifact of the original model.
Affiliation(s)
- M Peifer
- Institute of Chemistry, Karl-Franzens University Graz, Graz, Austria.
213
Squartini F, Arndt PF. Quantifying the stationarity and time reversibility of the nucleotide substitution process. Mol Biol Evol 2008; 25:2525-35. [PMID: 18682605 DOI: 10.1093/molbev/msn169] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 01/08/2023]
Abstract
Markov models describing the evolution of the nucleotide substitution process, widely used in phylogeny reconstruction, usually assume the hypotheses of stationarity and time reversibility. Although these models give meaningful results when applied to biological data, it is not clear if the 2 assumptions mentioned above hold and, if not, how much sequence evolution processes deviate from them. To this aim, we introduce 2 sets of indices that can be calculated from the nucleotide distribution and the substitution rates. The stationarity indices (STIs) can be used to test the validity of the equilibrium assumption. The irreversibility indices (IRIs) are derived from the Kolmogorov cycle conditions for time reversibility and quantify the degree of nontime reversibility of a process. We have computed STIs and IRIs for the evolutionary process of 2 lineages, Drosophila simulans and Homo sapiens. In the latter case, we use a modified form of the indices that takes into account the CpG decay process. In both cases, we find statistically significant deviations from the ideal case of a process that has reached stationarity and is time reversible.
Affiliation(s)
- Federico Squartini
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
214
Sernova NV, Gelfand MS. Identification of replication origins in prokaryotic genomes. Brief Bioinform 2008; 9:376-91. [PMID: 18660512 DOI: 10.1093/bib/bbn031] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022]
Abstract
The availability of hundreds of complete bacterial genomes has created new challenges and simultaneously opportunities for bioinformatics. In the area of statistical analysis of genomic sequences, the studies of nucleotide compositional bias and gene bias between strands and replichores paved the way for the development of tools for prediction of bacterial replication origins. Only a few (about 20) origin regions for eubacteria and archaea have been proven experimentally. One reason for that may be that this is now considered an essentially bioinformatics problem, where predictions are sufficiently reliable not to run labor-intensive experiments, unless specifically needed. Here we describe the main existing approaches to the identification of replication origin (oriC) and termination (terC) loci in prokaryotic chromosomes and characterize a number of computational tools based on various skew types and other types of evidence. We also classify the eubacterial and archaeal chromosomes by predictability of their replication origins using skew plots. Finally, we discuss possible combined approaches to the identification of the oriC sites that may be used to improve the prediction tools, in particular, the analysis of DnaA binding sites using the comparative genomic methods.
Affiliation(s)
- Natalia V Sernova
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoi Karetny pereulok, 19, Moscow, 127994, Russia
215
Marri PR, Golding GB. Gene amelioration demonstrated: the journey of nascent genes in bacteria. Genome 2008; 51:164-8. [DOI: 10.1139/g07-105] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
Gene amelioration is the hypothesis that genes acquired via lateral gene transfer will, over time, acquire the molecular characteristics of the host genome. Species for which multiple strains have been sequenced permit a demonstration that this hypothesis is correct. We use 7 sequenced genomes of Streptococcus pyogenes and 6 sequenced genomes of Staphylococcus aureus to illustrate the action of amelioration on these genomes.
Affiliation(s)
- G. Brian Golding
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
216
Sorimachi K, Okayasu T. Codon evolution is governed by linear formulas. Amino Acids 2008; 34:661-8. [PMID: 18180868 DOI: 10.1007/s00726-007-0024-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 11/26/2007] [Accepted: 12/17/2007] [Indexed: 10/22/2022]
Abstract
When nucleotide (G, C, T and A) contents were plotted against each nucleotide, their relationships were clearly expressed by a linear formula, y = αx + β, in the coding and non-coding regions. This linear relationship was obtained from the complete single-stranded DNA. Similarly, nucleotide contents at all three codon positions were expressed by linear regression lines based on the content of each nucleotide. In addition, 64 codon usages were also expressed by linear formulas against nucleotide content. Thus, the nucleotide content not only in coding sequence but also in non-coding sequence can be expressed by a linear formula, y = αx + β, in 145 organisms (112 bacteria, 15 archaea and 18 eukaryotes). Based on these results, from the ratio of C/T, G/T, C/A or G/A one can essentially estimate all four nucleotide contents in the complete single-stranded DNA, and determining any ratio of two kinds of nucleotides essentially allows estimation of the four nucleotide contents, the nucleotide contents at the three different codon positions and the codon distributions at 64 codons in the coding region. The maximum and minimum values of G content were approximately 0.35 and approximately 0.15, respectively, among various organisms examined. Codon evolution occurs according to linear formulas between these two values.
Affiliation(s)
- K Sorimachi
- Educational Support Center, Dokkyo Medical University, Mibu, Tochigi 321-0293, Japan.
217
Karro JE, Peifer M, Hardison RC, Kollmann M, von Grünberg HH. Exponential decay of GC content detected by strand-symmetric substitution rates influences the evolution of isochore structure. Mol Biol Evol 2007; 25:362-74. [PMID: 18042807 DOI: 10.1093/molbev/msm261] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]
Abstract
The distribution of guanine and cytosine nucleotides throughout a genome, or the GC content, is associated with numerous features in mammals; understanding the pattern and evolutionary history of GC content is crucial to our efforts to annotate the genome. The local GC content is decaying toward an equilibrium point, but the causes and rates of this decay, as well as the value of the equilibrium point, remain topics of debate. By comparing the results of 2 methods for estimating local substitution rates, we identify 620 Mb of the human genome in which the rates of the various types of nucleotide substitutions are the same on both strands. These strand-symmetric regions show an exponential decay of local GC content at a pace determined by local substitution rates. DNA segments subjected to higher rates experience disproportionately accelerated decay and are AT rich, whereas segments subjected to lower rates decay more slowly and are GC rich. Although we are unable to draw any conclusions about causal factors, the results support the hypothesis proposed by Khelifi A, Meunier J, Duret L, and Mouchiroud D (2006. GC content evolution of the human and mouse genomes: insights from the study of processed pseudogenes in regions of different recombination rates. J Mol Evol. 62:745-752.) that the isochore structure has been reshaped over time. If rate variation were a determining factor, then the current isochore structure of mammalian genomes could result from the local differences in substitution rates. We predict that under current conditions strand-symmetric portions of the human genome will stabilize at an average GC content of 30% (considerably less than the current 42%), thus confirming that the human genome has not yet reached equilibrium.
Affiliation(s)
- J E Karro
- Department of Computer Science and Systems Analysis, Miami University, Ohio, USA.
218
Touchon M, Rocha EPC. From GC skews to wavelets: a gentle guide to the analysis of compositional asymmetries in genomic data. Biochimie 2007; 90:648-59. [PMID: 17988781 DOI: 10.1016/j.biochi.2007.09.015] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Received: 06/18/2007] [Accepted: 09/21/2007] [Indexed: 12/29/2022]
Abstract
Compositional asymmetries are pervasive in DNA sequences. They are the result of the asymmetric interactions between DNA and cellular mechanisms such as replication and transcription. Here, we review many of the methods that have been proposed over the years to analyse compositional asymmetries in DNA sequences. Among these we list GC skews, oligonucleotide skews and wavelets, which among other uses have been extensively employed to delimitate origins and termini of replication in genomes. We also review the use of multivariate methods, such as factorial correspondence analysis, discriminant analysis and analysis of variance, which allow assigning compositional strand asymmetries to the different biological processes shaping sequence composition. Finally, we review methods that have been used to infer substitution matrices and allow understanding the mutational processes underlying strand asymmetry. We focus on replication asymmetries because they have been more thoroughly studied, but the methods may be adapted, and often are, to other problems. Although strand asymmetry has been studied more frequently through compositional skews of nucleotides or oligonucleotides, we recall that, depending on the goal of the analysis, other methods may be more appropriate to answer certain biological questions. We also refer to programs freely available to analyse strand asymmetry.
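As an illustration of the simplest method named in this abstract, the following is a minimal sketch of a windowed GC skew and its cumulative sum. It assumes the common definition skew = (G − C) / (G + C) over non-overlapping windows; the function names and the window size are hypothetical choices for this example, not taken from the review.

```python
from itertools import accumulate


def gc_skew(seq: str, window: int) -> list[float]:
    """Return (G - C) / (G + C) for each non-overlapping window of the sequence.

    A trailing partial window is ignored; a window with no G or C scores 0.0.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews: list[float]) -> list[float]:
    """Running sum of window skews, as plotted in cumulative skew diagrams."""
    return list(accumulate(skews))


toy = "GGGC" + "CCCG" + "GGGG"
per_window = gc_skew(toy, window=4)
print(per_window)                   # [0.5, -0.5, 1.0]
print(cumulative_skew(per_window))  # [0.5, 0.0, 1.0]
```

In skew-based origin prediction, the extremes of the cumulative curve are usually read as candidate origin and terminus positions. Real analyses use windows of thousands of bases rather than the four used here.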
Affiliation(s)
- Marie Touchon
- Atelier de Bioinformatique, Université Pierre et Marie Curie-Paris 6, Paris, France
219
Evolutionary implications of inversions that have caused intra-strand parity in DNA. BMC Genomics 2007; 8:160. [PMID: 17562011 PMCID: PMC1913523 DOI: 10.1186/1471-2164-8-160] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 01/25/2007] [Accepted: 06/11/2007] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Chargaff's rule of DNA base composition, stating that DNA comprises equal amounts of adenine and thymine (%A = %T) and of guanine and cytosine (%C = %G), is well known because it was fundamental to the conception of the Watson-Crick model of DNA structure. His second parity rule stating that the base proportions of double-stranded DNA are also reflected in single-stranded DNA (%A = %T, %C = %G) is more obscure, likely because its biological basis and significance are still unresolved. Within each strand, the symmetry of single nucleotide composition extends even further, being demonstrated in the balance of di-, tri-, and multi-nucleotides with their respective complementary oligonucleotides. RESULTS: Here, we propose that inversions are sufficient to account for the symmetry within each single-stranded DNA. Human mitochondrial DNA does not demonstrate such intra-strand parity, and we consider how its different functional drivers may relate to our theory. This concept is supported by the recent observation that inversions occur frequently. CONCLUSION: Along with chromosomal duplications, inversions must have been shaping the architecture of genomes since the origin of life.
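The intra-strand parity described in this abstract can be checked on a single strand with a short sketch like the one below. It counts overlapping k-mers and compares each with its reverse complement. The normalized gap |a − b| / (a + b) is an illustrative measure chosen for this example, and the function names are hypothetical; neither is taken from the paper.

```python
from collections import Counter

# Translation table for complementing A, C, G and T; other symbols pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parity_gaps(seq: str, k: int) -> dict[str, float]:
    """For each k-mer seen on the strand, compare its count with that of its
    reverse complement: 0.0 means perfect parity, 1.0 means the partner is absent.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    gaps = {}
    for kmer, a in counts.items():
        b = counts[reverse_complement(kmer)]  # Counter yields 0 if missing
        gaps[kmer] = abs(a - b) / (a + b)
    return gaps


print(parity_gaps("AACGTT", 2))  # every dinucleotide is balanced
print(parity_gaps("AAAA", 1))    # A has no T partner on this strand
```

The first toy sequence equals its own reverse complement, so every gap is 0.0. This mirrors the abstract's argument: an inversion replaces a segment with its reverse complement, which moves k-mer counts toward their complementary partners on the same strand.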
220
Hu J, Zhao X, Zhang Z, Yu J. Compositional dynamics of guanine and cytosine content in prokaryotic genomes. Res Microbiol 2007; 158:363-70. [PMID: 17449227 DOI: 10.1016/j.resmic.2007.02.007] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Received: 11/03/2006] [Revised: 02/07/2007] [Accepted: 02/15/2007] [Indexed: 11/20/2022]
Abstract
Nucleotide compositional analyses of disparities in genomic guanine and cytosine (gGC) content directly relate to the amino acid composition, through the union of the genetic code. Here we analyzed 229 prokaryotic genomes to address the intricate relationships between gGC, amino acids and their codons in the context of genes. First, we not only confirmed the universal rule that the average GC content at codon position 1 (GC1) is always higher than that at codon position 2 (GC2), but also extended the rule to show that it holds true even when codon-position-related GC contents are calculated on a per gene basis. The "GC1>GC2 rule" is attributable essentially to a few dominant amino acids that have GC at one of these two codon positions or the intermediate-GC group of amino acids. Second, we found that gGC fluctuations were largely compensated for at the codon level, when codons belonging to high-GC and low-GC amino acid groups varied accordingly. Finally, we found that prokaryotic genes also have a GC content gradient (Gd) distributed along their transcripts. The gradients at three codon positions (Gd1, Gd2 and Gd3) all correlated with gGC in two different directions: Gd3 was positive, whereas the other two were negative.
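The codon-position GC contents named in this abstract (GC1, GC2, and by extension GC3) can be computed per gene as sketched below. This is a hedged illustration only: the function name `gc_by_codon_position` is hypothetical, the input is assumed to be an in-frame coding sequence, and the abstract's gradient measures (Gd1 to Gd3) are not reproduced here.

```python
def gc_by_codon_position(cds: str) -> tuple:
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3        # length of the whole-codon prefix
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]           # every third base from this offset
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


if __name__ == "__main__":
    # Codons: ATG GCC GAA TAA
    gc1, gc2, gc3 = gc_by_codon_position("ATGGCCGAATAA")
    print(f"GC1={gc1:.2f} GC2={gc2:.2f} GC3={gc3:.2f}")
```

Running this over every gene in a genome and comparing GC1 with GC2 gene by gene is one way to test the per-gene form of the "GC1>GC2 rule" mentioned above.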
Affiliation(s)
- Jianfei Hu
- College of Life Sciences, Peking University, Beijing 100871, China
|
221
|
Revisiting the directional mutation pressure theory: The analysis of a particular genomic structure in Leishmania major. Gene 2006; 385:28-40. [DOI: 10.1016/j.gene.2006.04.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 11/30/2005] [Accepted: 04/04/2006] [Indexed: 11/20/2022]
|
222
|
Fonseca MM, Froufe E, Harris DJ. Mitochondrial gene rearrangements and partial genome duplications detected by multigene asymmetric compositional bias analysis. J Mol Evol 2006; 63:654-61. [PMID: 17075699 DOI: 10.1007/s00239-005-0242-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 10/10/2005] [Accepted: 05/30/2006] [Indexed: 11/30/2022]
Abstract
Asymmetric compositional and mutation bias between the two strands occurs in mitochondrial genomes, and an asymmetric mechanism of mtDNA replication is a potential source of this bias. Some evidence indicates that during replication the heavy strand is subject to a gradient of time spent in a single-stranded state (D_ssH) and a gradient of mutational damage. The nucleotide composition bias among genes varies with D_ssH. Consequently, partial genome duplications (PGD) will alter the skew for genes located downstream of the duplication, relative to nascent light strand synthesis, and in the same way, gene rearrangements (GRr) will affect genes by changing their skews. We examined cases where there had been PGD or GRr and determined whether this left a trace in the form of unusual patterns of base composition. We compared the skew of genes differently located on the mtDNA genome of previously published whole mtDNA genomes from amphibians, a group that shows considerable levels of both GRr and PGD. After observing a significant correlation between AT and GC skew with D_ssH at fourfold redundant sites, we ran our analysis and detected 31.3% of the species with GRr and/or PGD. By comparing the nucleotide composition at fourfold redundant sites in normal and "abnormal" species, we found that A/C variation occurs and is associated with GRr/PGD. These results show that by analyzing the nucleotide skews of only three genes, it may be possible to predict some mitochondrial GRr and/or PGD without knowing the complete mtDNA genome sequence.
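The AT and GC skews referred to in this abstract are commonly defined as (A − T)/(A + T) and (G − C)/(G + C) on one strand. The sketch below assumes those standard definitions, which the abstract itself does not spell out, and uses a hypothetical function name (`nucleotide_skews`). The study restricts the calculation to fourfold redundant sites; extracting those sites is left out here.

```python
def nucleotide_skews(seq: str) -> tuple:
    """Return (AT skew, GC skew) for one strand.

    Assumed definitions: AT skew = (A - T) / (A + T),
    GC skew = (G - C) / (G + C). A skew is reported as 0.0
    when its denominator is zero.
    """
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return at_skew, gc_skew


if __name__ == "__main__":
    at_skew, gc_skew = nucleotide_skews("AAATGC")
    print(f"AT skew = {at_skew:+.2f}, GC skew = {gc_skew:+.2f}")
```

Comparing these values gene by gene against each gene's position on the genome is the kind of analysis the abstract describes for spotting rearranged or duplicated regions.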
Affiliation(s)
- Miguel M Fonseca
- Centro de Investigação em Biodiversidade e Recursos Genéticos (CIBIO/UP), ICETA-UP, Campus Agrário de Vairão, Rua Padre Armando, 4485-661 Vairão, Portugal
|
223
|
Using the nucleotide substitution rate matrix to detect horizontal gene transfer. BMC Bioinformatics 2006; 7:476. [PMID: 17067382 PMCID: PMC1657035 DOI: 10.1186/1471-2105-7-476] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 04/27/2006] [Accepted: 10/26/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Horizontal gene transfer (HGT) has allowed bacteria to evolve many new capabilities. Because transferred genes perform many medically important functions, such as conferring antibiotic resistance, improved detection of horizontally transferred genes from sequence data would be an important advance. Existing sequence-based methods for detecting HGT focus on changes in nucleotide composition or on differences between gene and genome phylogenies; these methods have high error rates. RESULTS First, we introduce a new class of methods for detecting HGT based on the changes in nucleotide substitution rates that occur when a gene is transferred to a new organism. Our new methods discriminate simulated HGT events with an error rate up to 10 times lower than does GC content. Use of models that are not time-reversible is crucial for detecting HGT. Second, we show that using combinations of multiple predictors of HGT offers substantial improvements over using any single predictor, yielding as much as a factor of 18 improvement in performance (a maximum reduction in error rate from 38% to about 3%). Multiple predictors were combined by using the random forests machine learning algorithm to identify optimal classifiers that separate HGT from non-HGT trees. CONCLUSION The new class of HGT-detection methods introduced here combines advantages of phylogenetic and compositional HGT-detection techniques. These new techniques offer order-of-magnitude improvements over compositional methods because they are better able to discriminate HGT from non-HGT trees under a wide range of simulated conditions. We also found that combining multiple measures of HGT is essential for detecting a wide range of HGT events. These novel indicators of horizontal transfer will be widely useful in detecting HGT events linked to the evolution of important bacterial traits, such as antibiotic resistance and pathogenicity.
|
224
|
Jiang RHY, Govers F. Nonneutral GC3 and retroelement codon mimicry in Phytophthora. J Mol Evol 2006; 63:458-72. [PMID: 16955239 DOI: 10.1007/s00239-005-0211-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 09/05/2005] [Accepted: 05/20/2006] [Indexed: 10/24/2022]
Abstract
Phytophthora is a genus entirely comprised of destructive plant pathogens. It belongs to the Stramenopila, a unique branch of eukaryotes, phylogenetically distinct from plants, animals, or fungi. Phytophthora genes show a strong preference for usage of codons ending with G or C (high GC3). The presence of high GC3 in genes can be utilized to differentiate coding regions from noncoding regions in the genome. We found that both selective pressure and mutation bias drive codon bias in Phytophthora. Indicative for selection pressure is the higher GC3 value of highly expressed genes in different Phytophthora species. Lineage specific GC increase of noncoding regions is reminiscent of whole-genome mutation bias, whereas the elevated Phytophthora GC3 is primarily a result of translation efficiency-driven selection. Heterogeneous retrotransposons exist in Phytophthora genomes and many of them vary in their GC content. Interestingly, the most widespread groups of retroelements in Phytophthora show high GC3 and a codon bias that is similar to host genes. Apparently, selection pressure has been exerted on the retroelement's codon usage, and such mimicry of host codon bias might be beneficial for the propagation of retrotransposons.
Affiliation(s)
- Rays H Y Jiang
- Laboratory of Phytopathology, Plant Sciences Group, and Graduate School of Experimental Plant Sciences, Wageningen University, Binnenhaven 5, NL-6709 PD, Wageningen, The Netherlands
|
225
|
Doddapaneni H, Yao J, Lin H, Walker MA, Civerolo EL. Analysis of the genome-wide variations among multiple strains of the plant pathogenic bacterium Xylella fastidiosa. BMC Genomics 2006; 7:225. [PMID: 16948851 PMCID: PMC1574315 DOI: 10.1186/1471-2164-7-225] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 02/03/2006] [Accepted: 09/01/2006] [Indexed: 01/19/2023] Open
Abstract
Background The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10⁻² per base pair of DNA and the average INDEL frequency was 2.06 × 10⁻² per base pair of DNA. On average, 60.33% of the SNPs were synonymous while 39.67% were non-synonymous. The mutation frequency, primarily in the form of external INDELs, was the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The numbers of genes unique to each strain were 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A subset of the strain-specific genes showed significant differences in codon usage and GC composition from the native genes, suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage-related functions.
Conclusion INDELs and strain specific genes have been identified as the main source of variations among strains, with individual strains showing different rates of genome evolution. Based on these genome comparisons, it appears that the Pierce's disease strain Temecula-1 genome represents the ancestral genome of the X. fastidiosa. Results of this analysis are publicly available in the form of a web database.
Affiliation(s)
- Jiqiang Yao
- Citrus Research Board, 323 W. Oak, P.O. Box 230, Visalia, CA 93279, USA
- Hong Lin
- USDA-ARS, San Joaquin Valley Agricultural Science Center, 9611 So. Riverbend Ave., Parlier, CA 93648, USA
- M Andrew Walker
- University of California Davis, Department of Viticulture and Enology, Davis, CA 95616, USA
- Edwin L Civerolo
- USDA-ARS, San Joaquin Valley Agricultural Science Center, 9611 So. Riverbend Ave., Parlier, CA 93648, USA
|
226
|
Carapelli A, Vannini L, Nardi F, Boore JL, Beani L, Dallai R, Frati F. The mitochondrial genome of the entomophagous endoparasite Xenos vesparum (Insecta: Strepsiptera). Gene 2006; 376:248-59. [PMID: 16766140 DOI: 10.1016/j.gene.2006.04.005] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 11/18/2005] [Revised: 03/21/2006] [Accepted: 04/08/2006] [Indexed: 11/25/2022]
Abstract
In this study, the nearly complete sequence (14,519 bp) of the mitochondrial DNA (mtDNA) of the entomophagous endoparasite Xenos vesparum (Insecta: Strepsiptera) is described. All protein coding genes (PCGs) are in the arrangement known to be ancestral for insects, but three tRNA genes (trnA, trnS(gcu), and trnL(uag)) have transposed to derived positions and there are three tandem copies of trnH, each of which is potentially functional. All of these rearrangements except for that of trnL(uag) are within the short span between nad3 and nad4, and there are numerous blocks of unassignable sequence in this region, perhaps as remnants of larger-scale predisposing rearrangements. X. vesparum mtDNA nucleotide composition is strongly biased toward A and T, as is typical for insect mtDNAs. There is also a significant strand skew in the distribution of these nucleotides, with the J-strand being richer in A than T and in C than G, and the N-strand showing an opposite skew for complementary pairs of nucleotides. The hypothetical secondary structure of the LSU rRNA has also been reconstructed, obtaining a structural model similar to that of other insects.
MESH Headings
- Animals
- Base Composition
- Base Pairing
- Base Sequence
- Codon
- DNA, Circular/chemistry
- DNA, Circular/genetics
- DNA, Mitochondrial/chemistry
- DNA, Mitochondrial/genetics
- Evolution, Molecular
- Gene Dosage
- Gene Expression Profiling
- Gene Order
- Gene Rearrangement
- Genes, Insect
- Genome
- Insecta/classification
- Insecta/genetics
- Microsatellite Repeats
- Molecular Sequence Data
- Nucleic Acid Conformation
- Open Reading Frames
- RNA, Ribosomal/chemistry
- RNA, Ribosomal/genetics
- RNA, Transfer/chemistry
- RNA, Transfer/genetics
- Repetitive Sequences, Nucleic Acid
- Sequence Analysis, DNA
- Translocation, Genetic
|
227
|
Nikolaou C, Almirantis Y. Deviations from Chargaff's second parity rule in organellar DNA: insights into the evolution of organellar genomes. Gene 2006; 381:34-41. [PMID: 16893615 DOI: 10.1016/j.gene.2006.06.010] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Received: 03/07/2006] [Revised: 04/18/2006] [Accepted: 06/13/2006] [Indexed: 10/24/2022]
Abstract
Chargaff's second parity rule (PR2) states that complementary nucleotides are met with almost equal frequencies in single-stranded DNA. This is indeed the case for all bacterial and eukaryotic genomes studied, although the genomic patterns may differ among genomes in terms of local deviations. The behaviour of organellar genomes regarding the second parity rule has not been studied in detail up to now. We tested all available organellar genomes and found that a large number of mitochondrial genomes significantly deviate from the second parity rule, in contrast to the eubacterial ones, although mitochondria are believed to have evolved from proteobacteria. Moreover, mitochondria may be divided into three distinct sub-groups according to their overall deviation from the aforementioned parity rule. On the other hand, chloroplast genomes share the pattern of eubacterial genomes and, interestingly, so do mitochondrial genomes originating from plants and some fungi. The deviation from the second parity rule is found to be weakly correlated with the overall excess of purines over pyrimidines. The behaviour of the large majority of the mitochondrial genomes may be attributed to their distinct mode of replication, which is fundamentally different from that of the eubacteria. Differences between chloroplast and mitochondrial genomes might also be explained on the basis of different replication mechanisms and correlated to differences in genome size and compaction. The results presented herein may provide some insight into different modes of evolution of genome structure between chloroplasts and mitochondria.
Affiliation(s)
- Christoforos Nikolaou
- Computational Genomics Group, Institute of Biology, NCSR Demokritos, 15310 Athens, Greece.
|
228
|
Uno R, Nakayama Y, Tomita M. Over-representation of Chi sequences caused by di-codon increase in Escherichia coli K-12. Gene 2006; 380:30-7. [PMID: 16854534 DOI: 10.1016/j.gene.2006.05.013] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 12/28/2005] [Revised: 04/20/2006] [Accepted: 05/09/2006] [Indexed: 11/17/2022]
Abstract
Chi sequences (5'-GCTGGTGG-3') are cis-acting 8 bp sequence elements that enhance homologous recombination promoted by the RecBCD pathway in Escherichia coli. The genome of E. coli K-12 MG1655 contains 1009 Chi sequences and this frequency far exceeds the expected value for occurrence of an 8 bp sequence in a genome of this size. It is generally thought that the over-representation of Chi sequences indicates that they have been selected for during evolution because of their function in recombination. The genes from three E. coli strains (K-12, O157 and CFT) were classified into three categories (island, match to other E. coli, and backbone). Island genes have a different base composition and codon usage in comparison with those in the backbone genes, therefore they were relatively new and not yet adapted to the base composition patterns and codon usage typical of the recipient genome. The over-representation of Chi sequences was examined by comparing Chi frequencies and codon frequencies between island and backbone genes. The difference in the CTGGTG di-codon frequency between the backbone and island genes was correlated with the frequency of Chi sequences which were translated in the Leu-Val (-G/CTG/GTG/G-) reading frame in the K-12 strain. These results suggest that the main reading frame of Chi sequences increased as a result of the di-codon CTG-GTG increasing under a genome-wide pressure for adapting to the codon usage and base composition of the E. coli K-12 strain, and that the RecBCD recombinase might adjust its recognition sequence to a frequently occurring oligomer such as G-CTG-GTG-G.
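The comparison between observed and expected Chi frequencies mentioned in this abstract can be illustrated with a simple null model. The sketch below is an assumption-laden illustration, not the authors' procedure: it counts the 8 bp motif on both strands and estimates a per-strand expectation under independent base frequencies, which is only one of several possible background models. The function names are hypothetical.

```python
CHI = "GCTGGTGG"                            # Chi motif, written 5' to 3'
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def count_both_strands(seq: str, motif: str = CHI) -> int:
    """Count the motif on the given strand and on its complementary strand."""
    seq = seq.upper()
    rc = reverse_complement(motif)
    total = count_overlapping(seq, motif)
    if rc != motif:                         # avoid double counting palindromes
        total += count_overlapping(seq, rc)
    return total


def expected_per_strand(seq: str, motif: str = CHI) -> float:
    """Expected motif count on one strand, assuming independent bases."""
    seq = seq.upper()
    n = len(seq)
    if n < len(motif):
        return 0.0
    freq = {base: seq.count(base) / n for base in "ACGT"}
    prob = 1.0
    for base in motif:
        prob *= freq[base]
    return prob * (n - len(motif) + 1)


if __name__ == "__main__":
    toy = "TTGCTGGTGGAACCACCAGCTT"          # one Chi on each strand
    print(count_both_strands(toy), expected_per_strand(toy))
```

An independent-base model ignores codon usage, which is exactly the effect the abstract argues is responsible for the excess, so a di-codon-aware background would be the natural next refinement.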
Affiliation(s)
- Reina Uno
- Institute for Advanced Biosciences, Keio University, Tsuruoka, 997-0014, Japan.
|
229
|
Bastolla U, Porto M, Roman HE, Vendruscolo M. A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank. BMC Evol Biol 2006; 6:43. [PMID: 16737532 PMCID: PMC1570368 DOI: 10.1186/1471-2148-6-43] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 11/17/2005] [Accepted: 05/31/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Since thermodynamic stability is a global property of proteins that has to be conserved during evolution, the selective pressure at a given site of a protein sequence depends on the amino acids present at other sites. However, models of molecular evolution that aim at reconstructing the evolutionary history of macromolecules become computationally intractable if such correlations between sites are explicitly taken into account. RESULTS We introduce an evolutionary model with sites evolving independently under a global constraint on the conservation of structural stability. This model consists of a selection process, which depends on two hydrophobicity parameters that can be computed from protein sequences without any fit, and a mutation process for which we consider various models. It reproduces quantitatively the results of Structurally Constrained Neutral (SCN) simulations of protein evolution in which the stability of the native state is explicitly computed and conserved. We then compare the predicted site-specific amino acid distributions with those sampled from the Protein Data Bank (PDB). The parameters of the mutation model, whose number varies between zero and five, are fitted from the data. The mean correlation coefficient between predicted and observed site-specific amino acid distributions is larger than <r> = 0.70 for a mutation model with no free parameters and no genetic code. In contrast, considering only the mutation process with no selection yields a mean correlation coefficient of <r> = 0.56 with three fitted parameters. The mutation model that best fits the data takes into account increased mutation rate at CpG dinucleotides, yielding <r> = 0.90 with five parameters. CONCLUSION The effective selection process that we propose reproduces well amino acid distributions as observed in the protein sequences in the PDB. Its simplicity makes it very promising for likelihood calculations in phylogenetic studies. 
Interestingly, in this approach the mutation process influences the effective selection process, i.e. selection and mutation must be entangled in order to obtain effectively independent sites. This interdependence between mutation and selection reflects the deep influence that mutation has on the evolutionary process: The bias in the mutation influences the thermodynamic properties of the evolving proteins, in agreement with comparative studies of bacterial proteomes, and it also influences the rate of accepted mutations.
Affiliation(s)
- Ugo Bastolla
- Centro de Biología Molecular "Severo Ochoa", (CSIC-UAM), Cantoblanco, 28049 Madrid, Spain
- Markus Porto
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr. 8, 64289 Darmstadt, Germany
- H Eduardo Roman
- Dipartimento di Fisica, Università di Milano Bicocca, Piazza della Scienza 3, 20126 Milano, Italy
- Michele Vendruscolo
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
|
231
|
Bailly-Bechet M, Danchin A, Iqbal M, Marsili M, Vergassola M. Codon usage domains over bacterial chromosomes. PLoS Comput Biol 2006; 2:e37. [PMID: 16683018 PMCID: PMC1447655 DOI: 10.1371/journal.pcbi.0020037] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Received: 11/21/2005] [Accepted: 03/13/2006] [Indexed: 11/19/2022] Open
Abstract
The geography of codon bias distributions over prokaryotic genomes and its impact upon chromosomal organization are analyzed. To this aim, we introduce a clustering method based on information theory, specifically designed to cluster genes according to their codon usage, and apply it to the coding sequences of Escherichia coli and Bacillus subtilis. One of the clusters identified in each of the organisms is found to be related to expression levels, as expected, but other groups feature an over-representation of genes belonging to different functional groups, namely horizontally transferred genes, motility, and intermediary metabolism. Furthermore, we show that genes with a similar bias tend to be close to each other on the chromosome and organized in coherent domains, more extended than operons, demonstrating a role of translation in structuring bacterial chromosomes. It is argued that a sizeable contribution to this effect comes from the dynamical compartmentalization induced by the recycling of tRNAs, leading to gene expression rates dependent on their genomic and expression context.
Affiliation(s)
- Marc Bailly-Bechet
- CNRS URA 2171, Institute Pasteur, Unité Génétique in silico, Paris, France
- Antoine Danchin
- CNRS URA 2171, Institute Pasteur, Unité Génétique des Génomes Bactériens, Paris, France
- Mudassar Iqbal
- Abdus Salam International Center Theoretical Physics, Trieste, Italy
- Computing Laboratory, University of Kent, Canterbury, Kent, United Kingdom
- Matteo Marsili
- Abdus Salam International Center Theoretical Physics, Trieste, Italy
- Massimo Vergassola
- CNRS URA 2171, Institute Pasteur, Unité Génétique in silico, Paris, France
|
232
|
Glusman G, Qin S, El-Gewely MR, Siegel AF, Roach JC, Hood L, Smit AFA. A third approach to gene prediction suggests thousands of additional human transcribed regions. PLoS Comput Biol 2006; 2:e18. [PMID: 16543943 PMCID: PMC1391917 DOI: 10.1371/journal.pcbi.0020018] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Received: 10/20/2005] [Accepted: 01/25/2006] [Indexed: 12/26/2022] Open
Abstract
The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent "genomic deserts."
|
233
|
Urbina D, Tang B, Higgs PG. The response of amino acid frequencies to directional mutation pressure in mitochondrial genome sequences is related to the physical properties of the amino acids and to the structure of the genetic code. J Mol Evol 2006; 62:340-61. [PMID: 16477524 DOI: 10.1007/s00239-005-0051-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 03/07/2005] [Accepted: 10/01/2005] [Indexed: 11/29/2022]
Abstract
The frequencies of A, C, G, and T in mitochondrial DNA vary among species due to unequal rates of mutation between the bases. The frequencies of bases at fourfold degenerate sites respond directly to mutation pressure. At first and second positions, selection reduces the degree of frequency variation. Using a simple evolutionary model, we show that first position sites are less constrained by selection than second position sites and, therefore, that the frequencies of bases at first position are more responsive to mutation pressure than those at second position. We define a measure of distance between amino acids that is dependent on eight measured physical properties and a similarity measure that is the inverse of this distance. Columns 1, 2, 3, and 4 of the genetic code correspond to codons with U, C, A, and G in their second position, respectively. The similarity of amino acids in the four columns decreases systematically from column 1 to column 2 to column 3 to column 4. We then show that the responsiveness of first position bases to mutation pressure is dependent on the second position base and follows the same decreasing trend through the four columns. Again, this shows the correlation between physical properties and responsiveness. We determine a proximity measure for each amino acid, which is the average similarity between an amino acid and all others that are accessible via single point mutations in the mitochondrial genetic code structure. We also define a responsiveness for each amino acid, which measures how rapidly an amino acid frequency changes as a result of mutation pressure acting on the base frequencies. We show that there is a strong correlation between responsiveness and proximity, and that both these quantities are also correlated with the mutability of amino acids estimated from the mtREV substitution rate matrix. We also consider the variation of base frequencies between strands and between genes on a strand. 
These trends are consistent with the patterns expected from analysis of the variation among genomes.
Affiliation(s)
- Daniel Urbina
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario, Canada
|
234
|
Chargaff’s Second Parity Rule. Evol Bioinform Online 2006. [DOI: 10.1007/978-0-387-33419-6_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
|
235
|
Duplij D, Duplij S. DNA sequence representation by trianders and determinative degree of nucleotides. J Zhejiang Univ Sci B 2005; 6:743-55. [PMID: 16052707 PMCID: PMC1389855 DOI: 10.1631/jzus.2005.b0743] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
A new version of DNA walks, in which nucleotides are regarded as unequal in their contribution to a walk, is introduced, which allows us to study thoroughly the "fine structure" of nucleotide sequences. The approach is based on the assumption that nucleotides have an inner abstract characteristic, the determinative degree, which reflects phenomenological properties of the genetic code and is adjusted to the physical properties of nucleotides. We consider each codon position independently, which gives three separate walks characterized by different angles and lengths; such an object is called a triander, which reflects the "strength" of each branch. A general method for identifying a DNA sequence "by triander", which can be treated as a unique "genogram" (or "gene passport"), is proposed. Two- and three-dimensional trianders are considered. The difference in sequence fine structure between genes and the intergenic space is shown. A clear triplet signal was found in coding sequences; it is absent in the intergenic space and is independent of sequence length. This paper presents a topological classification of trianders, which can allow us to work out in detail the signatures of functionally different genomic regions.
Affiliation(s)
- Diana Duplij
- Institute of Molecular Biology and Genetics, Kiev 03143, Ukraine
- Steven Duplij
- Theory Group, Nuclear Physics Laboratory, Kharkov National University, Kharkov 61077, Ukraine
|
236
|
Nikolaou C, Almirantis Y. A study on the correlation of nucleotide skews and the positioning of the origin of replication: different modes of replication in bacterial species. Nucleic Acids Res 2005; 33:6816-22. [PMID: 16321966 PMCID: PMC1301597 DOI: 10.1093/nar/gki988] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Abstract
Deviations from Chargaff's 2nd parity rule, according to which A ≈ T and G ≈ C in single-stranded DNA, have been associated with replication as well as with transcription in prokaryotes. Based on observations regarding mainly the transcription-replication co-linearity in a large number of prokaryotic species, we formulate the hypothesis that the replication procedure may follow different modes between genomes throughout which the skews clearly follow different patterns. We draw the conclusion that multiple functional sites of origin of replication may exist in the genomes of most archaea and in some exceptional cases of eubacteria, while in the majority of eubacteria, replication occurs through a single fixed origin.
Affiliation(s)
- Christoforos Nikolaou
- Institute of Biology, National Centre of Scientific Research Demokritos, 15310 Athens, Greece.
|
237
|
Hassanin A. Phylogeny of Arthropoda inferred from mitochondrial sequences: strategies for limiting the misleading effects of multiple changes in pattern and rates of substitution. Mol Phylogenet Evol 2005; 38:100-16. [PMID: 16290034 DOI: 10.1016/j.ympev.2005.09.012] [Citation(s) in RCA: 196] [Impact Index Per Article: 9.8] [Received: 03/17/2005] [Revised: 08/08/2005] [Accepted: 09/06/2005] [Indexed: 10/25/2022]
Abstract
In this study, mitochondrial sequences were used to investigate the relationships among the major lineages of Arthropoda. The data matrix used for the analyses includes 84 taxa and 3918 nucleotides representing six mitochondrial protein-coding genes (atp6 and 8, cox1-3, and nad2). The analyses of nucleotide composition show that a reverse strand-bias, i.e., characterized by an excess of T relative to A nucleotides and of G relative to C nucleotides, was independently acquired in six different lineages of Arthropoda: (1) the honeybee mite (Varroa), (2) Opisthothelae spiders (Argiope, Habronattus, and Ornithoctonus), (3) scorpions (Euscorpius and Mesobuthus), (4) Hutchinsoniella (Cephalocarid), (5) Tigriopus (Copepod), and (6) whiteflies (Aleurodicus and Trialeurodes). Phylogenetic analyses confirm that these convergences in nucleotide composition can be particularly misleading for tree reconstruction, as unrelated taxa with reverse strand-bias tend to group together in MP, ML, and Bayesian analyses. However, the use of a specific model for minimizing effects of the bias, the "Neutral Transition Exclusion" (NTE) model, allows Bayesian analyses to rediscover most of the higher taxa of Arthropoda. Furthermore, the analyses of branch lengths suggest that three main factors explain accelerated rates of substitution: (1) genomic rearrangements, including duplication of the control region and gene translocation, (2) parasitic lifestyle, and (3) small body size. The comparisons of Bayesian Bootstrap percentages show that the support for many nodes increases when taxa with long branches are excluded from the analyses. It is therefore recommended to select taxa and genes of the mitochondrial genome for inferring phylogenetic relationships among arthropod lineages. The phylogenetic analyses support the existence of a major dichotomy within Arthropoda, separating Pancrustacea and Paradoxopoda. 
Basal relationships between Pancrustacean lineages are not robust, and the question of Hexapod monophyly or polyphyly cannot be answered with the available mitochondrial sequences. Within Paradoxopoda, Chelicerata and Myriapoda are each found to be monophyletic, and Endeis (Pycnogonida) is, surprisingly, associated with Acari.
Affiliation(s)
- Alexandre Hassanin
- Muséum National d'Histoire Naturelle, Département Systématique et Evolution, UMR 5202-Origine, Structure, et Evolution de la Biodiversité, Case postale No. 51, 55, rue Buffon, 75005 Paris, France.
|
238
|
Mitchell A, Graur D. Inferring the pattern of spontaneous mutation from the pattern of substitution in unitary pseudogenes of Mycobacterium leprae and a comparison of mutation patterns among distantly related organisms. J Mol Evol 2005; 61:795-803. [PMID: 16315108 DOI: 10.1007/s00239-004-0235-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 11/30/2004] [Accepted: 04/29/2005] [Indexed: 11/27/2022]
Abstract
The pattern of spontaneous mutation can be inferred from the pattern of substitution in pseudogenes, which are known to be under very weak or no selective constraint. We modified an existing method (Gojobori T, et al., J Mol Evol 18:360, 1982) to infer the pattern of mutation in bacteria by using 569 pseudogenes from Mycobacterium leprae. In Gojobori et al.'s method, the pattern is inferred by using comparisons involving a pseudogene, a conspecific functional paralog, and an outgroup functional ortholog. Because pseudogenes in M. leprae are unitary, we replaced the missing paralogs by functional orthologs from M. tuberculosis. Functional orthologs from Streptomyces coelicolor served as outgroups. We compiled a database consisting of 69,378 inferred mutations. Transitional mutations were found to constitute more than 56% of all mutations. The transitional bias was mainly due to C→T and G→A, which were also the most frequent mutations on the leading strand and the only ones that were significantly more frequent than the random expectation. The least frequent mutations on the leading strand were A→T and T→A, each with a relative frequency of less than 3%. The mutation pattern was found to differ between the leading and the lagging strands. This asymmetry is thought to be the cause for the typical chirochoric structure of bacterial genomes. The physical distance of the pseudogene from the origin of replication (ori) was found to have almost no effect on the pattern of mutation. A surprising similarity was found between the mutation pattern in M. leprae and previously inferred patterns for such distant taxa as human and Drosophila. The mutation pattern on the leading strand of M. leprae was also found to share some common features with the pattern inferred for the heavy strand of the human mitochondrial genome. These findings indicate that taxon-specific factors may only play secondary roles in determining patterns of mutation.
Affiliation(s)
- Amir Mitchell
- Department of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, 69978, Israel
|
239
|
Hassanin A, Léger N, Deutsch J. Evidence for multiple reversals of asymmetric mutational constraints during the evolution of the mitochondrial genome of metazoa, and consequences for phylogenetic inferences. Syst Biol 2005; 54:277-98. [PMID: 16021696 DOI: 10.1080/10635150590947843] [Citation(s) in RCA: 341] [Impact Index Per Article: 17.1] [Indexed: 10/25/2022]
Abstract
Mitochondrial DNA (mtDNA) sequences are commonly used for inferring phylogenetic relationships. However, the strand-specific bias in the nucleotide composition of the mtDNA, which is thought to reflect asymmetric mutational constraints, combined with the important compositional heterogeneity among taxa, are known to be highly problematic for phylogenetic analyses. Here, nucleotide composition was compared across 49 species of Metazoa (34 arthropods, 2 annelids, 2 molluscs, and 11 deuterostomes), and analyzed for a mtDNA fragment including six protein-coding genes, i.e., atp6, atp8, cox1, cox2, cox3, and nad2. The analyses show that most metazoan species present a clear strand asymmetry, where one strand is biased in favor of A and C, whereas the other strand has reverse bias, i.e., in favor of T and G. The origin of this strand bias can be related to asymmetric mutational constraints involving deaminations of A and C nucleotides during the replication and/or transcription processes. The analyses reveal that six unrelated genera are characterized by a reversal of the usual strand bias, i.e., Argiope (Araneae), Euscorpius (Scorpiones), Tigriopus (Maxillopoda), Branchiostoma (Cephalochordata), Florometra (Echinodermata), and Katharina (Mollusca). It is proposed that asymmetric mutational constraints have been independently reversed in these six genera, through an inversion of the control region, i.e., the region that contains most regulatory elements for replication and transcription of the mtDNA. We show that reversals of asymmetric mutational constraints have dramatic consequences on the phylogenetic analyses, as taxa characterized by reverse strand bias tend to group together due to long-branch attraction artifacts. We propose a new method for limiting this specific problem in tree reconstruction under the Bayesian approach. We apply our method to deal with the question of phylogenetic relationships of the major lineages of Arthropoda. This new approach provides a better congruence with nuclear analyses. Based on mtDNA sequences, our data suggest that Chelicerata, Crustacea, Myriapoda, Pancrustacea, and Paradoxopoda are monophyletic.
Affiliation(s)
- Alexandre Hassanin
- Muséum National d'Histoire Naturelle, Département Systématique et Evolution, Case Postale, Paris, France.
|
240
|
Nohara M, Nishida M, Miya M, Nishikawa T. Evolution of the Mitochondrial Genome in Cephalochordata as Inferred from Complete Nucleotide Sequences from Two Epigonichthys Species. J Mol Evol 2005; 60:526-37. [PMID: 15883887 DOI: 10.1007/s00239-004-0238-x] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.5] [Received: 07/30/2004] [Accepted: 11/07/2004] [Indexed: 11/26/2022]
Abstract
Complete mitochondrial (mt) DNA sequences of two lancelets, Epigonichthys maldivensis and E. lucayanus, were compared with those of two Branchiostoma lancelets and several deuterostomes previously surveyed. The mt-gene order of E. lucayanus was quite different from that of E. maldivensis, the latter being identical to the two Branchiostoma species. A remarkable genomic change in E. lucayanus mtDNA was an inversion, indicating the possibility of recombination of the mt-genome. Gene rearrangements, probably attributable to tandem genome duplications and subsequent random deletions, were observed in two parts. Short major unassignable sequences of the examined lancelets were regarded as a part of putative regulative elements, judging from some sequence similarity to the conserved sequence block (CSB) in mammalian mtDNA. The considerable mt-genome reorganization in E. lucayanus seemed to have affected the nucleotide substitution pattern, suggested by base composition analyses. The present analysis also suggested that AGR codons in lancelet mtDNA were likely to correspond to serine residue, rather than glycine. Furthermore, the AGG codon, so far reputed to be unassignable in lancelet mtDNA, was found twice in E. maldivensis, indicating the availability of all four AGN codons in some lancelets. This finding lends support to an alternative hypothesis regarding the evolutionary history of AGR-codon assignment in extant chordates, rather than that previously proposed. A molecular phylogenetic tree of the Epigonichthys and Branchiostoma species based on DNA sequences of the 13 mt-protein genes doubted the monophyly of the former genus, unlike the prevailing classification based on their different gonadal arrangements.
Affiliation(s)
- Masahiro Nohara
- Yokohama R&D Center, HITEC Co., Ltd., 3-55-1 Hagoromo-cho, Naka-ku, Yokohama, Kanagawa, 231-0047, Japan
|
241
|
Nilsson D, Andersson B. Strand asymmetry patterns in trypanosomatid parasites. Exp Parasitol 2005; 109:143-9. [PMID: 15713445 DOI: 10.1016/j.exppara.2004.12.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 08/12/2002] [Revised: 12/01/2004] [Accepted: 12/01/2004] [Indexed: 11/28/2022]
Abstract
The genome organization of kinetoplastid parasites is unusual, with chromosomes containing several long regions of polycistronically transcribed genes. The regions where the direction of transcription switches have been hypothesized to contain origins of replication and possibly also centromeres and promoters. We report that overall strand asymmetry patterns can be observed in Trypanosoma cruzi and Trypanosoma brucei with optima on strand-switch regions. The base skews of T. cruzi and T. brucei divergent strand-switches show patterns analogous to those for bacterial origins of replication, but they differ from those of Leishmania major. Bias in codon usage and the trypanosomatid unidirectional gene clusters predict most of this skew, but fail to properly explain the same trend in intergenic regions, as does the current knowledge of regulatory sequences.
Affiliation(s)
- Daniel Nilsson
- Center for Genomics and Bioinformatics, Karolinska Institutet, Berzeliusv. 35, SE-171 77 Stockholm, Sweden
|
242
|
Zagordi O, Lobry JR. Forcing reversibility in the no-strand-bias substitution model allows for the theoretical and practical identifiability of its 5 parameters from pairwise DNA sequence comparisons. Gene 2005; 347:175-82. [PMID: 15725378 DOI: 10.1016/j.gene.2004.12.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/14/2004] [Revised: 12/14/2004] [Accepted: 12/16/2004] [Indexed: 11/16/2022]
Abstract
Because of the base pairing rules in DNA, some mutations experienced by a portion of DNA during its evolution result in the same substitution, as we can only observe differences in coupled nucleotides. Then, in the absence of a bias between the two DNA strands, a model with at most 6 different parameters instead of 12 is sufficient to study the evolutionary relationship between homologous sequences derived from a common ancestor. On the other hand the same symmetry reduces the number of independent observations which can be made. Such a reduction can in some cases invalidate the calculation of the parameters. A compromise between biologically acceptable hypotheses and tractability is introduced and a five-parameter reversible no-strand-bias condition (RNSB) is presented. The identifiability of the parameters under this model is shown by examples.
Affiliation(s)
- Osvaldo Zagordi
- Dipartimento di Scienze Fisiche Complesso Universitario di Monte Sant'Angelo via Cinthia, 80126 Napoli, Italy.
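The reduction from 12 to 6 parameters mentioned in the abstract above can be illustrated with a small sketch. It assumes the usual reading of "no strand bias", namely that the rate of a substitution i → j equals the rate of the complementary substitution comp(i) → comp(j). The function names and the numeric rates below are hypothetical, chosen only for illustration, and the sketch covers the six-parameter condition, not the five-parameter reversible (RNSB) variant the paper introduces.

```python
# Illustrative sketch only: names and rate values are assumptions, not from the paper.
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def nsb_rates(a_c, a_g, a_t, c_a, c_g, c_t):
    """Expand 6 free rates into all 12 substitution rates.

    No strand bias is taken to mean rate(i -> j) == rate(comp(i) -> comp(j)).
    """
    free = {
        ("A", "C"): a_c, ("A", "G"): a_g, ("A", "T"): a_t,
        ("C", "A"): c_a, ("C", "G"): c_g, ("C", "T"): c_t,
    }
    rates = {}
    for (i, j), r in free.items():
        rates[(i, j)] = r
        rates[(COMPLEMENT[i], COMPLEMENT[j])] = r
    return rates


def equilibrium(rates, start, dt=0.01, steps=20000):
    """Step the base frequencies forward in time (explicit Euler) until they settle."""
    freq = dict(start)
    for _ in range(steps):
        flow = {b: 0.0 for b in BASES}
        for (i, j), r in rates.items():
            flow[i] -= freq[i] * r * dt  # mass leaving base i
            flow[j] += freq[i] * r * dt  # the same mass arriving at base j
        freq = {b: freq[b] + flow[b] for b in BASES}
    return freq


rates = nsb_rates(a_c=0.2, a_g=1.0, a_t=0.1, c_a=0.3, c_g=0.15, c_t=1.2)
skewed_start = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
freq = equilibrium(rates, skewed_start)
print({b: round(f, 4) for b, f in freq.items()})
```

Even from a deliberately skewed starting composition, the frequencies settle at A = T and C = G, which is the within-strand parity that the symmetry of the rates implies.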
|
243
|
|
244
|
Guy L, Roten CAH. Genometric analyses of the organization of circular chromosomes: a universal pressure determines the direction of ribosomal RNA genes transcription relative to chromosome replication. Gene 2004; 340:45-52. [PMID: 15556293 DOI: 10.1016/j.gene.2004.06.056] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.8] [Received: 03/24/2004] [Revised: 06/08/2004] [Accepted: 06/29/2004] [Indexed: 10/26/2022]
Abstract
Selective pressures related to gene function and chromosomal architecture are acting on genome sequences and can be revealed, for instance, by appropriate genometric methods. Cumulative nucleotide skew analyses, i.e., GC, TA, and ORF orientation skews, predict the location of the origin of DNA replication for 88 out of 100 completely sequenced bacterial chromosomes. These methods appear fully reliable for proteobacteria, Gram-positives, and spirochetes as well as for euryarchaeotes. Based on this genome architecture information, coorientation analyses reveal that in prokaryotes, ribosomal RNA (rRNA) genes encoding the small and large ribosomal subunits are all transcribed in the same direction as DNA replication; that is, they are located along the leading strand. This result offers a simple and reliable method for circumscribing the region containing the origin of the DNA replication and reveals a strong selective pressure acting on the orientation of rRNA genes similar to the weaker one acting on the orientation of ORFs. Rate of coorientation of transfer RNA (tRNA) genes with DNA replication appears to be taxon-specific. Analyzing nucleotide biases such as GC and TA skews of genes and plotting one against the other reveals a taxonomic clusterization of species. All ribosomal RNA genes are enriched in Gs and depleted in Cs, the only so far known exception being the rRNA genes of deuterostomian mitochondria. However, this exception can be explained by the fact that in the chromosome of the human mitochondrion, the model of the deuterostomian organelle genome, DNA replication, and rRNA transcription proceed in opposite directions. A general rule is deduced from prokaryotic and mitochondrial genomes: ribosomal RNA genes that are transcribed in the same direction as the DNA replication are enriched in Gs, and those transcribed in the opposite direction are depleted in Gs.
MESH Headings
- Base Composition/genetics
- Chromosomes, Archaeal/genetics
- Chromosomes, Bacterial/genetics
- DNA Replication/genetics
- DNA, Circular/genetics
- DNA, Mitochondrial/genetics
- Databases, Nucleic Acid
- Genome, Archaeal
- Genome, Bacterial
- Humans
- Models, Genetic
- Phylogeny
- RNA, Ribosomal/genetics
- Replication Origin/genetics
- Transcription, Genetic/genetics
Affiliation(s)
- Lionel Guy
- Département de Microbiologie Fondamentale, Faculté de Biologie et de Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland
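The cumulative GC skew analysis named in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' genometric pipeline: it assumes the common convention of adding +1 for each G and -1 for each C along one strand, and of reading the minimum of the running sum as a candidate origin and the maximum as a candidate terminus. The function names and the toy sequence are hypothetical.

```python
def cumulative_gc_skew(seq):
    """Running sum of +1 for each G and -1 for each C along the sequence."""
    total = 0
    profile = []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        profile.append(total)
    return profile


def skew_extrema(seq):
    """Return (index of the minimum, index of the maximum) of the cumulative skew."""
    profile = cumulative_gc_skew(seq)
    positions = range(len(profile))
    i_min = min(positions, key=profile.__getitem__)
    i_max = max(positions, key=profile.__getitem__)
    return i_min, i_max


# Toy "chromosome": C-rich, then G-rich, then C-rich again.
toy = "C" * 50 + "G" * 100 + "C" * 50
print(skew_extrema(toy))
```

On the toy sequence the running sum falls through the first C-rich stretch, rises through the G-rich stretch, and falls again, so the two extrema mark the points where the compositional bias switches sign.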
|
245
|
Krishnan NM, Seligmann H, Raina SZ, Pollock DD. Detecting gradients of asymmetry in site-specific substitutions in mitochondrial genomes. DNA Cell Biol 2004; 23:707-14. [PMID: 15585129 PMCID: PMC2943950 DOI: 10.1089/dna.2004.23.707] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.6] [Indexed: 10/26/2022]
Abstract
During mitochondrial replication, spontaneous mutations occur and accumulate asymmetrically during the time spent single stranded by the heavy strand (DssH). The predominant mutations appear to be deaminations from adenine to hypoxanthine (A → H, which leads to an A → G substitution) and cytosine to thymine (C → T). Previous findings indicated that C → T substitutions accumulate rapidly and then saturate at high DssH, suggesting protection or repair, whereas A → G accumulates linearly with DssH. We describe here the implementation of a simple hidden Markov model (HMM) of among-site rate correlations to provide an almost continuous profile of the asymmetry in substitution response for any particular substitution type. We implement this model using a phylogeny-based Bayesian Markov chain Monte Carlo (MCMC) approach. We compare and contrast the relative asymmetries in all 12 possible substitution types, and find that the observed transition substitution responses determined using our new method agree quite well with previous predictions of a saturating curve for C → T transition substitutions and a linear accumulation of A → G transitions. The patterns seen in transversion substitutions show much lower among-site variation, and are nonlinear and more complex than those seen in transitions. We also find that, after accounting for the principal linear effect, some of the residual variation in A → G/G → A response ratios is explained by the average predicted nucleic acid secondary structure propensity at a site, possibly due to protection from mutation when secondary structure forms.
Affiliation(s)
- Neeraja M Krishnan
- Biological Computation and Visualization Center, Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana 70803, USA
|
246
|
Abstract
The replication of the chromosome is among the most essential functions of the bacterial cell and influences many other cellular mechanisms, from gene expression to cell division. Yet the way it impacts on the bacterial chromosome was not fully acknowledged until the availability of complete genomes allowed one to look upon genomes as more than bags of genes. Chromosomal replication includes a set of asymmetric mechanisms, among which are a division in a lagging and a leading strand and a gradient between early and late replicating regions. These differences are the causes of many of the organizational features observed in bacterial genomes, in terms of both gene distribution and sequence composition along the chromosome. When asymmetries or gradients increase in some genomes, e.g. due to a different composition of the DNA polymerase or to a higher growth rate, so do the corresponding biases. As some of the features of the chromosome structure seem to be under strong selection, understanding such biases is important for the understanding of chromosome organization and adaptation. Inversely, understanding chromosome organization may shed further light on questions relating to replication and cell division. Ultimately, the understanding of the interplay between these different elements will allow a better understanding of bacterial genetics and evolution.
Affiliation(s)
- Eduardo P C Rocha
- Atelier de Bioinformatique, Université Pierre et Marie Curie, 12, Rue Cuvier, 75005 Paris, and Unité Génétique des Génomes Bactériens, Institut Pasteur, 28 rue du Dr Roux, 75724 Paris Cedex 15, France
|
247
|
Yagil G. The over-representation of binary DNA tracts in seven sequenced chromosomes. BMC Genomics 2004; 5:19. [PMID: 15113401 PMCID: PMC407849 DOI: 10.1186/1471-2164-5-19] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Received: 10/18/2003] [Accepted: 03/03/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND DNA tracts composed of only two bases are possible in six combinations: A+G (purines, R), C+T (pyrimidines, Y), G+T (Keto, K), A+C (Imino, M), A+T (Weak, W) and G+C (Strong, S). It has long been known that all-pyrimidine tracts, complemented by all-purine tracts ("R.Y tracts"), are excessively present in analyzed DNA. We have previously shown that R.Y tracts are in vast excess in yeast promoters, and brought evidence for their role in gene regulation. Here we report the systematic mapping of all six binary combinations on the level of complete sequenced chromosomes, as well as in their different subregions. RESULTS DNA tracts composed of the above binary base combinations have been mapped in seven sequenced chromosomes: Human chromosomes 21 and 22 (the major contigs); Drosophila melanogaster chr. 2R; Caenorhabditis elegans chr. I; Arabidopsis thaliana chr. II; Saccharomyces cerevisiae chr. IV and M. jannaschii. A huge over-representation, reaching million-folds, has been found for very long tracts of all binary motifs except S, in each of the seven organisms. Long R.Y tracts are the most excessive, except in D. melanogaster, where the K.M motif predominates. S (G, C rich) tracts are in excess mainly in CpG islands; the W motif predominates in bacteria. Many excessively long W tracts are nevertheless found also in the archaeon and in the eukaryotes. The survey of complete chromosomes enables us, for the first time, to map systematically the intergenic regions. In human and other chromosomes we find the highest over-representation of the binary DNA tracts in the intergenic regions. These over-representations are only partly explainable by the presence of interspersed elements. CONCLUSIONS The over-representation of long DNA tracts composed of five of the above motifs is the largest deviation from randomness so far established for DNA, and this in a wide range of eukaryotic and archaeal chromosomes. A propensity for ready DNA unwinding is proposed as the functional role, explaining the evolutionary conservation of the huge excesses observed.
Affiliation(s)
- Gad Yagil
- Dept of Molecular Cell Biology, The Weizmann Institute of Biology, Rehovot, Israel 76100.
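The six binary alphabets listed in the abstract above lend themselves to a short tract-finding sketch. This is an illustration under stated assumptions rather than the paper's mapping procedure: a "tract" is taken here to be any maximal run of characters drawn from the two-letter alphabet (so homopolymer runs also count), and the function name, the `min_len` threshold, and the example sequence are hypothetical.

```python
import re

# The six two-base alphabets named in the abstract, keyed by their one-letter codes.
BINARY_MOTIFS = {
    "R": "AG",  # purines
    "Y": "CT",  # pyrimidines
    "K": "GT",  # keto
    "M": "AC",  # imino
    "W": "AT",  # weak
    "S": "CG",  # strong
}


def find_tracts(seq, motif, min_len=10):
    """Return (start, end) half-open intervals of maximal runs using only the motif's two bases."""
    pattern = re.compile("[%s]{%d,}" % (BINARY_MOTIFS[motif], min_len))
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


example = "CCAGGAAGGAGATTC"
print(find_tracts(example, "R", min_len=5))
print(find_tracts(example, "Y", min_len=2))
```

Because the quantifier is greedy and the matches do not overlap, each reported interval is a maximal run. Judging over-representation would additionally require comparing the tract counts with those expected in a random sequence of the same composition, which this sketch does not attempt.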
|
248
|
Touchon M, Nicolay S, Arneodo A, d'Aubenton-Carafa Y, Thermes C. Transcription-coupled TA and GC strand asymmetries in the human genome. FEBS Lett 2004; 555:579-82. [PMID: 14675777 DOI: 10.1016/s0014-5793(03)01306-1] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022]
Abstract
Analysis of the whole set of human genes reveals that most of them present TA and GC skews, that these biases are correlated with each other and are specific to gene sequences, exhibiting sharp transitions between transcribed and non-transcribed regions. The GC asymmetries cannot be explained solely by a model previously proposed for (G+T) skew based on transitions measured in a small set of human genes. We propose that the GC skew results from an additional transcription-coupled mutation process that would include transversions. During evolution, both processes acting on a large majority of genes in germline cells would have produced these transcription-coupled strand asymmetries.
Affiliation(s)
- M Touchon
- Centre de Génétique Moléculaire, CNRS, Allée de la Terrasse, 91198, Gif-sur-Yvette, France
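The TA and GC skews discussed in the abstract above can be computed per region with a few lines. The sketch assumes the usual normalized definitions, S_TA = (T − A)/(T + A) and S_GC = (G − C)/(G + C), evaluated on one strand; the function names and the toy sequence are hypothetical. It also shows why the skew is strand-specific: reading the reverse-complement strand flips both signs.

```python
def strand_skews(seq):
    """Return (TA skew, GC skew) for one strand, each in the range [-1, 1]."""
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    ta = (t - a) / (t + a) if (t + a) else 0.0
    gc = (g - c) / (g + c) if (g + c) else 0.0
    return ta, gc


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


region = "TTTAGGGC"
print(strand_skews(region))
print(strand_skews(reverse_complement(region)))
```

Since the sign depends on which strand is read, skews of genes are typically compared after orienting every gene along its transcribed direction.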
|
249
|
Conde J. Twofold symmetries in nucleotide distribution in large domains of Saccharomyces cerevisiae Chromosome I. Mol Genet Genomics 2003; 270:287-95. [PMID: 14600830 DOI: 10.1007/s00438-003-0871-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2003] [Accepted: 05/27/2003] [Indexed: 11/26/2022]
Abstract
Single stranded chains of biological DNA show a widespread occurrence of parity for complementary nucleotides, i.e., A=T, G=C. This has been referred to as A-T, G-C symmetry. A distinction must be made between this, which this paper calls mirror symmetry, and twofold symmetry, where complementary nucleotide parity occurs between two segments, of the same length and equidistant from a symmetry center, along a single-stranded DNA chain. I have analysed the sequence of Chromosome I of Saccharomyces cerevisiae for the occurrence of complementary nucleotide symmetry. Open reading frame (ORF) sequences made up 63% of the total chromosome length and most of them were asymmetric for both A-T and G-C. The sign of A-T asymmetry was correlated with transcriptional orientation (A>T for sense and A<T for antisense ORFs), whereas G-C asymmetry was not. However, long single-stranded segments of Chromosome I were A-T mirror symmetric because they contained similar frequencies of ORFs in both transcriptional orientations. The same results were obtained with the AA-TT pair of complementary dinucleotides. Profiling of AA-TT symmetry along Chromosome I showed this chromosome to be organized as a succession of five domains that were twofold symmetric for AA-TT, placed between two subtelomeric regions without clear symmetry properties. This pattern was destroyed when ORF sequences were randomly repositioned along the chromosome. Based on the above findings, an architectural model is proposed for Chromosome I, in which the twofold symmetric domains, from 30 to 50 kb long, correspond to chromosome loops.
|
250
|
Chattopadhyay S, Chakrabarti J. Temporal changes in phosphoglycerate kinase coding sequences: a quantitative measure. J Comput Biol 2003; 10:83-93. [PMID: 12676052 DOI: 10.1089/106652703763255688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
The ratio of the average of the square of the number of the nucleotides to that of the random sequence of the same strand bias is proposed as a quantitative measure of evolution in some coding DNA sequences. Applying this measure to the phosphoglycerate kinase gene we observe a monotonic rise of the ratio with evolution. We present an interpretation of this data on some bacteria.
Affiliation(s)
- Sujay Chattopadhyay
- Department of Theoretical Physics, Indian Association for the Cultivation of Science, Calcutta 700 032,
|