101
Bohlin J, Skjerve E, Ussery DW. Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering. BMC Genomics 2009; 10:487. [PMID: 19845945 PMCID: PMC2770534 DOI: 10.1186/1471-2164-10-487] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 06/16/2009] [Accepted: 10/21/2009] [Indexed: 11/26/2022]
Abstract
Background Recently there has been an explosion in the availability of bacterial genomic sequences, now making possible an analysis of genomic signatures across more than 800 different bacterial chromosomes, from a wide variety of environments. Using genomic signatures, we pair-wise compared 867 different genomic DNA sequences, taken from chromosomes and plasmids more than 100,000 base-pairs in length. Hierarchical clustering was performed on the outcome of the comparisons before a multinomial regression model was fitted. The regression model included the cluster groups as the response variable with AT content, phyla, growth temperature, selective pressure, habitat, sequence size, oxygen requirement and pathogenicity as predictors. Results Many significant factors were associated with the genomic signature, most notably AT content. Phylum was also an important factor, although considerably less so than AT content. Small improvements to the regression model, although significant, were also obtained by factors such as sequence size, habitat, growth temperature, selective pressure measured as oligonucleotide usage variance, and oxygen requirement. Conclusion The statistics obtained using hierarchical clustering and multinomial regression analysis indicate that the genomic signature is shaped by many factors, and this may explain the varying ability to classify prokaryotic organisms below genus level.
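The workflow in this abstract (pairwise signature comparison, hierarchical clustering, then the cluster groups used as the response of a multinomial regression) can be illustrated with a small sketch. It is an illustration under stated assumptions, not the authors' code: it takes a precomputed symmetric distance matrix, uses naive average linkage to cut the data into `k` groups, and uses AT content as a toy one-dimensional signature. The abstract does not say which linkage or signature distance was used, and the regression step is omitted; `at_content`, `average_linkage` and `cluster_labels` are hypothetical names.

```python
def at_content(seq: str) -> float:
    """Fraction of A+T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("A") + seq.count("T")) / acgt


def average_linkage(dist: list[list[float]], k: int) -> list[list[int]]:
    """Agglomerate items until k clusters remain, merging at each step the
    pair of clusters with the smallest mean pairwise distance."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None  # (mean distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # b > a, so index a stays valid
        del clusters[b]
    return clusters


def cluster_labels(clusters: list[list[int]], n: int) -> list[int]:
    """Turn cluster membership lists into one group label per item,
    i.e. the categorical response a multinomial model would be fitted to."""
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Toy example: AT content as a one-dimensional stand-in for a signature.
    seqs = ["ATATATGC", "ATATATAT", "GCGCGCAT", "GCGCGCGC"]
    at = [at_content(s) for s in seqs]
    dist = [[abs(x - y) for y in at] for x in at]
    groups = average_linkage(dist, k=2)
    print(cluster_labels(groups, len(seqs)))  # prints [0, 0, 1, 1]
```

The two AT-rich toy sequences and the two GC-rich ones end up in separate groups. The naive loop recomputes every inter-cluster average at each merge, which is fine for a handful of items but slow for hundreds of chromosomes, where a library implementation would be the practical choice.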
Affiliation(s)
- Jon Bohlin
- Norwegian School of Veterinary Science, Oslo, Norway.
102
Yap VB, Lindsay H, Easteal S, Huttley G. Estimates of the effect of natural selection on protein-coding content. Mol Biol Evol 2009; 27:726-34. [PMID: 19815689 PMCID: PMC2822286 DOI: 10.1093/molbev/msp232] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 01/19/2023]
Abstract
Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection. We show that current models used to estimate omega are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and which escapes these artifacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis, and Lyme disease gave significant discrepancies in estimates with approximately 10-30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.
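For readers new to omega (dN/dS), the quantity can be illustrated with a simplified counting estimator in the spirit of Nei and Gojobori (1986) with a Jukes-Cantor correction. This is explicitly not the conditional-nucleotide-frequency model proposed in the paper; it is the kind of composition-naive estimate the paper argues can be biased. The sketch assumes two gap-free, in-frame, aligned coding sequences of A/C/G/T only, without stop codons, and the standard genetic code; all function names are hypothetical.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Fraction of single-base changes at each position that keep the amino
    acid (changes to a stop codon count as nonsynonymous here)."""
    aa, s = CODE[codon], 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODE[mutant] == aa:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences between two codons, averaged
    over all mutational pathways that avoid intermediate stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    sum_s = sum_n = 0.0
    valid_paths = 0
    for order in permutations(positions):
        current, s, n, ok = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                ok = False
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if ok:
            sum_s, sum_n, valid_paths = sum_s + s, sum_n + n, valid_paths + 1
    if valid_paths == 0:
        # Every pathway hits a stop codon: treat all changes as nonsynonymous.
        return 0.0, float(len(positions))
    return sum_s / valid_paths, sum_n / valid_paths


def jukes_cantor(p):
    """Multiple-hit correction; infinite once the proportion saturates."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def omega(seq1, seq2):
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if CODE[c1] == "*" or CODE[c2] == "*":
            raise ValueError("stop codon at position %d" % i)
        s_avg = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S, N = S + s_avg, N + 3 - s_avg
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    ds = jukes_cantor(Sd / S) if S > 0 else 0.0
    dn = jukes_cantor(Nd / N) if N > 0 else 0.0
    w = dn / ds if 0 < ds < math.inf else None  # undefined without dS
    return {"Sd": Sd, "Nd": Nd, "dS": ds, "dN": dn, "omega": w}
```

With these toy inputs, a single synonymous change (`AAA` to `AAG`, both lysine) in `omega("CCCCCCCCCAAA", "CCCCCCCCCAAG")` gives an omega of `0.0`, while a single nonsynonymous change leaves dS at zero, so omega is reported as `None` rather than dividing by zero.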
Affiliation(s)
- Von Bing Yap
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore.
103
A web server for interactive and zoomable Chaos Game Representation images. Source Code Biol Med 2009; 4:6. [PMID: 19761591 PMCID: PMC2753581 DOI: 10.1186/1751-0473-4-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/05/2009] [Accepted: 09/17/2009] [Indexed: 11/10/2022]
Abstract
Chaos Game Representation (CGR) is a generalized scale-independent Markov transition table, which is useful for the visualization and comparative study of genomic signatures, or for the study of characteristic sequence motifs. However, in order to fully utilize the scale-independent properties of CGR, it should be accessible through a scale-independent user interface rather than static images. Here we describe a web server and Perl library for generating zoomable CGR images utilizing the Google Maps API, which is also easily searchable for specific motifs. The web server is freely accessible at http://www.g-language.org/wiki/cgr/, and the Perl library as well as the source code is distributed with the G-language Genome Analysis Environment under the GNU General Public License.
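To make the "scale-independent Markov transition table" remark concrete, here is a minimal chaos game sketch in Python. It is not the Perl library or web server described above; the corner assignment (A bottom-left, C top-left, G top-right, T bottom-right) is one common convention, and skipping non-ACGT characters is an assumption. `cgr_points` and `cgr_counts` are hypothetical names.

```python
# One common corner convention; other tools may place the bases differently.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game: start at the centre of the unit square and move halfway
    towards the corner of each successive base. Non-ACGT characters are
    skipped."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue
        x, y = (x + corner[0]) / 2, (y + corner[1]) / 2
        points.append((x, y))
    return points


def cgr_counts(seq, k):
    """Bin CGR points into a 2^k x 2^k grid (grid[row][col], row = y).
    Once at least k bases have been read, the cell a point falls in is
    determined by its last k bases, so the grid is a k-mer count table;
    only the first k - 1 points still carry the starting position."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid
```

Because raising the resolution from `k` to `k + 1` splits every cell into four `(k + 1)`-mer cells, the same set of points can be inspected at any zoom level, which is the property a zoomable interface exploits.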
104
The genome of Nectria haematococca: contribution of supernumerary chromosomes to gene expansion. PLoS Genet 2009; 5:e1000618. [PMID: 19714214 PMCID: PMC2725324 DOI: 10.1371/journal.pgen.1000618] [Citation(s) in RCA: 317] [Impact Index Per Article: 19.8] [Received: 04/20/2009] [Accepted: 07/27/2009] [Indexed: 11/19/2022]
Abstract
The ascomycetous fungus Nectria haematococca (asexual name Fusarium solani) is a member of a group of >50 species known as the "Fusarium solani species complex". Members of this complex have diverse biological properties including the ability to cause disease on >100 genera of plants and opportunistic infections in humans. The current research analyzed the most extensively studied member of this complex, N. haematococca mating population VI (MPVI). Several genes controlling the ability of individual isolates of this species to colonize specific habitats are located on supernumerary chromosomes. Optical mapping revealed that the sequenced isolate has 17 chromosomes ranging from 530 kb to 6.52 Mb and that the physical size of the genome, 54.43 Mb, and the number of predicted genes, 15,707, are among the largest reported for ascomycetes. Two classes of genes have contributed to gene expansion: specific genes that are not found in other fungi including its closest sequenced relative, Fusarium graminearum; and genes that commonly occur as single copies in other fungi but are present as multiple copies in N. haematococca MPVI. Some of these additional genes appear to have resulted from gene duplication events, while others may have been acquired through horizontal gene transfer. The supernumerary nature of three chromosomes, 14, 15, and 17, was confirmed by their absence in pulsed field gel electrophoresis experiments of some isolates and by demonstrating that these isolates lacked chromosome-specific sequences found on the ends of these chromosomes. These supernumerary chromosomes contain more repeat sequences, are enriched in unique and duplicated genes, and have a lower G+C content in comparison to the other chromosomes. Although the origin(s) of the extra genes and the supernumerary chromosomes is not known, the gene expansion and its large genome size are consistent with this species' diverse range of habitats. Furthermore, the presence of unique genes on supernumerary chromosomes might account for individual isolates having different environmental niches.
105
Lichtenberg J, Jacox E, Welch JD, Kurz K, Liang X, Yang MQ, Drews F, Ecker K, Lee SS, Elnitski L, Welch LR. Word-based characterization of promoters involved in human DNA repair pathways. BMC Genomics 2009; 10 Suppl 1:S18. [PMID: 19594877 PMCID: PMC2709261 DOI: 10.1186/1471-2164-10-s1-s18] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Background DNA repair genes provide an important contribution towards the surveillance and repair of DNA damage. These genes produce a large network of interacting proteins whose mRNA expression is likely to be regulated by similar regulatory factors. Full characterization of promoters of DNA repair genes and the similarities among them will more fully elucidate the regulatory networks that activate or inhibit their expression. To address this goal, the authors introduce a technique to find regulatory genomic signatures, which represents a specific application of the genomic signature methodology to classify DNA sequences as putative functional elements within a single organism. Results The effectiveness of the regulatory genomic signatures is demonstrated via analysis of promoter sequences for genes in DNA repair pathways of humans. The promoters are divided into two classes, the bidirectional promoters and the unidirectional promoters, and distinct genomic signatures are calculated for each class. The genomic signatures include statistically overrepresented words, word clusters, and co-occurring words. The robustness of this method is confirmed by the ability to identify sequences that exist as motifs in TRANSFAC and JASPAR databases, and in overlap with verified binding sites in this set of promoter regions. Conclusion The word-based signatures are shown to be effective by finding occurrences of known regulatory sites. Moreover, the signatures of the bidirectional and unidirectional promoters of human DNA repair pathways are clearly distinct, exhibiting virtually no overlap. In addition to providing an effective characterization method for related DNA sequences, the signatures elucidate putative regulatory aspects of DNA repair pathways, which are notably under-characterized.
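One common way to call a word "statistically overrepresented" is to compare its observed count with the count expected under a background model. The sketch below does this with a zero-order (independent bases) background estimated from the input sequences and a binomial z-score that ignores overlaps between occurrences. Both choices are assumptions, since the abstract does not state the statistic the authors used, and the word clusters and co-occurring words it mentions are not covered. The function names are hypothetical.

```python
import math
from collections import Counter


def base_frequencies(seqs):
    """Zero-order background: A/C/G/T frequencies pooled over all inputs."""
    counts = Counter(b for s in seqs for b in s.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases in input")
    return {b: counts[b] / total for b in "ACGT"}


def overrepresented_words(seqs, k, min_z=3.0, background=None):
    """Return (word, observed, expected, z) for k-mers whose binomial
    z-score is at least min_z, highest z first. Overlapping occurrences
    are treated as independent, which is only an approximation."""
    bg = background or base_frequencies(seqs)
    observed = Counter()
    windows = 0
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if all(b in "ACGT" for b in word):
                observed[word] += 1
                windows += 1
    results = []
    for word, obs in observed.items():
        p = math.prod(bg[b] for b in word)
        expected = windows * p
        if expected == 0 or p >= 1:
            continue  # z-score undefined for this word
        z = (obs - expected) / math.sqrt(expected * (1 - p))
        if z >= min_z:
            results.append((word, obs, expected, z))
    return sorted(results, key=lambda r: r[3], reverse=True)
```

In a promoter study the two promoter classes would be scored separately and their word lists compared; passing an explicit `background` (for example, genome-wide base frequencies) is usually preferable to estimating it from the promoters themselves.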
Affiliation(s)
- Jens Lichtenberg
- Bioinformatics Laboratory, School of Electrical Engineering and Computer Science, Ohio University, Athens, Ohio, USA.
106
Suzuki H, Saito R, Tomita M. Measure of synonymous codon usage diversity among genes in bacteria. BMC Bioinformatics 2009; 10:167. [PMID: 19480720 PMCID: PMC2697163 DOI: 10.1186/1471-2105-10-167] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 10/27/2008] [Accepted: 06/01/2009] [Indexed: 11/10/2022]
Abstract
Background In many bacteria, intragenomic diversity in synonymous codon usage among genes has been reported. However, no quantitative attempt has been made to compare the diversity levels among different genomes. Here, we introduce a mean dissimilarity-based index (Dmean) for quantifying the level of diversity in synonymous codon usage among all genes within a genome. Results The application of Dmean to 268 bacterial genomes shows that in bacteria with extremely biased genomic G+C compositions there is little diversity in synonymous codon usage among genes. Furthermore, our findings contradict previous reports. For example, a low level of diversity in codon usage among genes has been reported for Helicobacter pylori, but based on Dmean, the diversity level of this species is higher than those of more than half of bacteria tested here. The discrepancies between our findings and previous reports are probably due to differences in the methods used for measuring codon usage diversity. Conclusion We recommend that Dmean be used to measure the diversity level of codon usage among genes. This measure can be applied to other compositional features such as amino acid usage and dinucleotide relative abundance as a genomic signature.
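The abstract defines Dmean as a mean dissimilarity in synonymous codon usage over all genes in a genome, but it does not give the dissimilarity itself. The sketch below therefore uses a clearly hypothetical stand-in: for each pair of genes, the mean over shared multi-codon amino acids of half the L1 distance between within-family codon frequencies, averaged over all gene pairs. It assumes in-frame coding sequences and the standard genetic code, and it should not be read as the published formula.

```python
from collections import Counter, defaultdict
from itertools import combinations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

_families = defaultdict(list)
for _codon, _aa in CODE.items():
    if _aa != "*":
        _families[_aa].append(_codon)
# Keep only amino acids with more than one codon (drops Met and Trp).
FAMILIES = {aa: codons for aa, codons in _families.items() if len(codons) > 1}


def synonymous_usage(gene):
    """Relative frequency of each codon within its amino-acid family."""
    counts = Counter(gene[i:i + 3].upper() for i in range(0, len(gene) - 2, 3))
    usage = {}
    for aa, codons in FAMILIES.items():
        total = sum(counts[c] for c in codons)
        if total:
            usage[aa] = {c: counts[c] / total for c in codons}
    return usage


def dissimilarity(u1, u2):
    """Hypothetical pairwise measure in [0, 1]: mean over shared amino
    acids of half the L1 distance between codon frequency vectors."""
    shared = u1.keys() & u2.keys()
    if not shared:
        return None
    return sum(0.5 * sum(abs(u1[aa][c] - u2[aa][c]) for c in u1[aa])
               for aa in shared) / len(shared)


def dmean(genes):
    """Mean of the pairwise dissimilarities over all comparable gene pairs."""
    usages = [synonymous_usage(g) for g in genes]
    values = [d for a, b in combinations(usages, 2)
              if (d := dissimilarity(a, b)) is not None]
    return sum(values) / len(values) if values else None
```

For example, `dmean(["AAAAAA", "AAGAAG"])` returns `1.0` because the two genes use disjoint lysine codons, while two identical genes give `0.0`.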
Affiliation(s)
- Haruo Suzuki
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, 997-0017, Japan.
107
Tzahor S, Man-Aharonovich D, Kirkup BC, Yogev T, Berman-Frank I, Polz MF, Béjà O, Mandel-Gutfreund Y. A supervised learning approach for taxonomic classification of core-photosystem-II genes and transcripts in the marine environment. BMC Genomics 2009; 10:229. [PMID: 19445709 PMCID: PMC2696472 DOI: 10.1186/1471-2164-10-229] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 09/18/2008] [Accepted: 05/16/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Cyanobacteria of the genera Synechococcus and Prochlorococcus play a key role in marine photosynthesis, which contributes to the global carbon cycle and to the world oxygen supply. Recently, genes encoding the photosystem II reaction center (psbA and psbD) were found in cyanophage genomes. This phenomenon suggested that the horizontal transfer of these genes may be involved in increasing phage fitness. To date, a very small percentage of marine bacteria and phages has been cultured. Thus, mapping genomic data extracted directly from the environment to its taxonomic origin is necessary for a better understanding of phage-host relationships and dynamics. RESULTS To achieve an accurate and rapid taxonomic classification, we employed a computational approach combining a multi-class Support Vector Machine (SVM) with a codon usage position specific scoring matrix (cuPSSM). Our method has been applied successfully to classify core-photosystem-II gene fragments, including partial sequences coming directly from the ocean, to seven different taxonomic classes. Applying the method on a large set of DNA and RNA psbA clones from the Mediterranean Sea, we studied the distribution of cyanobacterial psbA genes and transcripts in their natural environment. Using our approach, we were able to simultaneously examine taxonomic and ecological distributions in the marine environment. CONCLUSION The ability to accurately classify the origin of individual genes and transcripts coming directly from the environment is of great importance in studying marine ecology. The classification method presented in this paper could be applied further to classify other genes amplified from the environment, for which training data is available.
Affiliation(s)
- Shani Tzahor
- Faculty of Biology, Technion – Israel Institute of Technology, Haifa 32000, Israel
- Inter-Departmental Program for Biotechnology, Technion – Israel Institute of Technology, Haifa 32000, Israel
- Benjamin C Kirkup
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Tali Yogev
- Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 52900, Israel
- Martin F Polz
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Oded Béjà
- Faculty of Biology, Technion – Israel Institute of Technology, Haifa 32000, Israel
108
Tzahor S, Man-Aharonovich D, Kirkup BC, Yogev T, Berman-Frank I, Polz MF, Béjà O, Mandel-Gutfreund Y. A supervised learning approach for taxonomic classification of core-photosystem-II genes and transcripts in the marine environment. BMC Genomics 2009. [PMID: 19445709 DOI: 10.1186/1471-2164-10-229] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
109
Guo WJ, Ling J, Li P. Consensus features of microsatellite distribution: Microsatellite contents are universally correlated with recombination rates and are preferentially depressed by centromeres in multicellular eukaryotic genomes. Genomics 2009; 93:323-31. [DOI: 10.1016/j.ygeno.2008.12.009] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 08/23/2008] [Revised: 12/14/2008] [Accepted: 12/16/2008] [Indexed: 10/21/2022]
110
Ilatovskiy A, Petukhov M. Genome-Wide Search for Local DNA Segments with Anomalous GC-Content. J Comput Biol 2009; 16:555-64. [DOI: 10.1089/cmb.2008.0159] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Affiliation(s)
- Andrey Ilatovskiy
- Division of Molecular and Radiation Biophysics, Petersburg Nuclear Physics Institute, Russian Academy of Sciences, Gatchina/St. Petersburg, and Research and Education Centre “Biophysics,” PNPI RAS and St. Petersburg State Polytechnic University, St. Petersburg, Russia
- Michael Petukhov
- Division of Molecular and Radiation Biophysics, Petersburg Nuclear Physics Institute, Russian Academy of Sciences, Gatchina/St. Petersburg, and Research and Education Centre “Biophysics,” PNPI RAS and St. Petersburg State Polytechnic University, St. Petersburg, Russia
111
Willner D, Thurber RV, Rohwer F. Metagenomic signatures of 86 microbial and viral metagenomes. Environ Microbiol 2009; 11:1752-66. [PMID: 19302541 DOI: 10.1111/j.1462-2920.2009.01901.x] [Citation(s) in RCA: 110] [Impact Index Per Article: 6.9] [Indexed: 12/01/2022]
Abstract
Previous studies have shown that dinucleotide abundances capture the majority of variation in genome signatures and are useful for quantifying lateral gene transfer and building molecular phylogenies. Metagenomes contain a mixture of individual genomes, and might be expected to lack compositional signatures. In many metagenomic data sets the majority of sequences have no significant similarities to known sequences and are effectively excluded from subsequent analyses. To circumvent this limitation, di-, tri- and tetranucleotide abundances of 86 microbial and viral metagenomes consisting of short pyrosequencing reads were analysed to provide a method which includes all sequences that can be used in combination with other analysis to increase our knowledge about microbial and viral communities. Both principal component analysis and hierarchical clustering showed definitive groupings of metagenomes drawn from similar environments. Together these analyses showed that dinucleotide composition, as opposed to tri- and tetranucleotides, defines a metagenomic signature which can explain up to 80% of the variance between biomes, which is comparable to that obtained by functional genomics. Metagenomes with anomalous content were also identified using dinucleotide abundances. Subsequent analyses determined that these metagenomes were contaminated with exogenous DNA, suggesting that this approach is a useful metric for quality control. The predictive strength of the dinucleotide composition also opens the possibility of assigning ecological classifications to unknown fragments. Environmental selection may be responsible for this dinucleotide signature through direct selection of specific compositional signals; however, simulations suggest that the environment may select indirectly by promoting the increased abundance of a few dominant taxa.
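The dinucleotide signature referred to here is usually Karlin's relative abundance, rho*_XY = f*_XY / (f*_X f*_Y), computed on a sequence together with its reverse complement. The sketch below pools counts across short reads, which is an assumption about how a metagenome-level signature might be formed; dinucleotides are not counted across read boundaries, and the principal component and clustering analyses from the abstract are not shown. `dinucleotide_signature` is a hypothetical name.

```python
import math
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_signature(reads):
    """Symmetrized relative abundance rho*_XY = f*_XY / (f*_X f*_Y), with
    counts pooled over all reads and their reverse complements. Bases other
    than A/C/G/T are ignored; rho is NaN when a base never occurs."""
    mono, di = Counter(), Counter()
    for read in reads:
        read = read.upper()
        for strand in (read, read.translate(COMPLEMENT)[::-1]):
            mono.update(b for b in strand if b in "ACGT")
            di.update(strand[i:i + 2] for i in range(len(strand) - 1)
                      if strand[i] in "ACGT" and strand[i + 1] in "ACGT")
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("no unambiguous dinucleotides in input")
    signature = {}
    for xy in DINUCS:
        fx, fy = mono[xy[0]] / n_mono, mono[xy[1]] / n_mono
        signature[xy] = di[xy] / n_di / (fx * fy) if fx * fy > 0 else math.nan
    return signature
```

Counting both strands makes the signature symmetric, so a dinucleotide and its reverse complement (for example `AC` and `GT`) always receive the same value. Values above 1 indicate over-representation relative to base composition, values below 1 under-representation.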
Affiliation(s)
- Dana Willner
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
112
Mrazek J. Phylogenetic Signals in DNA Composition: Limitations and Prospects. Mol Biol Evol 2009; 26:1163-9. [DOI: 10.1093/molbev/msp032] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 01/11/2023]
113
Takahashi M, Kryukov K, Saitou N. Estimation of bacterial species phylogeny through oligonucleotide frequency distances. Genomics 2009; 93:525-33. [PMID: 19442633 DOI: 10.1016/j.ygeno.2009.01.009] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 10/04/2008] [Revised: 01/30/2009] [Accepted: 01/30/2009] [Indexed: 10/21/2022]
Abstract
Classification of bacteria is mainly based on sequence comparisons of certain homologous genes such as 16S rRNA. Recently there have been attempts to classify bacteria using oligonucleotide frequency patterns of nonhomologous sequences. However, the evolutionary significance of oligonucleotides longer than tetranucleotides has not been well studied. We performed phylogenetic analysis using Euclidean distances calculated from di- to decanucleotide frequencies in bacterial genomes, and compared these oligonucleotide frequency-based tree topologies with those for the 16S rRNA gene and seven concatenated genes. When oligonucleotide frequency-based trees were constructed for bacterial species with similar GC content, their topologies at genus and family level were congruent with those based on homologous genes. Our results suggest that oligonucleotide frequency is useful not only for classification of bacteria but also for estimating the phylogenetic relationships of closely related species.
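The distance described in this abstract can be sketched as follows, assuming relative (not raw) k-mer frequencies and ignoring k-mers that contain ambiguous bases; building a tree from the resulting matrix (for example by neighbor joining) is not shown. A sparse dictionary is used because for decanucleotides the full vector would have 4^10 = 1,048,576 entries, whereas the dictionary stores only k-mers that actually occur. The function names are hypothetical.

```python
import math
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequencies of k-mers made only of A/C/G/T (sparse dict)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = {w: c for w, c in counts.items() if set(w) <= set("ACGT")}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no unambiguous k-mers")
    return {w: c / total for w, c in counts.items()}


def euclidean_distance(f1, f2):
    """Euclidean distance over the union of observed k-mers; a k-mer absent
    from one profile counts as frequency 0."""
    return math.sqrt(sum((f1.get(w, 0.0) - f2.get(w, 0.0)) ** 2
                         for w in f1.keys() | f2.keys()))


def distance_matrix(seqs, k):
    """Pairwise distance matrix, the input a tree-building method needs."""
    freqs = [kmer_frequencies(s, k) for s in seqs]
    return [[euclidean_distance(a, b) for b in freqs] for a in freqs]
```

As the abstract notes, such trees agreed with gene-based trees mainly among species of similar GC content, so in practice the matrix would be computed within GC-matched groups.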
Affiliation(s)
- Mahoko Takahashi
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies, Mishima 411-8540, Japan
114
Ahmed S, Saito A, Suzuki M, Nemoto N, Nishigaki K. Host-parasite relations of bacteria and phages can be unveiled by oligostickiness, a measure of relaxed sequence similarity. Bioinformatics 2009; 25:563-70. [PMID: 19126576 DOI: 10.1093/bioinformatics/btp003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION Recent metagenome analyses have been producing a large number of host-unassigned viruses. Although assigning viruses to their hosts is important not only for virology but also for epidemic prevention, it has been a laborious and difficult task to date. The only effective method for this purpose has been to find them in the same microscopic view. Here, we tried a computational approach based on genome sequences of bacteria and phages, introducing a physicochemical parameter, SOSS (set of oligostickiness similarity score), derived from oligostickiness, a measure of the binding affinity of oligonucleotides to template DNA. RESULTS We could confirm host-parasite relationships of bacteria and their phages by SOSS analysis: all phages tested (25 species) had a remarkably higher SOSS value with their host than with unrelated bacteria. Interestingly, according to SOSS values, lysogenic phages such as lambda phage (host: Escherichia coli) or SPP1 (host: Bacillus subtilis) have distinctively higher similarity with their hosts than non-lysogenic (excretive or virulent) phages such as fd and T4 (host: E. coli) or phages gamma and PZA (host: B. subtilis). This finding is very promising for assigning host-unknown viruses to their hosts. We also investigated codon usage frequency and G+C content of the genomes to interpret the phenomenon revealed by SOSS analysis, obtaining evidence that supports the hypothesis that higher SOSS values result from cohabitation in the same environment, which may cause a common mutational bias. Thus, lysogenic phages, which stay inside the host longer, resemble it more closely.
Affiliation(s)
- Shamim Ahmed
- Graduate School of Science and Engineering, Saitama University, Saitama 338-8570, Japan
115
Huttley G. Do genomic datasets resolve the correct relationship among the placental, marsupial and monotreme lineages? Aust J Zool 2009. [DOI: 10.1071/zo09049] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
Did the mammal radiation arise through initial divergence of prototherians from a common ancestor of metatherians and eutherians, the Theria hypothesis, or of eutherians from a common ancestor of metatherians and prototherians, the Marsupionta hypothesis? Molecular phylogenetic analyses of point substitutions applied to this problem have been contradictory – mtDNA-encoded sequences supported Marsupionta, nuclear-encoded sequences and RY (purine–pyrimidine)-recoded mtDNA supported Theria. The consistency property of maximum likelihood guarantees convergence on the true tree only with longer alignments. Results from analyses of genome datasets should therefore be impervious to choice of outgroup. We assessed whether important hypotheses concerning mammal evolution, including Theria/Marsupionta and the branching order of rodents, carnivorans and primates, are resolved by phylogenetic analyses using ~2.3 megabases of protein-coding sequence from genome projects. In each case, only two tree topologies were being compared and thus inconsistency in resolved topologies can only derive from flawed models of sequence divergence. The results from all substitution models strongly supported Theria. For the eutherian lineages, all models were sensitive to the outgroup. We argue that phylogenetic inference from point substitutions will remain unreliable until substitution models that better match biological mechanisms of sequence divergence have been developed.
117
van Passel MWJ, de Graaff LH. Mononucleotide repeats are asymmetrically distributed in fungal genes. BMC Genomics 2008; 9:596. [PMID: 19077233 PMCID: PMC2621210 DOI: 10.1186/1471-2164-9-596] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 09/24/2008] [Accepted: 12/11/2008] [Indexed: 11/10/2022]
Abstract
Background Systematic analyses of sequence features have resulted in a better characterisation of the organisation of the genome. A previous study in prokaryotes on the distribution of sequence repeats, which are notoriously variable and can disrupt the reading frame in genes, showed that these motifs are skewed towards gene termini, specifically the 5' end of genes. For eukaryotes no such intragenic analysis has been performed, though this could indicate the pervasiveness of this distribution bias, thereby helping to expose the selective pressures causing it. Results In fungal gene repertoires we find a similar 5' bias of intragenic mononucleotide repeats, most notably for Candida spp., whereas e.g. Coccidioides spp. display no such bias. With increasing repeat length, ever larger discrepancies are observed in genome repertoire fractions containing such repeats, with up to an 80-fold difference in gene fractions at repeat lengths of 10 bp and longer. This species-specific difference in gene fractions containing large repeats could be attributed to variations in intragenic repeat tolerance. Furthermore, long transcripts experience an even more prominent bias towards the gene termini, with possibly a more adaptive role for repeat-containing short transcripts. Conclusion Mononucleotide repeats are intragenically biased in numerous fungal genomes, similar to earlier studies on prokaryotes, indicative of a similar selective pressure in gene organization.
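The positional bias described here can be measured roughly as follows. The sketch finds runs of a single base at least `min_len` long using a regular-expression backreference and bins each run's midpoint along the gene, from the 5' end (0) to the 3' end (1). The bin count, the midpoint convention and the function names are illustrative assumptions, and no significance test is included.

```python
import re


def mononucleotide_repeats(seq, min_len):
    """(start, end, base) for every run of one base at least min_len long."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([ACGT]) captures a base; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


def positional_profile(genes, min_len, n_bins=5):
    """Histogram of repeat midpoints along genes scaled to 0..1 (5' to 3');
    a 5' bias shows up as excess counts in the first bins."""
    bins = [0] * n_bins
    for gene in genes:
        if not gene:
            continue
        for start, end, _ in mononucleotide_repeats(gene, min_len):
            rel = (start + end) / 2 / len(gene)
            bins[min(int(rel * n_bins), n_bins - 1)] += 1
    return bins
```

Running `positional_profile` over all coding sequences of a genome with `min_len=10` would give a per-genome profile comparable, in spirit, to the repeat lengths discussed in the abstract.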
Affiliation(s)
- Mark W J van Passel
- Laboratory of Microbiology, Wageningen University, Wageningen, the Netherlands.
118
Wang A, Ren L, Abenes G, Hai R. Genome sequence divergences and functional variations in human cytomegalovirus strains. FEMS Immunol Med Microbiol 2008; 55:23-33. [PMID: 19076227 DOI: 10.1111/j.1574-695x.2008.00489.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]
Abstract
Genome sequences of numerous and wide-ranging species have been completed, but genome-wide sequence variation patterns linked to biological functions are just starting to be investigated. Here, by comparatively analyzing the genome variation patterns of human cytomegalovirus (HCMV) genomes, we revealed large sequence divergences and functional variations existing in HCMV genomes. They are divergent in genome-size, inversion, orientation and coding potential, even within conserved genes, including nucleotide polymorphism, DNA strand composition asymmetry, and evolutionary rate variation in conserved genes. These divergences in conserved genes are linked to HCMV biology. Codon usage variation of conserved genes located in the negative DNA strand is significantly different between HCMV strains, and this variation associates with virion production and virulence factor, suggesting that the negative DNA strand primarily contributes to virion production and virulence factor in HCMV. In addition, we also revealed that genes functioning for entry and egress are the most adaptable, and that those for transcription and replication are the most conserved in HCMV genomes. The conserved-transcription system is generally controlled by a genome-wide motif GCGC revealed in this study by Chaos map analysis. Our findings demonstrated that genome sequences of HCMV are generally divergent and these divergences directly reflect viral biology.
Affiliation(s)
- Anyou Wang
- School of Public Health, University of California, Berkeley, CA, USA.
119
Suzuki H, Sota M, Brown CJ, Top EM. Using Mahalanobis distance to compare genomic signatures between bacterial plasmids and chromosomes. Nucleic Acids Res 2008; 36:e147. [PMID: 18953039 PMCID: PMC2602791 DOI: 10.1093/nar/gkn753] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Indexed: 11/24/2022]
Abstract
Plasmids are ubiquitous mobile elements that serve as a pool of many host beneficial traits such as antibiotic resistance in bacterial communities. To understand the importance of plasmids in horizontal gene transfer, we need to gain insight into the ‘evolutionary history’ of these plasmids, i.e. the range of hosts in which they have evolved. Since extensive data support the proposal that foreign DNA acquires the host's nucleotide composition during long-term residence, comparison of nucleotide composition of plasmids and chromosomes could shed light on a plasmid's evolutionary history. The average absolute dinucleotide relative abundance difference, termed δ-distance, has been commonly used to measure differences in dinucleotide composition, or ‘genomic signature’, between bacterial chromosomes and plasmids. Here, we introduce the Mahalanobis distance, which takes into account the variance–covariance structure of the chromosome signatures. We demonstrate that the Mahalanobis distance is better than the δ-distance at measuring genomic signature differences between plasmids and chromosomes of potential hosts. We illustrate the usefulness of this metric for proposing candidate long-term hosts for plasmids, focusing on the virulence plasmids pXO1 from Bacillus anthracis, and pO157 from Escherichia coli O157:H7, as well as the broad host range multi-drug resistance plasmid pB10 from an unknown host.
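The two distances contrasted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the unsymmetrized relative abundance rho_XY = f_XY / (f_X f_Y) over the 16 dinucleotides (the published δ-distance is usually computed on a signature symmetrized with the reverse complement), and it estimates the chromosome's mean signature and variance–covariance matrix from non-overlapping windows of a hypothetical size `window`. Because covariance matrices of such window signatures can be singular, the Moore–Penrose pseudo-inverse is used. All function names are hypothetical.

```python
import itertools
import numpy as np

BASES = "ACGT"
DINUCS = ["".join(p) for p in itertools.product(BASES, repeat=2)]


def rho(seq):
    """Dinucleotide relative abundances rho_XY = f_XY / (f_X * f_Y), 16 values.

    Not symmetrized with the reverse complement (a simplifying assumption).
    """
    seq = seq.upper()
    mono = {b: seq.count(b) for b in BASES}
    n_mono = sum(mono.values())
    di = dict.fromkeys(DINUCS, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in di:  # skip pairs containing N or other symbols
            di[pair] += 1
    n_di = sum(di.values())
    values = []
    for d in DINUCS:
        fx, fy = mono[d[0]] / n_mono, mono[d[1]] / n_mono
        values.append((di[d] / n_di) / (fx * fy) if fx and fy else 0.0)
    return np.array(values)


def delta_distance(a, b):
    """Average absolute difference of the 16 relative abundances."""
    return float(np.mean(np.abs(rho(a) - rho(b))))


def mahalanobis_to_chromosome(plasmid, chromosome, window=5000):
    """Mahalanobis distance of a plasmid signature from the distribution of
    signatures of non-overlapping chromosome windows (window size is hypothetical)."""
    windows = [chromosome[i:i + window]
               for i in range(0, len(chromosome) - window + 1, window)]
    if len(windows) < 2:
        raise ValueError("need at least two chromosome windows")
    sigs = np.array([rho(w) for w in windows])
    mu = sigs.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(sigs, rowvar=False))
    diff = rho(plasmid) - mu
    # Clamp tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(0.0, diff @ cov_inv @ diff)))
```

Unlike the δ-distance, which weights all 16 differences equally, the Mahalanobis form down-weights dinucleotides whose abundance already varies a lot along the candidate host chromosome, so smaller values indicate a plasmid signature that falls within that chromosome's own variability.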
Affiliation(s)
- Haruo Suzuki
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844, USA
120
Stabler RA, Dawson LF, Oyston PCF, Titball RW, Wade J, Hinds J, Witney AA, Wren BW. Development and application of the active surveillance of pathogens microarray to monitor bacterial gene flux. BMC Microbiol 2008; 8:177. [PMID: 18844996 PMCID: PMC2607285 DOI: 10.1186/1471-2180-8-177] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 06/25/2008] [Accepted: 10/09/2008] [Indexed: 11/23/2022]
Abstract
Background Human and animal health is constantly under threat from emerging pathogens that have recently acquired genetic determinants that enhance their survival, transmissibility and virulence. We describe the construction and development of an Active Surveillance of Pathogens (ASP) oligonucleotide microarray, designed to 'actively survey' the genome of a given bacterial pathogen for virulence-associated genes. Results The microarray consists of 4958 reporters from 151 bacterial species and includes genes for the identification of individual bacterial species as well as mobile genetic elements (transposons, plasmids and phage), virulence genes and antibiotic resistance genes. The ASP microarray was validated with nineteen bacterial pathogen species, including Francisella tularensis, Clostridium difficile, Staphylococcus aureus, Enterococcus faecium and Stenotrophomonas maltophilia. The ASP microarray identified these bacteria and provided information on potential antibiotic resistance (e.g. sulfamethoxazole resistance and sulfonamide resistance) and on virulence determinants, including genes likely to have been acquired by horizontal gene transfer (e.g. an alpha-haemolysin). Conclusion The ASP microarray has potential in the clinic as a diagnostic tool, as a research tool for both known and emerging pathogens, and as an early warning system for pathogenic bacteria that have recently been modified, either naturally or deliberately.
Affiliation(s)
- Richard A Stabler
- Department of Infectious and Tropical Diseases, Keppel Street, London School of Hygiene and Tropical Medicine, London, WC1E 7HT, UK.
121
Cutler RW, Chantawannakul P. Synonymous codon usage bias dependent on local nucleotide context in the class Deinococci. J Mol Evol 2008; 67:301-14. [PMID: 18696025 DOI: 10.1007/s00239-008-9152-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 06/15/2008] [Accepted: 07/14/2008] [Indexed: 11/25/2022]
Abstract
To study the evolution of mutation biased synonymous codon usage, we examined nucleotide co-occurrence patterns in the Deinococcus radiodurans, D. geothermalis, and Thermus thermophilus genomes for nucleotide replacement dependent on the surrounding nucleotide context. Nucleotides on the third codon site were found to be strongly correlated with nucleotide sites at most six nucleotides away in all three species, where abundance patterns were dependent on whether two nucleotides share the same purine(R)/pyrimidine(Y) status. In the class Deinococci adjacent third site nucleotides were strongly correlated, where NNR|NNR and NNY|NNY codon pairs were overabundant while NNR|NNY and NNY|NNR codon pairs were underabundant. By far the largest deviations in all three species occur for NN(YR)|(YR)NN codon pairs. In the Thermus species, the NNY|YNN and NNR|RNN codon pairs were overabundant versus the underabundant NNY|RNN and NNR|YNN codon pairs, whereas in the Deinococcus species the opposite over-/underabundance relationship held for adjacent (GC) bases. We also observed a weaker overabundance of NNR|NRN and NNY|NYN codon pairs versus the underabundant NNR|NYN and NNY|NRN codon pairs. The perfect purine/pyrimidine symmetry of each of these cases, plus the lack of significant deviations for nucleotide pairs on other length scales up to 20 codons apart demonstrates that a pervasive pattern of nucleotide replacement dependent on local nucleotide context, and not codon bias, has occurred in these species. This nucleotide replacement has led to modified synonymous codon usage within the class Deinococci that affects which codons are positioned at particular codon sites dependent on the local nucleotide context.
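The purine/pyrimidine codon-pair classes described above (NNR|NNR, NNR|RNN, NNR|NRN and their pyrimidine counterparts) can be tallied with a short sketch. It assumes an in-frame coding sequence and measures over- or underabundance as observed/expected, with the expected count taken from the marginal R/Y frequencies of the two sites; this is an illustrative statistic, not necessarily the test used in the paper. `ry_pair_ratios` and its parameters are hypothetical names.

```python
from collections import Counter

PURINE, PYRIMIDINE = set("AG"), set("CT")


def ry(base):
    """Purine (R) / pyrimidine (Y) class of a base, or None for ambiguous symbols."""
    if base in PURINE:
        return "R"
    if base in PYRIMIDINE:
        return "Y"
    return None


def ry_pair_ratios(cds, pos_a=2, pos_b=2):
    """Observed/expected ratio for each R/Y combination at codon position pos_a of
    codon i and pos_b of codon i+1 (0-based). (2, 2) gives NNx|NNx pairs,
    (2, 0) gives NNx|xNN and (2, 1) gives NNx|NxN. Ratios > 1 mean overabundance."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    pairs = Counter()
    for first, second in zip(codons, codons[1:]):
        a, b = ry(first[pos_a]), ry(second[pos_b])
        if a and b:
            pairs[a + b] += 1
    total = sum(pairs.values())
    # Marginal counts of the left (codon i) and right (codon i+1) site classes.
    left = Counter({c: pairs[c + "R"] + pairs[c + "Y"] for c in "RY"})
    right = Counter({c: pairs["R" + c] + pairs["Y" + c] for c in "RY"})
    ratios = {}
    for a in "RY":
        for b in "RY":
            expected = left[a] * right[b] / total if total else 0.0
            ratios[a + b] = pairs[a + b] / expected if expected else float("nan")
    return ratios
```

Running the function with the three `(pos_a, pos_b)` settings on the same gene set gives the three families of comparisons the abstract describes; a perfectly symmetric pattern would show `RR` and `YY` moving together, opposite to `RY` and `YR`.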
122
Biro JC. Does codon bias have an evolutionary origin? Theor Biol Med Model 2008; 5:16. [PMID: 18667081 PMCID: PMC2519059 DOI: 10.1186/1742-4682-5-16] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 07/07/2008] [Accepted: 07/30/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND There is a 3-fold redundancy in the Genetic Code; most amino acids are encoded by more than one codon. These synonymous codons are not used equally; there is a Codon Usage Bias (CUB). This article provides novel information about the origin and evolution of this bias. RESULTS Codon Usage Bias (CUB, defined here as deviation from equal usage of synonymous codons) was studied in 113 species. The average CUB was 29.3 +/- 1.1% (S.E.M., n = 113) of the theoretical maximum and declined progressively with evolution and increasing genome complexity. A Pan-Genomic Codon Usage Frequency (CUF) Table was constructed to describe genome-wide relationships among codons. Significant correlations were found between the number of synonymous codons and (i) the frequency of the respective amino acids and (ii) the size of CUB. Numerous, statistically highly significant, internal correlations were found among codons and the nucleotides they comprise. These strong correlations made it possible to predict missing synonymous codons (wobble bases) reliably from the remaining codons or codon residues. CONCLUSION The results put the concept of "codon bias" into a novel perspective. The internal connectivity of codons indicates that all synonymous codons might be integrated parts of the Genetic Code with equal importance in maintaining its functional integrity.
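"Deviation from equal usage of synonymous codons" can be made concrete with the widely used relative synonymous codon usage (RSCU) statistic: each codon's count divided by the count expected if all its synonyms were used equally. This is a swapped-in, standard measure shown for illustration; the abstract does not give the exact formula behind the paper's CUB percentage. The sketch assumes the standard genetic code and an in-frame coding sequence.

```python
import itertools
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)}

SYNONYMS = defaultdict(list)
for codon, aa in CODE.items():
    SYNONYMS[aa].append(codon)


def rscu(cds):
    """Relative synonymous codon usage: observed count of each codon divided by
    the count expected if all synonymous codons were used equally."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*" or len(codons) == 1:  # skip stops, Met and Trp
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values
```

An RSCU of 1 means a codon is used exactly as often as equal usage predicts; values above 1 mark preferred codons and values below 1 mark avoided ones.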
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 612 S Flower St, Los Angeles, CA 90017, USA.
123
Fang X, Xu H, Zhang C, Chen H, Hu X, Gao X, Gu C, Yue W. Polymorphism in BMP4 gene and its association with growth traits in goats. Mol Biol Rep 2008; 36:1339-44. [DOI: 10.1007/s11033-008-9317-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 04/11/2008] [Accepted: 07/03/2008] [Indexed: 02/06/2023]
124
Genomic mid-range inhomogeneity correlates with an abundance of RNA secondary structures. BMC Genomics 2008; 9:284. [PMID: 18549495 PMCID: PMC2442090 DOI: 10.1186/1471-2164-9-284] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 02/21/2008] [Accepted: 06/12/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Genomes possess different levels of non-randomness, in particular, an inhomogeneity in their nucleotide composition. Inhomogeneity is manifest from the short-range where neighboring nucleotides influence the choice of base at a site, to the long-range, commonly known as isochores, where a particular base composition can span millions of nucleotides. A separate genomic issue that has yet to be thoroughly elucidated is the role that RNA secondary structure (SS) plays in gene expression. RESULTS We present novel data and approaches that show that a mid-range inhomogeneity (~30 to 1000 nt) not only exists in mammalian genomes but is also significantly associated with strong RNA SS. A whole-genome bioinformatics investigation of local SS in a set of 11,315 non-redundant human pre-mRNA sequences has been carried out. Four distinct components of these molecules (5'-UTRs, exons, introns and 3'-UTRs) were considered separately, since they differ in overall nucleotide composition, sequence motifs and periodicities. For each pre-mRNA component, the abundance of strong local SS (< -25 kcal/mol) was a factor of two to ten greater than a random expectation model. The randomization process preserves the short-range inhomogeneity of the corresponding natural sequences, thus eliminating short-range signals as possible contributors to any observed phenomena. CONCLUSION We demonstrate that the excess of strong local SS in pre-mRNAs is linked to the little-explored phenomenon of genomic mid-range inhomogeneity (MRI). MRI is an interdependence between nucleotide choice and base composition over a distance of 20-1000 nt. Additionally, we have created a public computational resource to support further study of genomic MRI.
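The abstract does not spell out how its randomization preserves short-range inhomogeneity. One common way to build such a null model, shown here only as an assumed illustration, is to sample a random sequence from a first-order Markov chain fitted to the natural sequence, which keeps nearest-neighbor (dinucleotide) statistics in expectation while destroying longer-range structure. `markov1_randomize` is a hypothetical name.

```python
import random
from collections import Counter, defaultdict


def markov1_randomize(seq, rng=None):
    """Random sequence of the same length drawn from a first-order Markov chain
    whose transition counts come from `seq` (dinucleotide statistics are kept
    in expectation, not exactly)."""
    rng = rng or random.Random()
    seq = seq.upper()
    if len(seq) < 2:
        return seq
    transitions = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        transitions[a][b] += 1
    out = [seq[0]]
    while len(out) < len(seq):
        following = transitions.get(out[-1])
        if not following:  # base seen only at the 3' end: draw from seq's composition
            out.append(rng.choice(seq))
            continue
        bases, weights = zip(*following.items())
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out)
```

Comparing a statistic (for example, the count of strongly folded windows) between natural sequences and many such surrogates then isolates signal that nearest-neighbor structure alone cannot explain; an exact dinucleotide-preserving shuffle would be a stricter alternative.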
125
Ishoey T, Woyke T, Stepanauskas R, Novotny M, Lasken RS. Genomic sequencing of single microbial cells from environmental samples. Curr Opin Microbiol 2008; 11:198-204. [PMID: 18550420 DOI: 10.1016/j.mib.2008.05.006] [Citation(s) in RCA: 106] [Impact Index Per Article: 6.2] [Received: 03/13/2008] [Revised: 04/30/2008] [Accepted: 05/07/2008] [Indexed: 10/22/2022]
Abstract
Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification. Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA), avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology, considering the vast numbers of novel organisms that have been inaccessible except through culture-independent methods. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes, which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.
Affiliation(s)
- Thomas Ishoey
- J. Craig Venter Institute, 10355 Science Center Drive, San Diego, CA 92121, United States
126
The mosaic genome of Anaeromyxobacter dehalogenans strain 2CP-C suggests an aerobic common ancestor to the delta-proteobacteria. PLoS One 2008; 3:e2103. [PMID: 18461135 PMCID: PMC2330069 DOI: 10.1371/journal.pone.0002103] [Citation(s) in RCA: 109] [Impact Index Per Article: 6.4] [Received: 01/16/2008] [Accepted: 03/19/2008] [Indexed: 11/29/2022]
Abstract
Anaeromyxobacter dehalogenans strain 2CP-C is a versaphilic delta-Proteobacterium distributed throughout many diverse soil and sediment environments. 16S rRNA gene phylogenetic analysis groups A. dehalogenans together with the myxobacteria, which have distinguishing characteristics including strictly aerobic metabolism, sporulation, fruiting body formation, and surface motility. Analysis of the 5.01 Mb strain 2CP-C genome substantiated that this organism is a myxobacterium but shares genotypic traits with the anaerobic majority of the delta-Proteobacteria (i.e., the Desulfuromonadales). Reflective of its respiratory versatility, strain 2CP-C possesses 68 genes coding for putative c-type cytochromes, including one gene with 40 heme binding motifs. Consistent with its relatedness to the myxobacteria, surface motility was observed in strain 2CP-C and multiple types of motility genes are present, including 28 genes for gliding, adventurous (A-) motility and 17 genes for type IV pilus-based motility (i.e., social (S-) motility) that all have homologs in Myxococcus xanthus. Although A. dehalogenans shares many metabolic traits with the anaerobic majority of the delta-Proteobacteria, strain 2CP-C grows under microaerophilic conditions and possesses detoxification systems for reactive oxygen species. Accordingly, two gene clusters coding for NADH dehydrogenase subunits and two cytochrome oxidase gene clusters in strain 2CP-C are similar to those in M. xanthus. Remarkably, strain 2CP-C possesses a third NADH dehydrogenase gene cluster and a cytochrome cbb3 oxidase gene cluster, apparently acquired through ancient horizontal gene transfer from a strictly anaerobic green sulfur bacterium. The mosaic nature of the A. dehalogenans strain 2CP-C genome suggests that the metabolically versatile, anaerobic members of the delta-Proteobacteria may have descended from aerobic ancestors with complex lifestyles.
127
Kuo CH, Kissinger JC. Consistent and contrasting properties of lineage-specific genes in the apicomplexan parasites Plasmodium and Theileria. BMC Evol Biol 2008; 8:108. [PMID: 18405380 PMCID: PMC2330040 DOI: 10.1186/1471-2148-8-108] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 11/02/2007] [Accepted: 04/11/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Lineage-specific genes, the genes that are restricted to a limited subset of related organisms, may be important in adaptation. In parasitic organisms, lineage-specific gene products are possible targets for vaccine development or therapeutics when these genes are absent from the host genome. RESULTS In this study, we utilized comparative approaches based on a phylogenetic framework to characterize lineage-specific genes in the parasitic protozoan phylum Apicomplexa. Genes from species in two major apicomplexan genera, Plasmodium and Theileria, were categorized into six levels of lineage specificity based on a nine-species phylogeny. In both genera, lineage-specific genes tend to have a higher level of sequence divergence among sister species. In addition, species-specific genes possess a strong codon usage bias compared to other genes in the genome. We found that a large number of genus- or species-specific genes are putative surface antigens that may be involved in host-parasite interactions. Interestingly, the two parasite lineages exhibit several notable differences. In Plasmodium, the (G + C) content at the third codon position increases with lineage specificity while Theileria shows the opposite trend. Surface antigens in Plasmodium are species-specific and mainly located in sub-telomeric regions. In contrast, surface antigens in Theileria are conserved at the genus level and distributed across the entire lengths of chromosomes. CONCLUSION Our results provide further support for the model that gene duplication followed by rapid divergence is a major mechanism for generating lineage-specific genes. The result that many lineage-specific genes are putative surface antigens supports the hypothesis that lineage-specific genes could be important in parasite adaptation. 
The contrasting properties between the lineage-specific genes in two major apicomplexan genera indicate that the mechanisms of generating lineage-specific genes and the subsequent evolutionary fates can differ between related parasite lineages. Future studies that focus on improving functional annotation of parasite genomes and collection of genetic variation data at within- and between-species levels will be important in facilitating our understanding of parasite adaptation and natural selection.
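The third-codon-position (G + C) content (GC3) compared across lineage-specificity levels above can be computed with a minimal sketch. It assumes in-frame coding sequences and ignores ambiguous bases; grouping genes into specificity levels is assumed to happen beforehand, and the function names are hypothetical.

```python
def gc3(cds):
    """Fraction of G or C among unambiguous third codon positions of an in-frame CDS."""
    cds = cds.upper()
    third = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    called = [b for b in third if b in "ACGT"]
    if not called:
        return float("nan")
    return sum(b in "GC" for b in called) / len(called)


def mean_gc3(genes):
    """Mean GC3 over a list of coding sequences (e.g. one lineage-specificity level)."""
    values = [gc3(g) for g in genes]
    return sum(values) / len(values)
```

Computing `mean_gc3` for each specificity level in turn would show whether GC3 rises with lineage specificity (the Plasmodium trend) or falls (the Theileria trend).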
Affiliation(s)
- Chih-Horng Kuo
- Department of Genetics, University of Georgia, Athens, GA 30602, USA.
128
Larsson P, Hinas A, Ardell DH, Kirsebom LA, Virtanen A, Söderbom F. De novo search for non-coding RNA genes in the AT-rich genome of Dictyostelium discoideum: performance of Markov-dependent genome feature scoring. Genome Res 2008; 18:888-99. [PMID: 18347326 DOI: 10.1101/gr.069104.107] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/24/2022]
Abstract
Genome data are increasingly important in the computational identification of novel regulatory non-coding RNAs (ncRNAs). However, most ncRNA gene-finders are either specialized to well-characterized ncRNA gene families or require comparisons of closely related genomes. We developed a method for de novo screening for ncRNA genes with a nucleotide composition that stands out against the background genome based on a partial sum process. We compared the performance when assuming independent and first-order Markov-dependent nucleotides, respectively, and used Karlin-Altschul and Karlin-Dembo statistics to evaluate the significance of hits. We hypothesized that a first-order Markov-dependent process might have better power to detect ncRNA genes since nearest-neighbor models have been shown to be successful in predicting RNA structures. A model based on a first-order partial sum process (analyzing overlapping dinucleotides) had better sensitivity and specificity than a zeroth-order model when applied to the AT-rich genome of the amoeba Dictyostelium discoideum. In this genome, we detected 94% of previously known ncRNA genes (at this sensitivity, the false positive rate was estimated to be 25% in a simulated background). The predictions were further refined by clustering candidate genes according to sequence similarity and/or searching for an ncRNA-associated upstream element. We experimentally verified six out of 10 tested ncRNA gene predictions. We conclude that higher-order models, in combination with other information, are useful for identification of novel ncRNA gene families in single-genome analysis of D. discoideum. Our generalizable approach extends the range of genomic data that can be searched for novel ncRNA genes using well-grounded statistical methods.
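A first-order partial sum process of the kind described above can be sketched as follows: each overlapping dinucleotide receives a log-odds score comparing a target (ncRNA-like) Markov model with the background genome, and the maximal-scoring segment of the running sum is found in a single pass. This is an assumed illustration, not the authors' code: the training sequences, the pseudocount of 1 and the function names are hypothetical, and the Karlin-Altschul or Karlin-Dembo significance calculation is not included.

```python
import math
from collections import Counter

BASES = "ACGT"


def transition_probs(training_seqs, pseudocount=1.0):
    """First-order Markov transition probabilities P(b | a), keyed by "ab"."""
    counts = Counter()
    for s in training_seqs:
        s = s.upper()
        for a, b in zip(s, s[1:]):
            if a in BASES and b in BASES:
                counts[a + b] += 1
    probs = {}
    for a in BASES:
        row_total = sum(counts[a + b] for b in BASES) + 4 * pseudocount
        for b in BASES:
            probs[a + b] = (counts[a + b] + pseudocount) / row_total
    return probs


def max_scoring_segment(seq, target, background):
    """Maximal segment of the partial sum of dinucleotide log-odds scores
    log(P_target(b|a) / P_background(b|a)). Returns (score, start, end), with
    end exclusive; (0.0, 0, 0) if no segment scores above zero."""
    seq = seq.upper()
    best = (0.0, 0, 0)
    running, run_start = 0.0, 0
    for i in range(1, len(seq)):
        pair = seq[i - 1:i + 1]
        score = math.log(target[pair] / background[pair]) if pair in target else 0.0
        if running <= 0.0:  # a non-positive partial sum cannot help: restart here
            running, run_start = 0.0, i - 1
        running += score
        if running > best[0]:
            best = (running, run_start, i + 1)
    return best
```

A zeroth-order version would score single nucleotides with P_target(b) / P_background(b) instead; the comparison in the abstract is between these two scoring schemes.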
Affiliation(s)
- Pontus Larsson
- Department of Cell and Molecular Biology, Biomedical Center, Uppsala University, SE-75124 Uppsala, Sweden
129
Chantawannakul P, Cutler RW. Convergent host-parasite codon usage between honeybee and bee associated viral genomes. J Invertebr Pathol 2008; 98:206-10. [PMID: 18397791 DOI: 10.1016/j.jip.2008.02.016] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 10/09/2007] [Revised: 02/25/2008] [Accepted: 02/27/2008] [Indexed: 10/22/2022]
Abstract
By correlating the codon usage in four insects (the honeybee, red flour beetle, mosquito and fruit fly) with six honeybee host specific viruses, we found that the codon usage patterns of the bee viruses were strongly related to that of the honeybee and only weakly related to the red flour beetle. The insects shared varying degrees of codon usage similarity which roughly follow the known phylogenetic relatedness. All of the codon usage similarity can be described by relatedness-by-descent except for the high codon usage similarity between the honeybee and honeybee associated viruses. This evidence for the convergent evolution of the honeybee viruses toward the codon usage of the honeybee suggests that small host specific viral genomes have the freedom to quickly optimize codon usage to successfully parasitize their preferred host. The codon usage co-evolution of the six host specific honeybee viruses towards the codon usage of the honeybee described in this paper is the first evidence for codon usage correlation between an insect host and a single stranded RNA virus.
Affiliation(s)
- Panuwan Chantawannakul
- Department of Biology, Faculty of Science, Chiang Mai University, 50200, Thailand.
130
Arnau V, Gallach M, Marín I. Fast comparison of DNA sequences by oligonucleotide profiling. BMC Res Notes 2008; 1:5. [PMID: 18710530 PMCID: PMC2518268 DOI: 10.1186/1756-0500-1-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 01/31/2008] [Accepted: 02/28/2008] [Indexed: 11/24/2022]
Abstract
Background The comparison of DNA sequences is a traditional problem in genomics and bioinformatics. Improvements in personal computers have opened many new opportunities, allowing novel analysis strategies to be implemented. Findings We describe a new program, called UVWORD, which determines the number of times that each DNA word present in a sequence (target) is found in a second sequence (source), a procedure that we have called oligonucleotide profiling. On a standard computer, the user may search for words ranging in size from k = 1 to k = 14 nucleotides. Average counts for groups of contiguous words may also be calculated. On standard computers, the program analyzes from 3.4 million (k = 14) to 16 million (1 ≤ k ≤ 8) words per second. This makes fast screening feasible even for the longest known DNA molecules. Discussion We show that combining the ability to analyze relatively long words, which occur very rarely by chance, with the speed of the program makes it possible to perform novel types of screening, complementary to those provided by standard programs such as BLAST. This method can be used to determine oligonucleotide content, to characterize the distribution of repetitive sequences in chromosomes, to determine the evolutionary conservation of sequences in different species, to establish regions of similar DNA among chromosomes or genomes, etc.
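The oligonucleotide profiling procedure described above can be sketched with a hash-based k-mer count: index every word of length k in the source once, then look up the word starting at each target position, and optionally average counts over runs of contiguous words. This is an assumed illustration of the procedure, not UVWORD's implementation, and the function names are hypothetical; a faster version could, for example, encode words as integers at 2 bits per base.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Occurrences of every overlapping word of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def oligo_profile(target, source, k):
    """For the word starting at each position of target, how many times that
    word occurs in source (0 if absent)."""
    source_counts = kmer_counts(source, k)
    target = target.upper()
    return [source_counts[target[i:i + k]] for i in range(len(target) - k + 1)]


def window_means(profile, w):
    """Average count over each run of w contiguous words."""
    return [sum(profile[i:i + w]) / w for i in range(len(profile) - w + 1)]
```

For long words (large k), most positions in an unrelated source score 0, so runs of non-zero counts flag shared or repeated regions.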
Affiliation(s)
- Vicente Arnau
- Departamento de Informática, Universidad de Valencia, Spain. vicente@
131
Evans KJ. Genomic DNA from animals shows contrasting strand bias in large and small subsequences. BMC Genomics 2008; 9:43. [PMID: 18221531 PMCID: PMC2267173 DOI: 10.1186/1471-2164-9-43] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/07/2007] [Accepted: 01/25/2008] [Indexed: 01/09/2023]
Abstract
Background For eukaryotes, there is almost no strand bias with regard to base composition, with exceptions for origins of replication and transcription start sites and transcribed regions. This paper revisits the question for subsequences of DNA taken at random from the genome. Results For a typical mammal, for example mouse or human, there is a small strand bias throughout the genomic DNA: there is a correlation between (G - C) and (A - T) on the same strand, (that is between the difference in the number of guanine and cytosine bases and the difference in the number of adenine and thymine bases). For small subsequences – up to 1 kb – this correlation is weak but positive; but for large windows – around 50 kb to 2 Mb – the correlation is strong and negative. This effect is largely independent of GC%. Transcribed and untranscribed regions give similar correlations both for small and large subsequences, but there is a difference in these regions for intermediate sized subsequences. An analysis of the human genome showed that position within the isochore structure did not affect these correlations. An analysis of available genomes of different species shows that this contrast between large and small windows is a general feature of mammals and birds. Further down the evolutionary tree, other organisms show a similar but smaller effect. Except for the nematode, all the animals analysed showed at least a small effect. Conclusion The correlations on the large scale may be explained by DNA replication. Transcription may be a modifier of these effects but is not the fundamental cause. These results cast light on how DNA mutations affect the genome over evolutionary time. At least for vertebrates, there is a broad relationship between body temperature and the size of the correlation. The genome of mammals and birds has a structure marked by strand bias segments.
Affiliation(s)
- Kenneth J Evans
- School of Crystallography, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK.
132
Evans KJ. Strand bias structure in mouse DNA gives a glimpse of how chromatin structure affects gene expression. BMC Genomics 2008; 9:16. [PMID: 18194530 PMCID: PMC2266913 DOI: 10.1186/1471-2164-9-16] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/16/2007] [Accepted: 01/14/2008] [Indexed: 12/20/2022]
Abstract
Background On a single strand of genomic DNA the number of As is usually about equal to the number of Ts (and similarly for Gs and Cs), but deviations have been noted for transcribed regions and origins of replication. Results The mouse genome is shown to have a segmented structure defined by strand bias. Transcription is known to cause a strand bias and numerous analyses are presented to show that the strand bias in question is not caused by transcription. However, these strand bias segments influence the position of genes and their unspliced length. The position of genes within the strand bias structure affects the probability that a gene is switched on and its expression level. Transcription has a highly directional flow within this structure and the peak volume of transcription is around 20 kb from the A-rich/T-rich segment boundary on the T-rich side, directed away from the boundary. The A-rich/T-rich boundaries are SATB1 binding regions, whereas the T-rich/A-rich boundary regions are not. Conclusion The direct cause of the strand bias structure may be DNA replication. The strand bias segments represent a further biological feature, the chromatin structure, which in turn influences the ease of transcription.
Affiliation(s)
- Kenneth J Evans
- School of Crystallography, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK.
133
Demirev PA, Fenselau C. Mass spectrometry for rapid characterization of microorganisms. Annu Rev Anal Chem (Palo Alto Calif) 2008; 1:71-93. [PMID: 20636075 DOI: 10.1146/annurev.anchem.1.031207.112838] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.6] [Indexed: 05/25/2023]
Abstract
Advances in instrumentation, proteomics, and bioinformatics have contributed to the successful applications of mass spectrometry (MS) for detection, identification, and classification of microorganisms. These MS applications are based on the detection of organism-specific biomarker molecules, which allow differentiation between organisms to be made. Intact proteins, their proteolytic peptides, and nonribosomal peptides have been successfully utilized as biomarkers. Sequence-specific fragments for biomarkers are generated by tandem MS of intact proteins or proteolytic peptides, obtained after, for instance, microwave-assisted acid hydrolysis. In combination with proteome database searching, individual biomarker proteins are unambiguously identified from their tandem mass spectra, and from there the source microorganism is also identified. Such top-down or bottom-up proteomics approaches permit rapid, sensitive, and confident characterization of individual microorganisms in mixtures and are reviewed here. Examples of MS-based functional assays for detection of targeted microorganisms, e.g., Bacillus anthracis, in environmental or clinically relevant backgrounds are also reviewed.
134
Demongeot J, Moreira A. A possible circular RNA at the origin of life. J Theor Biol 2007; 249:314-24. [PMID: 17825325 DOI: 10.1016/j.jtbi.2007.07.010] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 03/19/2007] [Revised: 07/04/2007] [Accepted: 07/05/2007] [Indexed: 11/24/2022]
Abstract
The increasing volume of sequenced genomes and recent techniques for performing in vitro molecular evolution have rekindled interest in questions about the origin of life. Nevertheless, a gap continues to exist between research on prebiotic chemistry and molecule generation, on the one hand, and the study of molecular fossils preserved in genomes, on the other. Here we attempt to fill this gap by using some assumptions about the prebiotic scenario (including a strong stereochemical basis for the genetic code) to determine the RNA sequences more likely to appear and persist. A set of minimal RNA rings is exhaustively determined; a subset of them is then selected through stability arguments, and a particular ring ("AL ring") is finally singled out as the most likely winner of this prebiotic game. The rings happen to have several structural and statistical properties of modern genes: a repeated AUG codon appears spontaneously (and is thus made available for becoming a start signal), the form AUG/STOP emerges, and frequency patterns resemble those of present genes. The whole set of rings was also compared to a database of tRNAs, considering the conserved positions (located in the free parts of the molecule, essentially the loops); the ring that most closely matched tRNA sequences, and in fact matched the tRNA consensus at all the aligned positions, was AL, the same ring independently selected before. The unselected emergence of gene-like features through two simple selection steps and the close similarity between the finally selected ring and tRNA (including some remarkable features of the resulting alignment) suggest a possible link between the prebiotic world and the first biological molecules, which is amenable to experimental testing. Even if our scenario is partially wrong, the unlikely coincidences should provide useful hints for other efforts.
135
Shekar M, Karunasagar I, Karunasagar I. Abundance, composition and distribution of simple sequence repeats and dinucleotide compositional bias within WSSV genomes. J Genet 2007; 86:69-73. [PMID: 17656852 DOI: 10.1007/s12041-007-0010-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Affiliation(s)
- Malathi Shekar
- Department of Fishery Microbiology, UNESCO Centre for Marine Biotechnology, Karnataka Veterinary, Animal and Fishery Sciences University, College of Fisheries, Mangalore 575 002, India
136
Chen C, Chen CW. Quantitative analysis of mutation and selection pressures on base composition skews in bacterial chromosomes. BMC Genomics 2007; 8:286. [PMID: 17711583 PMCID: PMC2031905 DOI: 10.1186/1471-2164-8-286] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 04/24/2007] [Accepted: 08/21/2007] [Indexed: 11/24/2022]
Abstract
Background Most bacterial chromosomes exhibit asymmetry of base composition with respect to leading vs. lagging strands (GC and AT skews). These skews mainly reflect those in protein-coding sequences, which are driven by asymmetric mutation pressures during replication and transcription (notably asymmetric cytosine deamination) plus subsequent selection for preferred structures, signals, amino acids or codons. The transcription-associated effects, but not the replication-associated effects, contribute to the overall skews through the uneven distribution of coding sequences on the leading and lagging strands. Results Analysis of 185 representative bacterial chromosomes showed diverse and characteristic patterns of skews among different clades. The base composition skews in the coding sequences were used to derive quantitatively the effect of replication-driven mutation plus subsequent selection ('replication-associated pressure', RAP) and the effect of transcription-driven mutation plus subsequent selection at the translation level ('transcription-associated pressure', TAP). Different clades exhibit distinct patterns of RAP and TAP; RAP is absent or nearly absent in some bacteria, but TAP is present in all. Selection pressure at the translation level is evident in all bacteria from the analysis of the skews at the three codon positions. The contribution of asymmetric cytosine deamination to TAP was found to be weak in most phyla, whereas its contribution to RAP was strong in all the Proteobacteria but weak in most of the Firmicutes. This possibly reflects differences in their chromosomal replication machineries. Strong negative correlations between TAP and G+C content and between TAP and chromosome size were also revealed. Conclusion The study reveals the diverse mutation and selection forces associated with replication and transcription in various groups of bacteria that shape the distinct patterns of base composition skews in the chromosomes during evolution. Some closely related species with distinct base composition parameters are uncovered in this study, which also provides opportunities for comparative bioinformatic and genetic investigations to uncover the underlying principles of mutation and selection.
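The GC and AT skews discussed in this abstract are conventionally defined for one strand as (G - C)/(G + C) and (A - T)/(A + T). The Python sketch below computes them for a whole sequence and for consecutive windows. It assumes a plain DNA string; the function names and any window size are illustrative choices, and the authors' RAP/TAP decomposition is not reproduced.

```python
from typing import List, Tuple


def skews(seq: str) -> Tuple[float, float]:
    """Return (GC skew, AT skew) for one strand of a DNA sequence.

    GC skew = (G - C) / (G + C) and AT skew = (A - T) / (A + T);
    a skew is reported as 0.0 when its denominator is zero.
    """
    s = seq.upper()
    g, c, a, t = s.count("G"), s.count("C"), s.count("A"), s.count("T")
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    at_skew = (a - t) / (a + t) if a + t else 0.0
    return gc_skew, at_skew


def windowed_skews(seq: str, window: int) -> List[Tuple[float, float]]:
    """Skews in consecutive non-overlapping windows; a shorter final window is kept."""
    return [skews(seq[i:i + window]) for i in range(0, len(seq), window)]
```

For example, `skews("GGGCAAT")` returns `(0.5, 0.3333333333333333)`: three G against one C, and two A against one T. The abstract indicates that the analysis works from skews of this kind computed over coding sequences and at each codon position.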
Affiliation(s)
- Chi Chen
- Institute of Biomedical Informatics, National Yang-Ming University, Shih-Pai, Taipei 111, Taiwan
- Carton W Chen
- Institute of Biomedical Informatics, National Yang-Ming University, Shih-Pai, Taipei 111, Taiwan
- Department of Life Sciences and Institute of Genome Sciences, National Yang-Ming University, Shih-Pai, Taipei 111, Taiwan
137
Bakkali M. Genome dynamics of short oligonucleotides: the example of bacterial DNA uptake enhancing sequences. PLoS One 2007; 2:e741. [PMID: 17710141 PMCID: PMC1939737 DOI: 10.1371/journal.pone.0000741] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 06/05/2007] [Accepted: 06/29/2007] [Indexed: 11/19/2022]
Abstract
Among the many bacteria naturally competent for transformation by DNA uptake (a phenomenon with significant clinical and financial implications), Pasteurellaceae and Neisseriaceae species preferentially take up DNA containing specific short sequences. The genomic overrepresentation of these DNA uptake enhancing sequences (DUES) causes preferential uptake of conspecific DNA, but the function(s) behind this overrepresentation and its evolution remain to be discovered. Here I analyze DUES genome dynamics and evolution and test whether the results apply to other selectively constrained oligonucleotides. I use statistical methods and computer simulations to examine DUES accumulation in the Haemophilus influenzae and Neisseria gonorrhoeae genomes. I analyze DUES sequence and nucleotide frequencies, as well as those of all their mismatched forms, and demonstrate that DUES genomic overrepresentation depends on their preferential uptake by quantifying and correlating both characteristics. I then argue that mutation, uptake bias, and weak selection against DUES in less constrained parts of the genome are together sufficient to cause DUES accumulation in susceptible parts of the genome, with no need for any other DUES function. The distribution of overrepresentation values across sequences with different mismatch loads relative to the DUES suggests a gradual but nonlinear molecular drive of DNA sequences depending on their similarity to the DUES. Other genomically overrepresented sequences, both prokaryotic and eukaryotic, show a similar distribution of frequencies, suggesting that the molecular drive reported above applies to other frequent oligonucleotides. Rare oligonucleotides, however, seem to be gradually drawn to genomic underrepresentation, suggesting a molecular drag. To my knowledge, this work provides the first clear evidence of the gradual evolution of selectively constrained oligonucleotides, including repeated, palindromic and protein/transcription factor-binding DNAs.
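Comparing a motif with all of its mismatched forms can be illustrated with observed/expected (O/E) ratios grouped by mismatch load. This Python sketch is a simplification under stated assumptions and is not the author's method: expected counts come from a zero-order model that uses only base composition, only the given strand is scanned, the motif is assumed to contain only A, C, G and T, and the motif passed in is a placeholder rather than a specific DUES.

```python
from collections import Counter
from itertools import combinations, product
from math import prod
from typing import Dict, Iterator

BASES = "ACGT"


def variants(motif: str, m: int) -> Iterator[str]:
    """Yield every string at Hamming distance exactly m from motif."""
    for positions in combinations(range(len(motif)), m):
        # For each chosen position, the three bases that differ from the motif.
        choices = [[b for b in BASES if b != motif[p]] for p in positions]
        for subs in product(*choices):
            chars = list(motif)
            for p, b in zip(positions, subs):
                chars[p] = b
            yield "".join(chars)


def obs_exp_by_mismatch(seq: str, motif: str) -> Dict[int, float]:
    """Observed/expected ratio of motif variants, grouped by mismatch load.

    Expected count of a word w under a zero-order model:
    E(w) = (L - k + 1) * product of base frequencies p(b) over b in w.
    Only the given strand is scanned.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    n_windows = len(seq) - k + 1
    kmers = Counter(seq[i:i + k] for i in range(n_windows))
    counts = Counter(seq)
    total = sum(counts[b] for b in BASES)
    p = {b: counts[b] / total for b in BASES}
    ratios: Dict[int, float] = {}
    for m in range(k + 1):
        observed = expected = 0.0
        for w in variants(motif, m):
            observed += kmers[w]
            expected += n_windows * prod(p[b] for b in w)
        ratios[m] = observed / expected if expected > 0 else float("nan")
    return ratios
```

The result holds one ratio per mismatch count (0 to k); a drive toward the motif would show up as ratios that rise as the mismatch load falls. Enumerating all 4^k variants is cheap for motifs of about ten bases but grows quickly beyond that.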
Affiliation(s)
- Mohammed Bakkali
- Institute of Genetics, Queen's Medical Center, University of Nottingham, Nottingham, United Kingdom.
138
Butler JE, He Q, Nevin KP, He Z, Zhou J, Lovley DR. Genomic and microarray analysis of aromatics degradation in Geobacter metallireducens and comparison to a Geobacter isolate from a contaminated field site. BMC Genomics 2007; 8:180. [PMID: 17578578 PMCID: PMC1924859 DOI: 10.1186/1471-2164-8-180] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.3] [Received: 10/30/2006] [Accepted: 06/19/2007] [Indexed: 12/03/2022]
Abstract
Background Groundwater and subsurface environments contaminated with aromatic compounds can be remediated in situ by Geobacter species that couple oxidation of these compounds to reduction of Fe(III) oxides. Geobacter metallireducens metabolizes many aromatic compounds, but the enzymes involved are not well known. Results The complete G. metallireducens genome contained a 300 kb island predicted to encode enzymes for the degradation of phenol, p-cresol, 4-hydroxybenzaldehyde, 4-hydroxybenzoate, benzyl alcohol, benzaldehyde, and benzoate. Toluene degradation genes were encoded in a separate region. None of these genes was found in closely related species that cannot degrade aromatic compounds. Abundant transposons and phage-like genes in the island suggest mobility, but nucleotide composition and lack of synteny with other species do not suggest a recent transfer. The inferred degradation pathways are similar to those in species that anaerobically oxidize aromatic compounds with nitrate as an electron acceptor. In these pathways the aromatic compounds are converted to benzoyl-CoA and then to 3-hydroxypimelyl-CoA. However, G. metallireducens had no genes for the energetically expensive dearomatizing enzyme. Whole-genome changes in transcript levels were identified in cells oxidizing benzoate. These supported the predicted pathway, identified induced fatty-acid oxidation genes, and revealed an apparent shift in the TCA cycle to a putative ATP-yielding succinyl-CoA synthase. Paralogs of several genes in the pathway were also induced, as were several putative molybdoproteins. Comparison of the aromatics degradation pathway genes to the genome of an isolate from a contaminated field site showed very similar content, suggesting that this strain degrades many of the same compounds. This strain also lacked a classical dearomatizing enzyme, but contained two copies of an eight-gene cluster encoding redox proteins that was induced 30-fold during benzoate oxidation.
Conclusion G. metallireducens appears to convert aromatic compounds to benzoyl-CoA, then to acetyl-CoA via fatty acid oxidation, and then to carbon dioxide via the TCA cycle. The enzyme responsible for dearomatizing the aromatic ring may be novel, and energetic investments at this step may be offset by a change in succinate metabolism. Analysis of a field isolate suggests that the pathways inferred for G. metallireducens may be applicable to modeling in situ bioremediation.
Affiliation(s)
- Jessica E Butler
- Department of Microbiology, University of Massachusetts, Amherst, MA 01003, USA
- Qiang He
- Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, TN 37996, USA
- Kelly P Nevin
- Department of Microbiology, University of Massachusetts, Amherst, MA 01003, USA
- Zhili He
- Environmental Science Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Jizhong Zhou
- Environmental Science Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Derek R Lovley
- Department of Microbiology, University of Massachusetts, Amherst, MA 01003, USA
139
Wang HF, Hou WR, Niu DK. Strand compositional asymmetries in vertebrate large genes. Mol Biol Rep 2007; 35:163-9. [PMID: 17420956 DOI: 10.1007/s11033-007-9066-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/08/2006] [Accepted: 02/26/2007] [Indexed: 10/23/2022]
Abstract
Both transcription-associated and replication-associated strand compositional asymmetries have recently been shown in vertebrate genomes. In this paper, we show that transcription-associated and replication-associated strand compositional asymmetries coexist in most large vertebrate genes, although in most cases the former conceals the latter. Furthermore, we found that the transcription-associated strand compositional asymmetries of housekeeping genes are stronger than those of genes expressed in somatic cells. Together with other evidence, these results lead us to suggest that germline transcription-associated strand-asymmetric mutations may be the main cause of the transcription-associated strand compositional asymmetries.
Affiliation(s)
- Hai-Fang Wang
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
140
Thakur V, Azad RK, Ramaswamy R. Markov models of genome segmentation. Phys Rev E Stat Nonlin Soft Matter Phys 2007; 75:011915. [PMID: 17358192 DOI: 10.1103/physreve.75.011915] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 03/02/2006] [Revised: 06/19/2006] [Indexed: 05/14/2023]
Abstract
We introduce Markov models for segmentation of symbolic sequences, extending a segmentation procedure based on the Jensen-Shannon divergence that has been introduced earlier. Higher-order Markov models are more sensitive to the details of local patterns and in application to genome analysis, this makes it possible to segment a sequence at positions that are biologically meaningful. We show the advantage of higher-order Markov-model-based segmentation procedures in detecting compositional inhomogeneity in chimeric DNA sequences constructed from genomes of diverse species, and in application to the E. coli K12 genome, boundaries of genomic islands, cryptic prophages, and horizontally acquired regions are accurately identified.
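The Jensen-Shannon segmentation that this paper extends can be sketched for its simplest case: a single cut at the position that maximizes the length-weighted Jensen-Shannon divergence between the two parts. This Python sketch assumes an order-0 (symbol-composition) model; the higher-order Markov models introduced in the paper, the recursive re-segmentation, and the significance threshold used to stop it are not reproduced.

```python
from collections import Counter
from math import log2
from typing import Tuple


def entropy(counts: Counter, n: int) -> float:
    """Shannon entropy in bits of a symbol-count table whose counts sum to n."""
    return -sum((c / n) * log2(c / n) for c in counts.values() if c > 0)


def best_split(seq: str) -> Tuple[int, float]:
    """Return (cut position, divergence) for the best single two-way split.

    D(i) = H(S) - (i/N) H(S[:i]) - ((N - i)/N) H(S[i:]), using order-0
    (symbol-composition) entropies; cut i means S[:i] | S[i:].
    """
    n = len(seq)
    h_total = entropy(Counter(seq), n) if n else 0.0
    left: Counter = Counter()
    right = Counter(seq)
    best_pos, best_d = 0, 0.0
    for i in range(1, n):
        ch = seq[i - 1]  # move one symbol from the right part to the left part
        left[ch] += 1
        right[ch] -= 1
        d = (h_total
             - (i / n) * entropy(left, i)
             - ((n - i) / n) * entropy(right, n - i))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d
```

For example, `best_split("A" * 50 + "C" * 50)` returns `(50, 1.0)`: the cut separates two pure halves and recovers the full 1 bit of compositional entropy. A higher-order version would replace the symbol entropies with conditional entropies estimated from longer words.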
Affiliation(s)
- Vivek Thakur
- Center for Computational Biology and Bioinformatics, School of Information Technology, Jawaharlal Nehru University, New Delhi 110 067, India
141
Revisiting the directional mutation pressure theory: The analysis of a particular genomic structure in Leishmania major. Gene 2006; 385:28-40. [DOI: 10.1016/j.gene.2006.04.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 11/30/2005] [Accepted: 04/04/2006] [Indexed: 11/20/2022]
142
Janga SC, Lamboy WF, Huerta AM, Moreno-Hagelsieb G. The distinctive signatures of promoter regions and operon junctions across prokaryotes. Nucleic Acids Res 2006; 34:3980-7. [PMID: 16914446 PMCID: PMC1557821 DOI: 10.1093/nar/gkl563] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
Here we show that regions upstream of first transcribed genes have oligonucleotide signatures that distinguish them from regions upstream of genes in the middle of operons. Databases of experimentally confirmed transcription units do not exist for most genomes. Thus, to expand the analyses into genomes with no experimentally confirmed data, we used genes conserved adjacent in evolutionarily distant genomes as representatives of genes inside operons. Likewise, we used divergently transcribed genes as representative examples of first transcribed genes. In model organisms, the trinucleotide signatures of regions upstream of these representative genes allow for operon predictions with accuracies close to those obtained with known operon data (0.8). Signature-based operon predictions have more similar phylogenetic profiles and higher proportions of genes in the same pathways than predicted transcription unit boundaries (TUBs). These results confirm that we are separating genes with related functions, as expected for operons, from genes not necessarily related, as expected for genes in different transcription units. We also test the quality of the predictions using microarray data in six genomes and show that the signature-predicted operons tend to have high correlations of expression. Oligonucleotide signatures should expand the number of tools available to identify operons even in poorly characterized genomes.
Affiliation(s)
- Sarath Chandra Janga
- Department of Biology, Wilfrid Laurier University, 75 University Avenue West, Waterloo, ON, Canada, N2L 3C5.
143
Dehnert M, Helm WE, Hütt MT. Informational structure of two closely related eukaryotic genomes. Phys Rev E Stat Nonlin Soft Matter Phys 2006; 74:021913. [PMID: 17025478 DOI: 10.1103/physreve.74.021913] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 01/23/2006] [Indexed: 05/12/2023]
Abstract
Attempts to identify a species on the basis of its DNA sequence on purely statistical grounds have been made for more than a decade. The most prominent of such genome signatures relies on neighborhood correlations (i.e., dinucleotide frequencies) and, consequently, attributes species identification to mechanisms operating at the dinucleotide level (e.g., neighbor-dependent mutations). Using Mus musculus and Rattus norvegicus as examples, we analyze short- and intermediate-range statistical correlations in DNA sequences. These correlation profiles are computed for all chromosomes of the two species. We find that as the range of correlations increases, the ability to distinguish between the species on the basis of this correlation profile improves, and ever shorter sequence segments are required to achieve full species separation. This finding suggests that distinctive traits within the sequence are situated beyond the level of a few nucleotides. The large-scale statistical patterning of DNA sequences on which such genome signatures are based is thus substantially determined by mobile elements (e.g., transposons and retrotransposons). The study and interspecies comparison of such correlation profiles can therefore reveal features of retrotransposition, segmental duplications, and other processes of genome evolution.
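One common way to build a correlation profile of the kind described here is the mutual information function I(k) between nucleotides k positions apart. That choice is an assumption for illustration; the paper's exact estimator and range of k may differ. Python sketch:

```python
from collections import Counter
from math import log2
from typing import List


def mutual_information(seq: str, k: int) -> float:
    """Mutual information in bits between symbols k positions apart."""
    pairs = Counter(zip(seq, seq[k:]))
    n = sum(pairs.values())
    left: Counter = Counter()
    right: Counter = Counter()
    for (a, b), c in pairs.items():
        left[a] += c
        right[b] += c
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (left[a] * right[b]).
    return sum(
        (c / n) * log2(c * n / (left[a] * right[b]))
        for (a, b), c in pairs.items()
    )


def correlation_profile(seq: str, max_k: int) -> List[float]:
    """Mutual information for distances k = 1 .. max_k."""
    return [mutual_information(seq, k) for k in range(1, max_k + 1)]
```

Profiles computed this way for segments of each chromosome could then be compared between species; extending `max_k` corresponds to moving from short-range toward intermediate-range correlations.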
Affiliation(s)
- Manuel Dehnert
- Computational Systems Biology, School of Engineering and Science, International University Bremen, Campus Ring 1, D-28759 Bremen, Germany
144
Chang CH, Hsieh LC, Chen TY, Chen HD, Luo L, Lee HC. Shannon information in complete genomes. Proc IEEE Comput Syst Bioinform Conf 2006:20-30. [PMID: 16447996 DOI: 10.1109/csb.2004.1332413] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
Shannon information in the genomes of all completely sequenced prokaryotes and eukaryotes is measured for word lengths of two to ten letters. We find that, in a scale-dependent way, the Shannon information in complete genomes is much greater than that in matching random sequences: thousands of times greater in the case of short words. Furthermore, with the exception of the 14 chromosomes of Plasmodium falciparum, the Shannon information in all available complete genomes belongs to a universality class given by an extremely simple formula. The data are consistent with a model for genome growth composed of two main ingredients: random segmental duplications that increase the Shannon information in a scale-independent way, and random point mutations that preferentially reduce the larger-scale Shannon information. The inference drawn from the present study is that the large-scale, coarse-grained growth of genomes was selectively neutral, and this suggests independent corroboration of Kimura's neutral theory of evolution.
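The word-length dependence described here can be explored with k-mer entropies. In this Python sketch the "Shannon information" at word length k is assumed to be the deficit 2k - H_k, where H_k is the entropy in bits of overlapping k-letter words and 2k bits is the maximum for a four-letter alphabet; the "matching random sequence" is assumed to be a composition-preserving shuffle. Both are illustrative assumptions that may not match the paper's exact definitions.

```python
import random
from collections import Counter
from math import log2


def word_entropy(seq: str, k: int) -> float:
    """Shannon entropy in bits of overlapping k-letter word frequencies."""
    words = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    n = sum(words.values())
    return -sum((c / n) * log2(c / n) for c in words.values())


def shannon_information(seq: str, k: int) -> float:
    """Deficit relative to the 2k-bit maximum for a 4-letter alphabet (assumed definition)."""
    return 2 * k - word_entropy(seq, k)


def shuffled_control(seq: str, k: int, seed: int = 0) -> float:
    """Same measure on a composition-preserving random shuffle of seq."""
    chars = list(seq)
    random.Random(seed).shuffle(chars)
    return shannon_information("".join(chars), k)
```

Comparing `shannon_information(genome, k)` with `shuffled_control(genome, k)` for k = 2 to 10 mirrors the genome-versus-random comparison in the abstract. When k is large relative to sequence length, plug-in entropy estimates are biased downward, which inflates the deficit for both the genome and its control.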
Affiliation(s)
- Chang-Heng Chang
- Department of Physics, National Central University, Chungli, Taiwan, ROC
145
Chen LL. Identification of genomic islands in six plant pathogens. Gene 2006; 374:134-41. [PMID: 16581205 DOI: 10.1016/j.gene.2006.01.029] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 11/02/2005] [Revised: 12/30/2005] [Accepted: 01/24/2006] [Indexed: 10/24/2022]
Abstract
Genomic islands (GIs), which are acquired by horizontal gene transfer, play important roles in microbial evolution. In this paper, the GIs of six completely sequenced plant pathogens are identified using a windowless method based on the Z curve representation of DNA sequences. Four, eight, four, one, two and four GIs longer than 20 kb are recognized in the plant pathogens Agrobacterium tumefaciens str. C58, Ralstonia solanacearum GMI1000, Xanthomonas axonopodis pv. citri str. 306 (Xac), Xanthomonas campestris pv. campestris str. ATCC33913 (Xcc), Xylella fastidiosa 9a5c and Pseudomonas syringae pv. tomato str. DC3000, respectively. Most of these regions share a set of conserved features of GIs, including an abrupt change in GC content compared with the rest of the genome, integrase genes at the junction, tRNA genes as integration sites, genetic mobility genes, and differences in codon usage, codon preference and amino acid usage. The identification of these GIs will benefit research on these six important phytopathogens.
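The Z curve maps a DNA sequence to three cumulative counts: x_n = (A_n + G_n) - (C_n + T_n) (purine vs pyrimidine), y_n = (A_n + C_n) - (G_n + T_n) (amino vs keto) and z_n = (A_n + T_n) - (G_n + C_n) (weak vs strong hydrogen bonding), where A_n is the number of A among the first n bases. Regions that differ in GC content show up as changes in the slope of z_n. The Python sketch below computes the curve and a detrended z_n. The detrending (a least-squares line through the origin) is an assumption for illustration and not necessarily the paper's exact procedure, and island calling itself is not shown.

```python
from typing import List, Tuple

# Per-base steps for (x, y, z): purine/pyrimidine, amino/keto, weak/strong.
STEP = {
    "A": (1, 1, 1),
    "G": (1, -1, -1),
    "C": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Cumulative Z-curve points (x_n, y_n, z_n) after each base; non-ACGT symbols add nothing."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        dx, dy, dz = STEP.get(base, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def detrended_z(seq: str) -> List[float]:
    """z_n minus a least-squares line through the origin (slope k = sum(n*z_n) / sum(n^2))."""
    zs = [p[2] for p in z_curve(seq)]
    ns = range(1, len(zs) + 1)
    denom = sum(n * n for n in ns)
    k = sum(n * z for n, z in zip(ns, zs)) / denom if denom else 0.0
    return [z - k * n for n, z in zip(ns, zs)]
```

In the detrended curve, a stretch that falls more steeply than its surroundings is GC-richer than the genome average and a rising stretch is AT-richer, so abrupt turns are candidate island boundaries.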
Affiliation(s)
- Ling-Ling Chen
- Shandong Provincial Research Center for Bioinformatic Engineering and Technique, Center for Advanced Study, Shandong University of Technology, Zibo 255049, PR China.
146
Foerstner KU, von Mering C, Bork P. Comparative analysis of environmental sequences: potential and challenges. Philos Trans R Soc Lond B Biol Sci 2006; 361:519-23. [PMID: 16524840 PMCID: PMC1609345 DOI: 10.1098/rstb.2005.1809] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
Environmental sequencing, also dubbed metagenomics, is increasingly being used to obtain insights into organismal communities in diverse habitats, and has a variety of foreseeable applications in biotechnology and medicine. The first public large-scale datasets already provide a wealth of information hidden in vast amounts of fragmented DNA from unknown species residing in these environments. Comparative sequence analysis is essential for the interpretation of such data. However, the different layers of complexity intrinsic to each sample require the establishment of baselines for comparison: how to normalize for differences in phylogenetic and functional diversity, how to avoid biases from incomplete data, and how to deal with differences in species dominance or genome sizes. Here we discuss a few of these issues and delineate some simple discriminative sequence properties for four distinct habitats.
Affiliation(s)
- Konrad U Foerstner
- European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg 69117, Germany
- Christian von Mering
- European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg 69117, Germany
- Peer Bork
- European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg 69117, Germany
- Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, Berlin 13092, Germany
- Author for correspondence
147
Gao B, Paramanathan R, Gupta RS. Signature proteins that are distinctive characteristics of Actinobacteria and their subgroups. Antonie van Leeuwenhoek 2006; 90:69-91. [PMID: 16670965 DOI: 10.1007/s10482-006-9061-2] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.4] [Received: 11/28/2005] [Accepted: 01/20/2006] [Indexed: 10/24/2022]
Abstract
The Actinobacteria constitute one of the main phyla of Bacteria. Presently, no morphological and very few molecular characteristics are known which can distinguish species of this highly diverse group. In this work, we have analyzed the genomes of four actinobacteria (viz. Mycobacterium leprae TN, Leifsonia xyli subsp. xyli str. CTCB07, Bifidobacterium longum NCC2705 and Thermobifida fusca YX) to search for proteins that are unique to Actinobacteria. Our analyses have identified 233 actinobacteria-specific proteins, homologues of which are generally not present in any other bacteria. These proteins can be grouped as follows: (i) 29 proteins uniquely present in most sequenced actinobacterial genomes; (ii) 6 proteins present in almost all actinobacteria except Bifidobacterium longum and another 37 proteins absent in B. longum and few other species; (iii) 11 proteins which are mainly present in Corynebacterium, Mycobacterium and Nocardia (CMN) subgroup as well as Streptomyces, T. fusca and Frankia sp., but they are not found in Bifidobacterium and Micrococcineae; (iv) 8 proteins that are specific for T. fusca and Streptomyces species, plus 2 proteins also present in the Frankia species; (v) 13 proteins that are specific for the Corynebacterineae or the CMN group; (vi) 14 proteins only found in Mycobacterium and Nocardia; (vii) 24 proteins unique to different Mycobacterium species; (viii) 8 proteins specific to the Micrococcineae; (ix) 85 proteins which are distributed sporadically in actinobacterial species. Additionally, many examples of lateral gene transfer from Actinobacteria to Magnetospirillum magnetotacticum have also been identified. The identified proteins provide novel molecular means for defining and circumscribing the Actinobacteria phylum and a number of subgroups within it. The distribution of these proteins also provides useful information regarding interrelationships among the actinobacterial subgroups. 
Most of these proteins are of unknown function and studies aimed at understanding their cellular functions should reveal common biochemical and physiological characteristics unique to either all actinobacteria or particular subgroups of them. The identified proteins also provide potential targets for development of drugs that are specific for actinobacteria.
Affiliation(s)
- Beile Gao
- Department of Biochemistry and Biomedical Science, McMaster University, L8N3Z5, Hamilton, Canada
148
Hou WR, Wang HF, Niu DK. Replication-associated strand asymmetries in vertebrate genomes and implications for replicon size, DNA replication origin, and termination. Biochem Biophys Res Commun 2006; 344:1258-62. [PMID: 16650814 DOI: 10.1016/j.bbrc.2006.04.039] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 03/29/2006] [Accepted: 04/17/2006] [Indexed: 11/16/2022]
Abstract
Strand compositional asymmetry has been observed in prokaryotes and used in predicting prokaryotic DNA replication origins and termini. However, it was not found in eukaryotic genomes by the same methods. We propose that transcription-associated strand asymmetries mask the replication-associated ones. By analyzing the nucleotide composition of intergenic sequences larger than 50 kb by cumulative skew diagrams (CSD), we found replication-associated strand asymmetry in vertebrate genomes. Furthermore, we found that the most common replicon sizes in vertebrates are 50-100 kb, and show evidence that the replication origin and termination regions of vertebrate genomes range from a discrete site to a broad zone.
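A cumulative skew diagram can be sketched by summing window-level GC skews along a sequence. In bacteria the leading strand is usually G-rich, so the cumulative GC skew tends to reach its minimum near the replication origin and its maximum near the terminus; in vertebrate intergenic regions the signal is much weaker. This Python sketch assumes non-overlapping windows of a user-chosen size and a single strand, and is not necessarily the authors' exact CSD procedure.

```python
from typing import List, Tuple


def cumulative_gc_skew(seq: str, window: int) -> List[float]:
    """Running sum of per-window GC skew (G - C) / (G + C) along one strand."""
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        w = seq[start:start + window].upper()
        g, c = w.count("G"), w.count("C")
        total += (g - c) / (g + c) if g + c else 0.0
        curve.append(total)
    return curve


def skew_extrema(curve: List[float]) -> Tuple[int, int]:
    """Window indices of the minimum and maximum of a non-empty cumulative curve."""
    i_min = min(range(len(curve)), key=curve.__getitem__)
    i_max = max(range(len(curve)), key=curve.__getitem__)
    return i_min, i_max
```

Multiplying a window index by the window size gives an approximate sequence coordinate for a candidate origin (minimum) or terminus (maximum).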
Affiliation(s)
- Wen-Ru Hou
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
149
Mrázek J. Analysis of distribution indicates diverse functions of simple sequence repeats in Mycoplasma genomes. Mol Biol Evol 2006; 23:1370-85. [PMID: 16618962 DOI: 10.1093/molbev/msk023] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Indexed: 11/12/2022]
Abstract
Simple sequence repeats (SSRs) composed of extensive tandem iterations of a single nucleotide or a short oligonucleotide are rare in most bacterial genomes, but they are common among Mycoplasma. Some of these repeats act as contingency loci in association with families of surface antigens. By contraction or expansion during replication, these SSRs increase genetic variance of the population and facilitate avoidance of the immune response of the host. Occurrence and distribution of SSRs are analyzed in complete genomes of 11 Mycoplasma and 3 related Mollicutes in order to gain insights into functional and evolutionary diversity of the SSRs in Mycoplasma. The results revealed an unexpected variety of SSRs with respect to their distribution and composition and suggest that it is unlikely that all SSRs function as contingency loci or recombination hot spots. Various types of SSRs are most abundant in Mycoplasma hyopneumoniae, whereas Mycoplasma penetrans, Mycoplasma mobile, and Mycoplasma synoviae do not contain unusually long SSRs. Mycoplasma hyopneumoniae and Mycoplasma pulmonis feature abundant short adenine and thymine runs periodically spaced at 11 and 12 bp, respectively, which likely affect the supercoiling propensities of the DNA molecule. Physiological roles of long adenine and thymine runs in M. hyopneumoniae appear independent of location upstream or downstream of genes, unlike contingency loci that are typically located in protein-coding regions or upstream regulatory regions. Comparisons among 3 M. hyopneumoniae strains suggest that the adenine and thymine runs are rarely involved in genome rearrangements. The results indicate that the SSRs in the Mycoplasma genomes play diverse roles, including modulating gene expression as contingency loci, facilitating genome rearrangements via recombination, affecting protein structure and possibly protein-protein interactions, and contributing to the organization of the DNA molecule in the cell.
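The periodically spaced adenine and thymine runs can be located with a scan for single-base runs followed by a look at the spacing between consecutive runs. This Python sketch uses an illustrative minimum run length of 4 bases and reads only one strand. The thresholds and statistics used in the paper are not reproduced, and longer-unit SSRs (di- or trinucleotide repeats) would need a separate scan.

```python
import re
from typing import List, Tuple


def homopolymer_runs(seq: str, bases: str = "AT", min_len: int = 4) -> List[Tuple[int, str, int]]:
    """(start, base, length) for each maximal single-base run of at least min_len."""
    runs = []
    # ([ACGT])\1* matches one base followed by repeats of the same base.
    for m in re.finditer(r"([ACGT])\1*", seq.upper()):
        base, length = m.group(1), m.end() - m.start()
        if base in bases and length >= min_len:
            runs.append((m.start(), base, length))
    return runs


def run_spacings(runs: List[Tuple[int, str, int]]) -> List[int]:
    """Distances between the starts of consecutive runs."""
    return [b[0] - a[0] for a, b in zip(runs, runs[1:])]
```

A histogram of `run_spacings(...)` peaking near 11 or 12 bp would correspond to the periodicity reported for M. hyopneumoniae and M. pulmonis.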
Affiliation(s)
- Jan Mrázek
- Department of Microbiology and Institute of Bioinformatics, University of Georgia, USA.
150
Bukovska G, Klucar L, Vlcek C, Adamovic J, Turna J, Timko J. Complete nucleotide sequence and genome analysis of bacteriophage BFK20 — A lytic phage of the industrial producer Brevibacterium flavum. Virology 2006; 348:57-71. [PMID: 16457869 DOI: 10.1016/j.virol.2005.12.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Received: 07/22/2005] [Revised: 11/14/2005] [Accepted: 12/11/2005] [Indexed: 10/25/2022]
Abstract
The entire double-stranded DNA genome of bacteriophage BFK20, a lytic phage of Brevibacterium flavum CCM 251 (an industrial producer of L-lysine), was sequenced and analyzed. It consists of 42,968 base pairs with an overall molar G + C content of 56.2%. Fifty-five potential open reading frames were identified and annotated using various bioinformatics tools. Clusters of functionally related putative genes were defined (structural, lytic, replication and regulatory). To verify the annotation of structural proteins, they were resolved by 2D gel electrophoresis and submitted to N-terminal amino acid sequencing. The structural proteins identified included the portal protein and the major and minor tail proteins. Based on overall genome sequence comparison, the similarities to other known genomes involve primarily bacteriophages of Mycobacterium spp. and some regions of Corynebacterium spp. genomes (possible prophages). Our results support the theory that phage genomes are mosaics with respect to each other.
Affiliation(s)
- Gabriela Bukovska
- Institute of Molecular Biology, Centre of Excellence for Molecular Medicine, Slovak Academy of Sciences, Dubravska cesta 21, 845 51 Bratislava, Slovakia.