1
Veldsman WP, Yang C, Zhang Z, Huang Y, Chowdhury D, Zhang L. Structural and Functional Disparities within the Human Gut Virome in Terms of Genome Topology and Representative Genome Selection. Viruses 2024; 16:134. [PMID: 38257834 PMCID: PMC10820185 DOI: 10.3390/v16010134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/24/2024] Open
Abstract
Circularity confers protection to viral genomes where linearity falls short, thereby fulfilling the "form follows function" aphorism. However, a shift away from morphology-based classification toward the molecular and ecological classification of viruses is currently underway within the field of virology. Recent years have seen drastic changes in the International Committee on Taxonomy of Viruses' operational definitions of viruses, particularly for the tailed phages that inhabit the human gut. After the abolition of the order Caudovirales, these tailed phages are best defined as members of the class Caudoviricetes. To determine the epistemological value of genome topology in the context of the human gut virome, we designed a set of seven experiments to assay the impact of genome topology and representative viral selection on biological interpretation. Using Oxford Nanopore long reads for viral genome assembly coupled with Illumina short-read polishing, we showed that circular and linear virus genomes differ remarkably in terms of genome quality, GC skew, transfer RNA gene frequency, structural variant frequency, cross-reference functional annotation (COG, KEGG, Pfam, and TIGRfam), state-of-the-art marker-based classification, and phage-host interaction. Furthermore, the disparity profile changes during dereplication. In particular, our phage-host interaction results demonstrated that proportional abundances cannot be meaningfully compared without due regard for genome topology and dereplication threshold, which necessitates standardized reporting. As a best practice guideline, we recommend that comparative studies of the human gut virome always report the ratio of circular to linear viral genomes along with the dereplication threshold so that structural and functional metrics can be placed into context when assessing biologically relevant metagenomic properties such as proportional abundance.
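GC skew is one of the metrics the abstract above compares between circular and linear genomes. The following is a minimal sketch of how such a value could be computed, assuming the common definition (G - C) / (G + C) over non-overlapping windows; the abstract does not state which definition or window size the authors used, and the function names `gc_skew` and `windowed_gc_skew` are hypothetical.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one sequence, or 0.0 if it has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int, circular: bool = False) -> list[float]:
    """GC skew for consecutive non-overlapping windows of `window` bases.

    Assumption for illustration: on a circular genome the final, shorter
    window is padded by wrapping around to the start of the sequence;
    on a linear genome it is simply left shorter.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        if circular and len(chunk) < window:
            # Wrap around the origin to complete the last window.
            chunk += seq[:window - len(chunk)]
        skews.append(gc_skew(chunk))
    return skews
```

Treating the last window differently for the two topologies is only one possible design choice; it illustrates why the same sequence can yield different skew profiles depending on whether it is handled as circular or linear.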
Affiliation(s)
- Werner P. Veldsman
  - Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China; (W.P.V.); (C.Y.); (Z.Z.)
- Chao Yang
  - Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China; (W.P.V.); (C.Y.); (Z.Z.)
- Zhenmiao Zhang
  - Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China; (W.P.V.); (C.Y.); (Z.Z.)
- Debajyoti Chowdhury
  - School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
  - Computational Medicine Laboratory, Hong Kong Baptist University, Hong Kong SAR, China
- Lu Zhang
  - Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China; (W.P.V.); (C.Y.); (Z.Z.)
  - Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen 518057, China
2
Cornman RS. Data mining reveals tissue-specific expression and host lineage-associated forms of Apis mellifera filamentous virus. PeerJ 2023; 11:e16455. [PMID: 38025724 PMCID: PMC10655722 DOI: 10.7717/peerj.16455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 10/23/2023] [Indexed: 12/01/2023] Open
Abstract
Background: Apis mellifera filamentous virus (AmFV) is a large double-stranded DNA virus of uncertain phylogenetic position that infects honey bees (Apis mellifera). Little is known about AmFV evolution or molecular aspects of infection. Accurate annotation of open-reading frames (ORFs) is challenged by weak homology to other known viruses. This study was undertaken to evaluate ORFs (including coding-frame conservation, codon bias, and purifying selection), quantify genetic variation within AmFV, identify host characteristics that covary with infection rate, and examine viral expression patterns in different tissues.
Methods: Short-read data were accessed from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). Sequence reads were downloaded from accessions meeting search criteria and scanned for kmers representative of AmFV genomic sequence. Samples with kmer counts above specified thresholds were downloaded in full for mapping to reference sequences and de novo assembly.
Results: At least three distinct evolutionary lineages of AmFV exist. Clade 1 predominates in Europe but in the Americas and Africa it is replaced by the other clades as infection level increases in hosts. Only clade 3 was found at high relative abundance in hosts with African ancestry, whereas all clades achieved high relative abundance in bees of non-African ancestry. In Europe and Africa, clade 2 was generally detected only in low-level infections but was locally dominant in some North American samples. The geographic distribution of clade 3 was consistent with an introduction to the Americas with 'Africanized' honey bees in the 1950s. Localized genomic regions of very high nucleotide divergence in individual isolates suggest recombination with additional, as-yet unidentified AmFV lineages. A set of 155 high-confidence ORFs was annotated based on evolutionary conservation in six AmFV genome sequences representative of the three clades. Pairwise protein-level identity averaged 94.6% across ORFs (range 77.1-100%), which generally exhibited low evolutionary rates and moderate to strong codon bias. However, no robust example of positive diversifying selection on coding sequence was found in these alignments. Most of the genome was detected in RNA short-read alignments. Transcriptome assembly often yielded contigs in excess of 50 kb and containing ORFs in both orientations, and the termini of long transcripts were associated with tandem repeats. Lower levels of AmFV RNA were detected in brain tissue compared to abdominal tissue, and a distinct set of ORFs had minimal to no detectable expression in brain tissue. A scan of DNA accessions from the parasitic mite Varroa destructor was inconclusive with respect to replication in that species.
Discussion: Collectively, these results expand our understanding of this enigmatic virus, revealing transcriptional complexity and co-evolutionary associations with host lineage.
3
Sianga-Mete R, Hartnady P, Mandikumba WC, Rutherford K, Currin CB, Phelanyane F, Stefan S, Kosakovsky Pond SL, Martin DP. Viral genome sequence datasets display pervasive evidence of strand-specific substitution biases that are best described using non-reversible nucleotide substitution models. Res Sq 2022:rs.3.rs-2407778. [PMID: 36597548 PMCID: PMC9810213 DOI: 10.21203/rs.3.rs-2407778/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2022]
Abstract
Background: The vast majority of phylogenetic trees are inferred from molecular sequence data (nucleotides or amino acids) using time-reversible evolutionary models which assume that, for any pair of nucleotide or amino acid characters, the relative rate of X to Y substitution is the same as the relative rate of Y to X substitution. However, this reversibility assumption is unlikely to accurately reflect the actual underlying biochemical and/or evolutionary processes that lead to the fixation of substitutions. Here, we use empirical viral genome sequence data to reveal that evolutionary non-reversibility is pervasive among most groups of viruses. Specifically, we consider two non-reversible nucleotide substitution models: (1) a 6-rate non-reversible model (NREV6) in which Watson-Crick complementary substitutions occur at identical relative rates and which might therefore be most applicable to analyzing the evolution of genomes where both complementary strands are subject to the same mutational processes (such as might be expected for double-stranded (ds) RNA or dsDNA genomes); and (2) a 12-rate non-reversible model (NREV12) in which all relative substitution types are free to occur at different rates and which might therefore be applicable to analyzing the evolution of genomes where the complementary genome strands are subject to different mutational processes (such as might be expected for viruses with single-stranded (ss) RNA or ssDNA genomes).
Results: Using likelihood ratio and Akaike Information Criterion-based model tests, we show that, surprisingly, NREV12 provided a significantly better fit to 21/31 dsRNA and 20/30 dsDNA datasets than did the general time reversible (GTR) and NREV6 models, with NREV6 providing a better fit than NREV12 and GTR in only 5/30 dsDNA and 2/31 dsRNA datasets. As expected, NREV12 provided a significantly better fit to 24/33 ssDNA and 40/47 ssRNA datasets. Next, we used simulations to show that increasing degrees of strand-specific substitution bias decrease the accuracy of phylogenetic inference irrespective of whether GTR or NREV12 is used to describe mutational processes. However, in cases where strand-specific substitution biases are extreme (such as in SARS-CoV-2 and Torque teno sus virus datasets) NREV12 tends to yield more accurate phylogenetic trees than those obtained using GTR.
Conclusion: We show that NREV12 should be seriously considered during the model selection phase of phylogenetic analyses involving viral genomic sequences.
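The model tests named in this abstract (likelihood ratio and Akaike Information Criterion) can be illustrated with a short sketch. It uses the standard formulas AIC = 2k - 2 ln L and LR = 2 (ln L_alt - ln L_null); the function names are hypothetical, and the log-likelihoods and parameter counts must be supplied by the caller, since the abstract reports neither.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def likelihood_ratio_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Test statistic 2 * (ln L_alt - ln L_null) for nested models.

    Under the usual regularity conditions it is compared against a
    chi-squared distribution whose degrees of freedom equal the
    difference in the number of free parameters.
    """
    return 2 * (lnl_alt - lnl_null)


def best_by_aic(fits: dict[str, tuple[float, int]]) -> str:
    """Return the model name with the lowest AIC.

    `fits` maps a model name to (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))
```

A caller would pass one entry per fitted model (for example GTR, NREV6 and NREV12) with values obtained from their own phylogenetic software; no values from the study are reproduced here.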
4
Almirantis Y, Provata A, Li W. Noether's Theorem as a Metaphor for Chargaff's 2nd Parity Rule in Genomics. J Mol Evol 2022; 90:231-238. [PMID: 35704064 DOI: 10.1007/s00239-022-10062-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 05/18/2022] [Indexed: 10/18/2022]
Abstract
In the present note, the genomic compositional rule largely known as 'Chargaff's 2nd parity rule' (asserting equimolarity between Adenine-Thymine and Guanine-Cytosine in any of the two DNA strands) is regarded in association with Noether's theorem linking symmetries with conservation laws in physics. In the case of the genome, the strict physical and mathematical prerequisites of Noether's theorem do not hold. However, we conclude that a metaphor can be established with Noether's theorem, as inter-strand symmetry concerning DNA functionality engenders specific features in genome composition. Inversely, when inter-strand symmetry does not hold, the corresponding quantitative relations fail to appear. This association is also considered from the point of view of the existence of emergent laws and properties in evolutionary genomics.
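Chargaff's 2nd parity rule, as described in this abstract, says that within a single DNA strand the counts of A and T, and of G and C, are approximately equal. A minimal sketch of how one might measure the deviation from that rule on one strand follows; the relative-difference measure and the name `parity_deviation` are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def parity_deviation(strand: str) -> dict[str, float]:
    """Relative deviation from intra-strand A=T and G=C equimolarity.

    Returns |A - T| / (A + T) and |G - C| / (G + C) for a single strand.
    Values near 0 are consistent with the 2nd parity rule; values near 1
    indicate strong strand asymmetry. Non-ACGT symbols are ignored.
    """
    counts = Counter(strand.upper())
    a, t, g, c = (counts[base] for base in "ATGC")

    def rel_diff(x: int, y: int) -> float:
        # Define the deviation as 0.0 when neither base occurs.
        return abs(x - y) / (x + y) if (x + y) else 0.0

    return {"AT": rel_diff(a, t), "GC": rel_diff(g, c)}
```

Because the rule is statistical, such a check is only meaningful on long sequences; short test strings serve just to exercise the arithmetic.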
Affiliation(s)
- Yannis Almirantis
  - Theoretical Biology and Computational Genomics Laboratory, Institute of Bioscience and Applications, National Center for Scientific Research "Demokritos", 15341, Athens, Greece
- Astero Provata
  - Statistical Mechanics and Dynamical Systems Laboratory, Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", 15341, Athens, Greece
- Wentian Li
  - The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
5
Georgakopoulos-Soares I, Mouratidis I, Parada GE, Matharu N, Hemberg M, Ahituv N. Asymmetron: a toolkit for the identification of strand asymmetry patterns in biological sequences. Nucleic Acids Res 2021; 49:e4. [PMID: 33211865 PMCID: PMC7797064 DOI: 10.1093/nar/gkaa1052] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/19/2020] [Revised: 10/15/2020] [Accepted: 10/20/2020] [Indexed: 11/23/2022] Open
Abstract
DNA strand asymmetries can have a major effect on several biological functions, including replication, transcription and transcription factor binding. As such, DNA strand asymmetries and mutational strand bias can provide information about biological function. However, a versatile tool to explore this does not exist. Here, we present Asymmetron, a user-friendly computational tool that performs statistical analysis and visualizations for the evaluation of strand asymmetries. Asymmetron takes as input DNA features provided with strand annotation and outputs strand asymmetries for consecutive occurrences of a single DNA feature or between pairs of features. We illustrate the use of Asymmetron by identifying transcriptional and replicative strand asymmetries of germline structural variant breakpoints. We also show that the orientation of the binding sites of 45% of the human transcription factors analyzed has a significant DNA strand bias in transcribed regions, which is also corroborated in ChIP-seq analyses and is likely associated with transcription. In summary, we provide a novel tool to assess DNA strand asymmetries and show how it can be used to derive new insights across a variety of biological disciplines.
Affiliation(s)
- Ilias Georgakopoulos-Soares
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Aristotle University of Thessaloniki, Department of Mathematics, Thessaloniki, GR, Greece
- Guillermo E Parada
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
- Navneet Matharu
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
  - Innovative Genomics Institute, University of California San Francisco, San Francisco, CA, USA
- Martin Hemberg
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
- Nadav Ahituv
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
6
Demongeot J, Seligmann H. Deamination gradients within codons after 1<->2 position swap predict amino acid hydrophobicity and parallel β-sheet conformational preference. Biosystems 2020; 191-192:104116. [PMID: 32081715 DOI: 10.1016/j.biosystems.2020.104116] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/16/2019] [Revised: 12/04/2019] [Accepted: 02/10/2020] [Indexed: 12/16/2022]
Abstract
Deaminations C->T and A->G are frequent mutations producing nucleotide content gradients across genomes proportional to single-strandedness during replication/transcription. Hence, within single codons, deamination risks increase from first to third codon positions, while second codon positions are functionally most crucial. Here genetic codes are analyzed assuming that after anticodons protected codons from deaminations, first and second codon positions swapped (N2N1N3->N1N2N3), with lowest deamination risks for N2 in presumed primitive N2N1N3 codons. N2N1N3, not standard N1N2N3, codon structure minimizes deaminations inversely proportionally to cognate amino acid hydrophobicity and parallel β-sheet conformational preference. For N1N2N3, deamination minimization increases with genetic code integration order of cognate amino acids: during the presumed N2N1N3->N1N2N3 codon structure transition, protein synthesis combined direct codon-amino acid interactions for late amino acids and tRNA-based translation for early amino acids. Hence N2N1N3 codons would correspond to tRNA-free translation by spontaneous codon-amino acid affinities, and tRNA-mediated translation presumably caused N2N1N3->N1N2N3 swaps. Results show that rational, not arbitrary, rules link codon and amino acid structures. Some analyses detect mitochondrial RNAs and peptides in public data corresponding to systematic position swaps, suggesting occasional swapping polymerase activity.
Affiliation(s)
- Jacques Demongeot
  - Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical, F-38700, La Tronche, France
- Hervé Seligmann
  - Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical, F-38700, La Tronche, France
  - The National Natural History Collections, The Hebrew University of Jerusalem, 91404, Jerusalem, Israel
7
Demongeot J, Seligmann H. Theoretical minimal RNA rings designed according to coding constraints mimic deamination gradients. Sci Nat 2019; 106:44. [DOI: 10.1007/s00114-019-1638-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/15/2018] [Revised: 06/18/2019] [Accepted: 06/19/2019] [Indexed: 11/27/2022]
8
Akhter S, Aziz RK, Kashef MT, Ibrahim ES, Bailey B, Edwards RA. Kullback Leibler divergence in complete bacterial and phage genomes. PeerJ 2017; 5:e4026. [PMID: 29204318 PMCID: PMC5712468 DOI: 10.7717/peerj.4026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/30/2017] [Accepted: 10/22/2017] [Indexed: 12/11/2022] Open
Abstract
The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses.
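The metric described in this abstract, the Kullback–Leibler divergence of a genome's amino acid composition from a mean composition, can be sketched as follows. This uses the textbook definition D(P||Q) = sum_i p_i log2(p_i / q_i); the base-2 logarithm, the handling of zero frequencies, and the function names are assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def aa_frequencies(proteins: list[str]) -> dict[str, float]:
    """Pooled amino acid frequencies over a set of protein sequences."""
    counts = Counter()
    for protein in proteins:
        # Ignore stop symbols, ambiguity codes and anything non-standard.
        counts.update(ch for ch in protein.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """D(P || Q) in bits, where Q plays the role of the mean composition."""
    total = 0.0
    for aa, p_i in p.items():
        if p_i == 0.0:
            continue  # a zero-probability term contributes nothing
        if q[aa] == 0.0:
            raise ValueError(f"divergence undefined: q[{aa!r}] is zero")
        total += p_i * math.log2(p_i / q[aa])
    return total
```

In use, `q` would be built from all genomes pooled together and `p` from a single genome, so that a larger value marks a more skewed amino acid profile.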
Affiliation(s)
- Sajia Akhter
  - Computational Science Research Center, San Diego State University, San Diego, CA, USA
- Ramy K Aziz
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, Egypt
  - Department of Computer Science, San Diego State University, San Diego, CA, United States of America
- Mona T Kashef
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, Egypt
- Eslam S Ibrahim
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, Egypt
- Barbara Bailey
  - Department of Mathematics & Statistics, San Diego State University, San Diego, CA, USA
- Robert A Edwards
  - Computational Science Research Center, San Diego State University, San Diego, CA, USA
  - Department of Computer Science, San Diego State University, San Diego, CA, United States of America
  - Department of Mathematics & Statistics, San Diego State University, San Diego, CA, USA
  - Department of Biology, San Diego State University, San Diego, CA, USA
9
Skliros D, Kalatzis PG, Katharios P, Flemetakis E. Comparative Functional Genomic Analysis of Two Vibrio Phages Reveals Complex Metabolic Interactions with the Host Cell. Front Microbiol 2016; 7:1807. [PMID: 27895630 PMCID: PMC5107563 DOI: 10.3389/fmicb.2016.01807] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/25/2016] [Accepted: 10/27/2016] [Indexed: 01/21/2023] Open
Abstract
Sequencing and annotation were performed for two large double-stranded DNA bacteriophages, φGrn1 and φSt2 of the Myoviridae family, considered to be of great interest for phage therapy against Vibrios in aquaculture live feeds. In addition, phage–host metabolic interactions and exploitation were studied by transcript profiling of selected viral and host genes. Comparative genomic analysis with other large Vibrio phages was also performed to establish the presence and location of homing endonucleases, highlighting distinct features for both phages. Phylogenetic analysis revealed that they belong to the “schizoT4like” clade. Although many reports of newly sequenced viruses have provided a large set of information, basic research related to the shift of the bacterial metabolism during infection remains stagnant. The function of many viral protein products in the process of infection is still unknown. Genome annotation identified the presence of several viral open reading frames (ORFs) participating in metabolism, including a Sir2/cobB (sirtuin) protein and a number of genes involved in auxiliary NAD+ and nucleotide biosynthesis, necessary for phage DNA replication. Key genes were subsequently selected for detailed study of their expression levels during infection. This work suggests a complex metabolic interaction and exploitation of the host metabolic pathways and biochemical processes, including a possible post-translational protein modification, by the virus during infection.
Affiliation(s)
- Dimitrios Skliros
  - Laboratory of Molecular Biology, Department of Biotechnology, School of Food, Biotechnology and Development, Agricultural University of Athens, Athens, Greece
- Panos G Kalatzis
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
  - Marine Biological Section, University of Copenhagen, Helsingør, Denmark
- Pantelis Katharios
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Emmanouil Flemetakis
  - Laboratory of Molecular Biology, Department of Biotechnology, School of Food, Biotechnology and Development, Agricultural University of Athens, Athens, Greece
10
Tatarinova TV, Chekalin E, Nikolsky Y, Bruskin S, Chebotarov D, McNally KL, Alexandrov N. Nucleotide diversity analysis highlights functionally important genomic regions. Sci Rep 2016; 6:35730. [PMID: 27774999 PMCID: PMC5075931 DOI: 10.1038/srep35730] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 05/12/2016] [Accepted: 09/30/2016] [Indexed: 12/15/2022] Open
Abstract
We analyzed functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the 40 million single nucleotide polymorphisms (SNPs) dataset from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest density SNP collection for any higher plant. We have shown that the DNA-binding transcription factors (TFs) are the most conserved group of genes, whereas kinases and membrane-localized transporters are the most variable ones. TFs may be conserved because they belong to some of the most connected regulatory hubs that modulate transcription of vast downstream gene networks, whereas signaling kinases and transporters need to adapt rapidly to changing environmental conditions. In general, the observed profound patterns of nucleotide variability reveal functionally important genomic regions. As expected, nucleotide diversity is much higher in intergenic regions than within gene bodies (regions spanning gene models), and protein-coding sequences are more conserved than untranslated gene regions. We have observed a sharp decline in nucleotide diversity that begins at about 250 nucleotides upstream of the transcription start and reaches minimal diversity exactly at the transcription start. We found the transcription termination sites to have remarkably symmetrical patterns of SNP density, implying presence of functional sites near transcription termination. Also, nucleotide diversity was significantly lower near 3′ UTRs, the area rich with regulatory regions.
Affiliation(s)
- Tatiana V Tatarinova
  - Center for Personalized Medicine and Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
  - Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation
- Yuri Nikolsky
  - Vavilov Institute of General Genetics, Moscow, Russia
  - F1 Genomics, San Diego, CA, USA
  - School of Systems Biology, George Mason University, VA, USA
- Dmitry Chebotarov
  - International Rice Research Institute, Los Baños, Laguna 4031, Philippines
- Kenneth L McNally
  - International Rice Research Institute, Los Baños, Laguna 4031, Philippines
11
Aljarbou AN, Aljofan M. Genotyping, morphology and molecular characteristics of a lytic phage of Neisseria strain obtained from infected human dental plaque. J Microbiol 2014; 52:609-18. [PMID: 24879345 DOI: 10.1007/s12275-014-3380-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/17/2013] [Revised: 03/03/2014] [Accepted: 03/12/2014] [Indexed: 11/26/2022]
Abstract
The lytic bacteriophage (phage) A2 was isolated from human dental plaques along with its bacterial host. The virus was found to have an icosahedron-shaped head (60±3 nm) and a sheathed, rigid long tail (∼175 nm), and was categorized into the family Siphoviridae of the order Caudovirales, a dsDNA viral family whose members infect bacteria and are nonenveloped with a noncontractile tail. The isolated phage contained a linear dsDNA genome having 31,703 base pairs of unique sequence, which were sorted into three contigs and 12 single sequences. A latent period of 25 minutes and a burst size of 24±2 particles were determined for the virus. Bioinformatics approaches were used to identify ORFs in the genome. A phylogenetic analysis confirmed the species inter-relationship and its placement in the family.
Affiliation(s)
- Ahmed N Aljarbou
  - Department of Pharmaceutics, College of Pharmacy, Qassim University, Qassim, Saudi Arabia
12
Sykilinda NN, Bondar AA, Gorshkova AS, Kurochkina LP, Kulikov EE, Shneider MM, Kadykov VA, Solovjeva NV, Kabilov MR, Mesyanzhinov VV, Vlassov VV, Drukker VV, Miroshnikov KA. Complete Genome Sequence of the Novel Giant Pseudomonas Phage PaBG. Genome Announc 2014; 2:e00929-13. [PMID: 24407628 PMCID: PMC3886941 DOI: 10.1128/genomea.00929-13] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 10/16/2013] [Accepted: 12/07/2013] [Indexed: 11/20/2022]
Abstract
The novel giant Pseudomonas aeruginosa bacteriophage PaBG was isolated from a water sample of the ultrafreshwater Lake Baikal. We report the complete genome sequence of this Myoviridae bacteriophage, comprising 258,139 bp of double-stranded DNA containing 308 predicted open reading frames.
Affiliation(s)
- Nina N. Sykilinda
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow, Russia
- Alexander A. Bondar
  - Genomics Core Facility, Institute of Chemical Biology and Fundamental Medicine SB RAS, Novosibirsk, Russia
- Vassily A. Kadykov
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow, Russia
- Marsel R. Kabilov
  - Genomics Core Facility, Institute of Chemical Biology and Fundamental Medicine SB RAS, Novosibirsk, Russia
13
Abstract
The most bacteria-like mitochondrial genome known is that of the jakobid flagellate Reclinomonas americana NZ. This genome also encodes the largest known gene set among mitochondrial DNAs (mtDNAs), including the RNA subunit of RNase P (transfer RNA processing), a reduced form of transfer-messenger RNA (translational control), and a four-subunit bacteria-like RNA polymerase, which in other eukaryotes is substituted by a nucleus-encoded, single-subunit, phage-like enzyme. Further, protein-coding genes are preceded by potential Shine-Dalgarno translation initiation motifs. Whether similarly ancestral mitochondrial characters also exist in relatives of R. americana NZ is unknown. Here, we report a comparative analysis of nine mtDNAs from five distant jakobid genera: Andalucia, Histiona, Jakoba, Reclinomonas, and Seculamonas. We find that Andalucia godoyi has an even larger mtDNA gene complement than R. americana NZ. The extra genes are rpl35 (a large subunit mitoribosomal protein) and cox15 (involved in cytochrome oxidase assembly), which are nucleus encoded throughout other eukaryotes. Andalucia cox15 is strikingly similar to its homolog in the free-living α-proteobacterium Tistrella mobilis. Similarly, a long, highly conserved gene cluster in jakobid mtDNAs, which is a clear vestige of prokaryotic operons, displays a gene order more closely resembling that in free-living α-proteobacteria than in Rickettsiales species. Although jakobid mtDNAs, overall, are characterized by bacteria-like features, they also display a few remarkably divergent characters, such as 3'-tRNA editing in Seculamonas ecuadoriensis and genome linearization in Jakoba libera. Phylogenetic analysis with mtDNA-encoded proteins strongly supports monophyly of jakobids with Andalucia as the deepest divergence. However, it remains unclear which α-proteobacterial group is the closest mitochondrial relative.
Affiliation(s)
- Gertraud Burger
- Department of Biochemistry, Robert-Cedergren Center in Bioinformatics and Genomics, Université de Montréal, Montreal, Quebec, Canada.
14
Lu S, Le S, Tan Y, Zhu J, Li M, Rao X, Zou L, Li S, Wang J, Jin X, Huang G, Zhang L, Zhao X, Hu F. Genomic and proteomic analyses of the terminally redundant genome of the Pseudomonas aeruginosa phage PaP1: establishment of genus PaP1-like phages. PLoS One 2013; 8:e62933. [PMID: 23675441 PMCID: PMC3652863 DOI: 10.1371/journal.pone.0062933] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 11/23/2012] [Accepted: 03/26/2013] [Indexed: 11/22/2022] Open
Abstract
We isolated and characterized a new Pseudomonas aeruginosa myovirus named PaP1. The morphology of this phage was visualized by electron microscopy and its genome sequence and ends were determined. Finally, genomic and proteomic analyses were performed. PaP1 has an icosahedral head with an apex diameter of 68–70 nm and a contractile tail with a length of 138–140 nm. The PaP1 genome is a linear dsDNA molecule containing 91,715 base pairs (bp) with a G+C content of 49.36% and 12 tRNA genes. A strategy to identify the genome ends of PaP1 was designed. The genome has a 1190 bp terminal redundancy. PaP1 has 157 open reading frames (ORFs). Of these, 143 proteins are homologs of known proteins, but only 38 could be functionally identified. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis and high-performance liquid chromatography-mass spectrometry allowed identification of 12 ORFs as structural protein coding genes within the PaP1 genome. Comparative genomic analysis indicated that the Pseudomonas aeruginosa phage PaP1, JG004, PAK_P1 and vB_PaeM_C2-10_Ab1 share great similarity. Besides their similar biological characteristics, the phages contain 123 core genes and have very close phylogenetic relationships, which distinguish them from other known phage genera. We therefore propose that these four phages be classified as PaP1-like phages, a new phage genus of Myoviridae that infects Pseudomonas aeruginosa.
Affiliation(s)
- Shuguang Lu
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Shuai Le
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Yinling Tan
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Junmin Zhu
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Ming Li
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Xiancai Rao
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Lingyun Zou
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Shu Li
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Jing Wang
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Xiaolin Jin
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Guangtao Huang
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Lin Zhang
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Xia Zhao
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
- Fuquan Hu
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing, China
15
Kropinski AM, Waddell T, Meng J, Franklin K, Ackermann HW, Ahmed R, Mazzocco A, Yates J, Lingohr EJ, Johnson RP. The host-range, genomics and proteomics of Escherichia coli O157:H7 bacteriophage rV5. Virol J 2013; 10:76. [PMID: 23497209 PMCID: PMC3606486 DOI: 10.1186/1743-422x-10-76] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 06/12/2012] [Accepted: 02/28/2013] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND: Bacteriophages (phages) have been used extensively as analytical tools to type bacterial cultures and recently for control of zoonotic foodborne pathogens in foods and in animal reservoirs. METHODS: We examined the host range, morphology, genome and proteome of the lytic E. coli O157 phage rV5, derived from phage V5, which is a member of an Escherichia coli O157:H7 phage typing set. RESULTS: Phage rV5 is a member of the Myoviridae family possessing an icosahedral head of 91 nm between opposite apices. The extended tail measures 121 x 17 nm and has a sheath of 44 x 20 nm and a 7 nm-wide core in the contracted state. It possesses a 137,947 bp genome (43.6 mol%GC) which encodes 233 ORFs and six tRNAs. Until recently this virus appeared to be phylogenetically isolated with almost 70% of its gene products ORFans. rV5 is closely related to coliphages Delta and vB-EcoM-FY3, and more distantly related to Salmonella phages PVP-SE1 and SSE-121, Cronobacter sakazakii phage vB_CsaM_GAP31, and coliphages phAPEC8 and phi92. A complete shotgun proteomic analysis was carried out on rV5, extending what had been gleaned from the genomic analyses. Host range studies revealed that rV5 is active against several other E. coli.
Affiliation(s)
- Andrew M Kropinski
- Public Health Agency of Canada, Laboratory for Foodborne Diseases, 110 Stone Road West, Guelph, ON N1G 3W4, Canada
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Tom Waddell
- Abbott Point of Care, 185 Corkstown Road, Ottawa, ON, K2H 8V4, Canada
- Juncai Meng
- Merck Research Laboratories, 126E Lincoln Avenue, Rahway, NJ, 07065, USA
- Kristyn Franklin
- Public Health Agency of Canada, Laboratory for Foodborne Diseases, 110 Stone Road West, Guelph, ON N1G 3W4, Canada
- Hans-Wolfgang Ackermann
- Département de Microbiologie-infectiologie et immunologie, Faculté de médecine, Université Laval, Québec, QC, G1K 7P4, Canada
- Rafiq Ahmed
- Enteric Diseases Program, National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, MB, R3E 3R2, Canada
- Amanda Mazzocco
- Public Health Agency of Canada, Laboratory for Foodborne Diseases, 110 Stone Road West, Guelph, ON N1G 3W4, Canada
- John Yates
- The Scripps Research Institute, Department of Cell Biology, Proteomic Mass Spectrometry Laboratory, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Erika J Lingohr
- Public Health Agency of Canada, Laboratory for Foodborne Diseases, 110 Stone Road West, Guelph, ON N1G 3W4, Canada
- Roger P Johnson
- Public Health Agency of Canada, Laboratory for Foodborne Diseases, 110 Stone Road West, Guelph, ON N1G 3W4, Canada
16
Akhter S, Bailey BA, Salamon P, Aziz RK, Edwards RA. Applying Shannon's information theory to bacterial and phage genomes and metagenomes. Sci Rep 2013; 3:1033. [PMID: 23301154 DOI: 10.1038/srep01033] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 09/11/2012] [Accepted: 11/20/2012] [Indexed: 01/12/2023] Open
Abstract
All sequence data contain inherent information that can be measured by Shannon's uncertainty theory. Such measurement is valuable in evaluating large data sets, such as metagenomic libraries, to prioritize their analysis and annotation, thus saving computational resources. Here, Shannon's index of complete phage and bacterial genomes was examined. The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database. Measuring uncertainty may be used for rapid screening for sequences with matches in available databases, prioritizing computational resources, and indicating which sequences with no known similarities are likely to be important for more detailed analysis.
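As a rough illustration of the measurement described in this abstract, the following Python sketch computes Shannon's uncertainty over the word (k-mer) frequencies of a sequence. The function name `shannon_index`, the default word size, and the use of overlapping words are assumptions made for illustration; the abstract does not state the exact formulation the authors used.

```python
import math
from collections import Counter


def shannon_index(seq: str, k: int = 3) -> float:
    """Shannon uncertainty (in bits) of the overlapping k-mer distribution.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    seq = seq.upper()
    # Overlapping words of length k (an assumption about how words are drawn).
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(words)
    if total == 0:
        # Sequence shorter than the word size: no words, no uncertainty.
        return 0.0
    counts = Counter(words)
    # H = -sum(p * log2(p)) over the observed word frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Four equally frequent single-letter words give the 2-bit maximum for DNA.
print(shannon_index("ACGT", k=1))
```

The word size `k` is left as a parameter because the abstract reports that the result depends on it, along with genome length and GC content.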
17
Seligmann H. Coding constraints modulate chemically spontaneous mutational replication gradients in mitochondrial genomes. Curr Genomics 2012; 13:37-54. [PMID: 22942674 PMCID: PMC3269015 DOI: 10.2174/138920212799034802] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Received: 06/30/2011] [Revised: 09/07/2011] [Accepted: 09/20/2011] [Indexed: 11/30/2022] Open
Abstract
Distances from heavy and light strand replication origins determine duration mitochondrial DNA remains singlestranded during replication. Hydrolytic deaminations from A->G and C->T occur more on single- than doublestranded DNA. Corresponding replicational nucleotide gradients exist across mitochondrial genomes, most at 3rd, least 2nd codon positions. DNA singlestrandedness during RNA transcription causes gradients mainly in long-lived species with relatively slow metabolism (high transcription/replication ratios). Third codon nucleotide contents, evolutionary results of mutation cumulation, follow replicational, not transcriptional gradients in Homo; observed human mutations follow transcriptional gradients. Synonymous third codon position transitions potentially alter adaptive off frame information. No mutational gradients occur at synonymous positions forming off frame stops (these adaptively stop early accidental frameshifted protein synthesis), nor in regions coding for putative overlapping genes according to an overlapping genetic code reassigning stop codons to amino acids. Deviation of 3rd codon nucleotide contents from deamination gradients increases with coding importance of main frame 3rd codon positions in overlapping genes (greatest if these are 2nd position in overlapping genes). Third codon position deamination gradients calculated separately for each codon family are strongest where synonymous transitions are rarely pathogenic; weakest where transitions are frequently pathogenic. Synonymous mutations affect translational accuracy, such as error compensation of misloaded tRNAs by codon-anticodon mismatches (prevents amino acid misinsertion despite tRNA misacylation), a potential cause of pathogenic mutations at synonymous codon positions. Indeed, codon-family-specific gradients are inversely proportional to error compensation associated with gradient-promoted transitions. 
Deamination gradients reflect spontaneous chemical reactions in singlestranded DNA, but functional coding constraints modulate gradients.
Affiliation(s)
- Hervé Seligmann
- National Collections of Natural History at the Hebrew University of Jerusalem, Jerusalem 91404; Department of Life Sciences, Ben Gurion University, 84105 Beer Sheva, Israel
18
Baker A, Julienne H, Chen CL, Audit B, d'Aubenton-Carafa Y, Thermes C, Arneodo A. Linking the DNA strand asymmetry to the spatio-temporal replication program. I. About the role of the replication fork polarity in genome evolution. Eur Phys J E Soft Matter 2012; 35:92. [PMID: 23001787 DOI: 10.1140/epje/i2012-12092-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/06/2012] [Revised: 08/08/2012] [Accepted: 08/21/2012] [Indexed: 06/01/2023]
Abstract
Two key cellular processes, namely transcription and replication, require the opening of the DNA double helix and act differently on the two DNA strands, generating different mutational patterns (mutational asymmetry) that may result, after long evolutionary time, in different nucleotide compositions on the two DNA strands (compositional asymmetry). We elaborate on the simplest model of neutral substitution rates that takes into account the strand asymmetries generated by the transcription and replication processes. Using perturbation theory, we then solve the time evolution of the DNA composition under strand-asymmetric substitution rates. In our minimal model, the compositional and substitutional asymmetries are predicted to decompose into a transcription- and a replication-associated components. The transcription-associated asymmetry increases in magnitude with transcription rate and changes sign with gene orientation while the replication-associated asymmetry is proportional to the replication fork polarity. These results are confirmed experimentally in the human genome, using substitution rates obtained by aligning the human and chimpanzee genomes using macaca and orangutan as outgroups, and replication fork polarity determined in the HeLa cell line as estimated from the derivative of the mean replication timing. When further investigating the dynamics of compositional skew evolution, we show that it is not at equilibrium yet and that its evolution is an extremely slow process with characteristic time scales of several hundred Myrs.
Affiliation(s)
- A Baker
- Université de Lyon, Lyon, France
19
Abstract
Several families of plasmids and viruses (PVs) have now been described in hyperthermophilic archaea of the order Thermococcales. One family of plasmids replicates by the rolling circle mechanism, whereas most other PVs probably replicate by the θ mode. PVs from Thermococcales encode novel families of DNA replication proteins that have only detectable homologues in other archaeal PVs. PVs from different families share a common gene pool and co-evolve with their hosts. Most Thermococcales also produce virus-like membrane vesicles similar to eukaryotic microparticles (ectosomes). Some membrane vesicles of Thermococcus nautilus harbour the plasmid pTN1, suggesting that vesicles can be involved in plasmid transfer between species.
20
Chen CL, Duquenne L, Audit B, Guilbaud G, Rappailles A, Baker A, Huvet M, d'Aubenton-Carafa Y, Hyrien O, Arneodo A, Thermes C. Replication-associated mutational asymmetry in the human genome. Mol Biol Evol 2011; 28:2327-37. [PMID: 21368316 DOI: 10.1093/molbev/msr056] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Indexed: 01/12/2023] Open
Abstract
During evolution, mutations occur at rates that can differ between the two DNA strands. In the human genome, nucleotide substitutions occur at different rates on the transcribed and non-transcribed strands that may result from transcription-coupled repair. These mutational asymmetries generate transcription-associated compositional skews. To date, the existence of such asymmetries associated with replication has not yet been established. Here, we compute the nucleotide substitution matrices around replication initiation zones identified as sharp peaks in replication timing profiles and associated with abrupt jumps in the compositional skew profile. We show that the substitution matrices computed in these regions fully explain the jumps in the compositional skew profile when crossing initiation zones. In intergenic regions, we observe mutational asymmetries measured as differences between complementary substitution rates; their sign changes when crossing initiation zones. These mutational asymmetries are unlikely to result from cryptic transcription but can be explained by a model based on replication errors and strand-biased repair. In transcribed regions, mutational asymmetries associated with replication superimpose on the previously described mutational asymmetries associated with transcription. We separate the substitution asymmetries associated with both mechanisms, which allows us to determine for the first time in eukaryotes, the mutational asymmetries associated with replication and to reevaluate those associated with transcription. Replication-associated mutational asymmetry may result from unequal rates of complementary base misincorporation by the DNA polymerases coupled with DNA mismatch repair (MMR) acting with different efficiencies on the leading and lagging strands. Replication, acting in germ line cells during long evolutionary times, contributed equally with transcription to produce the present abrupt jumps in the compositional skew. 
These results demonstrate that DNA replication is one of the major processes that shape human genome composition.
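The compositional skew profile referred to in this abstract can be illustrated with a short Python sketch. It assumes the common definition of GC skew, (G - C)/(G + C), evaluated in consecutive non-overlapping windows along one strand. The function name `gc_skew_profile`, the window size, and the handling of windows without G or C are illustrative assumptions, not the authors' pipeline.

```python
from typing import List


def gc_skew_profile(seq: str, window: int = 1000) -> List[float]:
    """Return (G - C) / (G + C) for consecutive non-overlapping windows.

    Hypothetical helper for illustration; trailing bases that do not fill
    a whole window are ignored.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C carries no skew signal; report 0.0 (assumption).
        profile.append((g - c) / (g + c) if g + c else 0.0)
    return profile


# A sign change between adjacent windows is the kind of abrupt jump
# that the abstract associates with replication initiation zones.
print(gc_skew_profile("GGGACCCA", window=4))
```

On real genomes the window would be far larger than in this toy call, and an analogous TA skew can be computed by counting T and A instead.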
Affiliation(s)
- Chun-Long Chen
- Centre de Génétique Moléculaire, Centre National de la Recherche Scientifique (CNRS), Gif-sur-Yvette, France
21
Khrustalev VV, Barkovsky EV. The level of cytosine is usually much higher than the level of guanine in two-fold degenerated sites from third codon positions of genes from Simplex- and Varicelloviruses with G+C higher than 50%. J Theor Biol 2010; 266:88-98. [PMID: 20600145 DOI: 10.1016/j.jtbi.2010.06.023] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/04/2010] [Revised: 05/05/2010] [Accepted: 06/15/2010] [Indexed: 11/26/2022]
Abstract
We studied usage of cytosine and guanine in 914 genes from completely sequenced genomes of five Simplex- and seven Varicelloviruses. In genes with total GC-content higher than 50% usage of cytosine is usually higher than usage of guanine (an average difference for genes with G+C higher than 70% reaches 4.0%). This difference is caused mostly by the elevated usage of cytosine in two-fold degenerated sites situated in third codon positions relatively to the usage of guanine in two-fold degenerated sites situated in third codon positions (an average difference for genes with G+C higher than 70% is equal to 28.2%). The usage of amino acids that are encoded by codons containing cytosine in two-fold degenerated sites situated in third codon positions (AA2TC) is much higher than the usage of amino acids encoded by codons containing guanine in two-fold degenerated sites situated in third codon positions (AA2AG). The usage of AA2AG declines much more steeply with the growth of GC-content than the usage of AA2TC. This effect is the consequence of the nature of genetic code and of the negative selection. In GC-rich genes the usage of cytosine in four-fold degenerated sites is only a little (but significantly) higher than the usage of guanine (in genes with G+C higher than 70% an average difference is equal to 4.3%). This difference may be caused by transcription-associated mutational pressure.
Affiliation(s)
- Vladislav Victorovich Khrustalev
- Department of General Chemistry, Belarussian State Medical University, Communisticheskaya 7-24, Dzerzinskogo 83, Minsk 220029, Belarus.
22
Kropinski AM, Borodovsky M, Carver TJ, Cerdeño-Tárraga AM, Darling A, Lomsadze A, Mahadevan P, Stothard P, Seto D, Van Domselaar G, Wishart DS. In silico identification of genes in bacteriophage DNA. Methods Mol Biol 2009; 502:57-89. [PMID: 19082552 DOI: 10.1007/978-1-60327-565-1_6] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 02/27/2023]
Abstract
One of the most satisfying aspects of a genome sequencing project is the identification of the genes contained within it. These are of two types: those which encode tRNAs and those which produce proteins. After a general introduction on the properties of protein-encoding genes and the utility of the Basic Local Alignment Search Tool (BLASTX) to identify genes through homologs, a variety of tools are discussed by their creators. These include for genome annotation: GeneMark, Artemis, and BASys; and, for genome comparisons: Artemis Comparison Tool (ACT), Mauve, CoreGenes, and GeneOrder.
23
Uchiyama J, Rashel M, Matsumoto T, Sumiyama Y, Wakiguchi H, Matsuzaki S. Characteristics of a novel Pseudomonas aeruginosa bacteriophage, PAJU2, which is genetically related to bacteriophage D3. Virus Res 2008; 139:131-4. [PMID: 19010363 DOI: 10.1016/j.virusres.2008.10.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 09/02/2008] [Revised: 10/15/2008] [Accepted: 10/15/2008] [Indexed: 11/24/2022]
Abstract
Pseudomonas aeruginosa bacteriophage (phage) is one of the most taxonomically and genetically diverse phages. Although phage D3 is one of the well-studied P. aeruginosa phages, no D3-related P. aeruginosa phage has been reported. We report a novel P. aeruginosa siphovirus, PAJU2, which is genetically related to but morphologically distinct (highly elongated head) from phage D3. A PAJU2 capsid protein, Orf3, is thought to be synthesized as a protein fused to a prohead protease and is autocatalytically cleaved, which may form the head chain mail. Despite such morphological differences, PAJU2 is expected to be a useful genetic reference for phage D3.
Affiliation(s)
- Jumpei Uchiyama
- Department of Pediatrics, Kochi Medical School, Kochi, Japan
24
Mugal CF, von Grünberg HH, Peifer M. Transcription-induced mutational strand bias and its effect on substitution rates in human genes. Mol Biol Evol 2008; 26:131-42. [PMID: 18974087 DOI: 10.1093/molbev/msn245] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Indexed: 12/13/2022] Open
Abstract
If substitution rates are not the same on the two complementary DNA strands, a substitution is considered strand asymmetric. Such substitutional strand asymmetries are determined here for the three most frequent types of substitution on the human genome (C --> T, A --> G, and G --> T). Substitution rate differences between both strands are estimated for 4,590 human genes by aligning all repeats occurring within the introns with their ancestral consensus sequences. For 1,630 of these genes, both coding strand and noncoding strand rates could be compared with rates in gene-flanking regions. All three rates considered are found to be on average higher on the coding strand and lower on the transcribed strand in comparison to their values in the gene-flanking regions. This finding points to the simultaneous action of rate-increasing effects on the coding strand--such as increased adenine and cytosine deamination--and transcription-coupled repair as a rate-reducing effect on the transcribed strand. The common behavior of the three rates leads to strong correlations of the rate asymmetries: Whenever one rate is strand biased, the other two rates are likely to show the same bias. Furthermore, we determine all three rate asymmetries as a function of time: the A --> G and G --> T rate asymmetries are both found to be constant in time, whereas the C --> T rate asymmetry shows a pronounced time dependence, an observation that explains the difference between our results and those of an earlier work by Green et al. (2003. Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 33:514-517.). Finally, we show that in addition to transcription also the replication process biases the substitution rates in genes.
Affiliation(s)
- Carina F Mugal
- Institute of Chemistry, Karl-Franzens University Graz, Graz, Austria
25
Uchiyama J, Rashel M, Takemura I, Wakiguchi H, Matsuzaki S. In silico and in vivo evaluation of bacteriophage phiEF24C, a candidate for treatment of Enterococcus faecalis infections. Appl Environ Microbiol 2008; 74:4149-63. [PMID: 18456848 DOI: 10.1128/AEM.02371-07] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Indexed: 01/21/2023] Open
Abstract
Along with the increasing threat of nosocomial infections by vancomycin-resistant Enterococcus faecalis, bacteriophage (phage) therapy has been expected as an alternative therapy against infectious disease. Although genome information and proof of applicability are prerequisites for a modern therapeutic phage, E. faecalis phage has not been analyzed in terms of these aspects. Previously, we reported a novel virulent phage, phiEF24C, and its biology indicated its therapeutic potential against E. faecalis infection. In this study, the phiEF24C genome was analyzed and the in vivo therapeutic applicability of phiEF24C was also briefly assessed. Its complete genome (142,072 bp) was predicted to have 221 open reading frames (ORFs) and five tRNA genes. In our functional analysis of the ORFs by use of a public database, no proteins undesirable in phage therapy, such as pathogenic and integration-related proteins, were predicted. The noncompetitive directions of replication and transcription and the host-adapted translation of the phage were deduced bioinformatically. Its genomic features indicated that phiEF24C is a member of the SPO1-like phage genus and especially that it has a close relationship to the Listeria phage P100, which is authorized for prophylactic use. Thus, these bioinformatics analyses rationalized the therapeutic eligibility of phiEF24C. Moreover, the in vivo therapeutic potential of phiEF24C, which was effective at a low concentration and was not affected by host sensitivity to the phage, was proven by use of sepsis BALB/c mouse models. Furthermore, no change in mouse lethality was observed under either single or repeated phage exposures. Although further study is required, phiEF24C can be a promising therapeutic phage against E. faecalis infections.
26
Kropinski AM, Kovalyova IV, Billington SJ, Patrick AN, Butts BD, Guichard JA, Pitcher TJ, Guthrie CC, Sydlaske AD, Barnhill LM, Havens KA, Day KR, Falk DR, McConnell MR. The genome of epsilon15, a serotype-converting, Group E1 Salmonella enterica-specific bacteriophage. Virology 2007; 369:234-44. [PMID: 17825342 PMCID: PMC2698709 DOI: 10.1016/j.virol.2007.07.027] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.1] [Received: 09/05/2006] [Revised: 07/17/2007] [Accepted: 07/19/2007] [Indexed: 01/06/2023]
Abstract
The genome sequence of the Salmonella enterica serovar Anatum-specific, serotype-converting bacteriophage epsilon15 has been completed. The nonredundant genome contains 39,671 bp and 51 putative genes. It most closely resembles the genome of phiV10, an Escherichia coli O157:H7-specific temperate phage, with which it shares 36 related genes. More distant relatives include the Burkholderia cepacia-specific phage, BcepC6B (8 similar genes), the Bordetella bronchiseptica-specific phage, BPP-1 (8 similar genes) and the Photobacterium profundum prophage, P Pphipr1 (6 similar genes). epsilon15 gene identifications based on homologies with known gene families include the terminase small and large subunits, integrase, endolysin, two holins, two DNA methylase enzymes (one adenine-specific and one cytosine-specific) and a RecT-like enzyme. Genes identified experimentally include those coding for the serotype conversion proteins, the tail fiber, the major capsid protein and the major repressor. epsilon15's attP site and the Salmonella attB site with which it interacts during lysogenization have also been determined.
Affiliation(s)
- Andrew M. Kropinski
- Department of Microbiology and Immunology, Queens University, Kingston, Ontario K7L 3N6, Canada
- Public Health Agency of Canada, Laboratory for Foodborne Zoonoses, Guelph, Ontario N1G 3W4, Canada
- Irina V. Kovalyova
- Department of Microbiology and Immunology, Queens University, Kingston, Ontario K7L 3N6, Canada
- Biology Department, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
- Aaron N. Patrick
- Department of Biology, Point Loma Nazarene University, San Diego, CA 92106
- Brent D. Butts
- Department of Biology, Point Loma Nazarene University, San Diego, CA 92106
- Jared A. Guichard
- Department of Biology, Point Loma Nazarene University, San Diego, CA 92106
- Trevor J. Pitcher
- Department of Biology, Point Loma Nazarene University, San Diego, CA 92106
- Carly C. Guthrie
- Department of Biology, Point Loma Nazarene University, San Diego, CA 92106
- Anya D. Sydlaske
- Department of Biology, Point Loma Nazarene University, San Diego, CA 92106
- Lisa M. Barnhill
- Department of Biology, Point Loma Nazarene University, San Diego, CA 92106
- Kyle A. Havens
- Department of Biology, Point Loma Nazarene University, San Diego, CA 92106
- Kenneth R. Day
- Department of Biology, Point Loma Nazarene University, San Diego, CA 92106
- Darrel R. Falk
- Department of Biology, Point Loma Nazarene University, San Diego, CA 92106
27
Monier A, Claverie JM, Ogata H. Horizontal gene transfer and nucleotide compositional anomaly in large DNA viruses. BMC Genomics 2007; 8:456. [PMID: 18070355 PMCID: PMC2211322 DOI: 10.1186/1471-2164-8-456] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Received: 06/14/2007] [Accepted: 12/10/2007] [Indexed: 12/02/2022] Open
Abstract
Background: DNA viruses have a wide range of genome sizes (5 kb up to 1.2 Mb, compared to 0.16 Mb to 1.5 Mb for obligate parasitic bacteria) that do not correlate with their virulence or the taxonomic distribution of their hosts. The reasons for such large variation are unclear. According to the traditional view of viruses as gifted "gene pickpockets", large viral genome sizes could originate from numerous gene acquisitions from their hosts. We investigated this hypothesis by studying 67 large DNA viruses with genome sizes larger than 150 kb, including the recently characterized giant mimivirus. Given that horizontally transferred DNA often have anomalous nucleotide compositions differing from the rest of the genome, we conducted a detailed analysis of the inter- and intra-genome compositional properties of these viruses. We then interpreted their compositional heterogeneity in terms of possible causes, including strand asymmetry, gene function/expression, and horizontal transfer. Results: We first show that the global nucleotide composition and nucleotide word usage of viral genomes are species-specific and distinct from those of their hosts. Next, we identified compositionally anomalous (cA) genes in viral genomes, using a method based on Bayesian inference. The proportion of cA genes is highly variable across viruses and does not exhibit a significant correlation with genome size. The vast majority of the cA genes were of unknown function, lacking homologs in the databases. For genes with known homologs, we found a substantial enrichment of cA genes in specific functional classes for some of the viruses. No significant association was found between cA genes and compositional strand asymmetry. A possible exogenous origin for a small fraction of the cA genes could be confirmed by phylogenetic reconstruction. Conclusion: At odds with the traditional dogma, our results argue against frequent genetic transfers to large DNA viruses from their modern hosts. The large genome sizes of these viruses are not simply explained by an increased propensity to acquire foreign genes. This study also confirms that the anomalous nucleotide compositions of the cA genes is sometimes linked to particular biological functions or expression patterns, possibly leading to an overestimation of recent horizontal gene transfers.
Affiliation(s)
- Adam Monier
- Structural and Genomic Information Laboratory, CNRS - UPR 2589, Institute for Structural Biology and Microbiology, Parc Scientifique de Luminy, 163 avenue de Luminy, FR-13288, Marseille cedex 09, France.
|
28
|
Pagaling E, Haigh RD, Grant WD, Cowan DA, Jones BE, Ma Y, Ventosa A, Heaphy S. Sequence analysis of an Archaeal virus isolated from a hypersaline lake in Inner Mongolia, China. BMC Genomics 2007; 8:410. [PMID: 17996081 PMCID: PMC2194725 DOI: 10.1186/1471-2164-8-410] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Received: 07/12/2007] [Accepted: 11/09/2007] [Indexed: 11/10/2022] Open
Abstract
Background: We are profoundly ignorant about the diversity of viruses that infect the domain Archaea. Fewer than 100 have been identified and described, and very few of these have had their genomic sequences determined. Here we report the genomic sequence of a previously undescribed archaeal virus. Results: Haloarchaeal strains with 16S rRNA gene sequences 98% identical to Halorubrum saccharovorum were isolated from a hypersaline lake in Inner Mongolia. Two lytic viruses infecting these were isolated from the lake water. The BJ1 virus is described in this paper. It has an icosahedral head and tail morphology and most likely a linear double-stranded DNA genome exhibiting terminal redundancy. Its genome sequence has 42,271 base pairs with a GC content of ~65 mol%. The genome of BJ1 is predicted to encode 70 ORFs, including one for a tRNA. Fifty of the seventy ORFs had no identity to database entries; twenty showed sequence identity matches to archaeal viruses and to haloarchaea. ORFs possibly coding for an origin of replication complex, integrase, helicase and structural capsid proteins were identified. Evidence for viral integration was obtained. Conclusion: The virus described here has a very low sequence identity to any previously described virus. Fifty of the seventy ORFs could not be annotated in any way based on amino acid identities with sequences already present in the databases. Determining functions for ORFs such as these is probably easier using a simple virus as a model system.
Affiliation(s)
- Eulyn Pagaling
- Department of Infection Immunity and Inflammation, University of Leicester, University Road, Leicester, LE1 9HN, UK.
|
29
|
Abstract
Both transcription-associated and replication-associated strand compositional asymmetries have recently been shown in vertebrate genomes. In this paper, we illustrate that transcription-associated strand compositional asymmetries and replication-associated ones coexist in most vertebrate large genes, although in most cases the former conceals the latter. Furthermore, we found that the transcription-associated strand compositional asymmetries of housekeeping genes are stronger than those of somatic cell expressed genes. Together with other evidence, we suggest that germline transcription-associated strand asymmetric mutations may be the main cause of the transcription-associated strand compositional asymmetries.
Affiliation(s)
- Hai-Fang Wang
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
|
30
|
Thomas JM, Horspool D, Brown G, Tcherepanov V, Upton C. GraphDNA: a Java program for graphical display of DNA composition analyses. BMC Bioinformatics 2007; 8:21. [PMID: 17244370 DOI: 10.1186/1471-2105-8-21] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 06/02/2006] [Accepted: 01/23/2007] [Indexed: 11/10/2022] Open
Abstract
Background: Under conditions of no strand bias, the number of Gs is equal to that of Cs for each DNA strand; similarly, the total number of Ts is equal to that of As. However, within each strand there are considerable local deviations from the A = T and G = C equality. These asymmetries in nucleotide composition have been extensively analyzed in prokaryotic and eukaryotic genomes and related to chromosome organization, transcription orientation and other processes in certain organisms. To carry out analysis of intra-strand nucleotide distribution, several graphical methods have been developed. Results: GraphDNA is a new Java application that provides a simple, user-friendly interface for the visualization of DNA nucleotide composition. The program accepts GenBank, EMBL and FASTA files as input, and it displays multiple DNA nucleotide composition graphs (skews and walks) in a single window to allow direct comparisons between the sequences. We illustrate the use of DNA skews for characterization of poxvirus and coronavirus genomes. Conclusion: GraphDNA is a platform-independent, open-source tool for the analysis of nucleotide trends in DNA sequences. Multiple sequence formats can be read and multiple sequences may be plotted in a single results window.
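The skews and walks mentioned in this abstract can be illustrated with a minimal Python sketch. It is not GraphDNA's code (GraphDNA is a Java program); the function names `gc_skew_windows` and `gc_walk`, the non-overlapping window scheme, and the convention of reporting 0.0 for a window without G or C are all assumptions made here for illustration.

```python
from typing import List


def gc_skew_windows(seq: str, window: int) -> List[float]:
    """GC skew, (G - C) / (G + C), for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C has an undefined skew; 0.0 is an assumed convention.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def gc_walk(seq: str) -> List[int]:
    """DNA walk: step +1 for each G, -1 for each C, and stay level otherwise."""
    position, walk = 0, []
    for base in seq.upper():
        if base == "G":
            position += 1
        elif base == "C":
            position -= 1
        walk.append(position)
    return walk
```

For the toy sequence `GGGGACCCCA` with a window of 5, the first window is all G (skew 1.0) and the second all C (skew -1.0), while the walk climbs to 4 and returns to 0. The same pattern with A and T in place of G and C gives the AT skew.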
|
31
|
Sewatanon J, Srichatrapimuk S, Auewarakul P. Compositional bias and size of genomes of human DNA viruses. Intervirology 2006; 50:123-32. [PMID: 17191014 DOI: 10.1159/000098238] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 05/12/2006] [Accepted: 07/27/2006] [Indexed: 11/19/2022] Open
Abstract
Genomes of 144 human DNA viruses were analyzed with respect to their compositional asymmetry. DNA viruses were divided into two groups according to their genome sizes. The analysis revealed that the level of guanine and cytosine (GC content) in the coding sequences of small genome DNA viruses was significantly lower than that of large genome DNA viruses. Because small genome viruses replicate their genomes using cellular enzymes, while large genome viruses use their own enzymes for genome replication, the two groups of viruses may be under different mutational bias and/or selection pressure. In these viruses, GC content at the third codon position correlated with GC content at the first and second codon positions. However, the relationship in small genome DNA viruses was weaker than that in large genome DNA viruses, suggesting that their genome composition may be more strongly influenced by codon usage preference or restriction on amino acid composition.
Affiliation(s)
- Jaturong Sewatanon
- Department of Microbiology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
|
32
|
Abstract
Compositional replication strand bias, commonly referred to as GC skew, is present in many genomes of prokaryotes, eukaryotes, and viruses. Although cytosine deamination in ssDNA (resulting in C→T changes on the leading strand) is often invoked as its major cause, the precise contributions of this and other substitution types are currently unknown. It is also unclear if the underlying mutational asymmetries are the same among taxa, are stable over time, or how close the observed biases are to mutational equilibrium. We analyzed nearly neutral sites of seven taxa each with between three and six complete bacterial genomes, and inferred the substitution spectra of fourfold degenerate positions in nonhighly expressed genes. Using a bootstrap procedure, we extracted compositional biases associated with replication and identified the significant asymmetries. Although all taxa showed an overrepresentation of G relative to C on the leading strand (and imbalances between A and T), widely variable substitution asymmetries are noted. Surprisingly, all substitution types show significant asymmetry in at least one taxon, but none were universally biased in all taxa. Notably, in the two most biased genomes, A→G, rather than C→T, shapes the compositional bias. Given the variability in these biases, we propose that the process is multifactorial. Finally, we also find that most genomes are not at compositional equilibrium, and suggest that mutational-based heterotachy is deeply imprinted in the history of biological macromolecules. This shows that similar compositional biases associated with the same essential well-conserved process, replication, do not reflect similar mutational processes in different genomes, and that caution is required in inferring the roles of specific mutational biases on the basis of contemporary patterns of sequence composition.
Affiliation(s)
- Eduardo P C Rocha
- Unité Génétique des Génomes Bactériens, URA 2171, Institut Pasteur, 75015 Paris, France.
|
33
|
Alexandrov NN, Troukhan ME, Brover VV, Tatarinova T, Flavell RB, Feldmann KA. Features of Arabidopsis genes and genome discovered using full-length cDNAs. Plant Mol Biol 2006; 60:69-85. [PMID: 16463100 DOI: 10.1007/s11103-005-2564-9] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Received: 08/26/2004] [Accepted: 08/29/2005] [Indexed: 05/06/2023]
Abstract
Arabidopsis is currently the reference genome for higher plants. A new, more detailed statistical analysis of Arabidopsis gene structure is presented including intron and exon lengths, intergenic distances, features of promoters, and variant 5'-ends of mRNAs transcribed from the same transcription unit. We also provide a statistical characterization of Arabidopsis transcripts in terms of their size, UTR lengths, 3'-end cleavage sites, splicing variants, and coding potential. These analyses were facilitated by scrutiny of our collection of sequenced full-length cDNAs and much larger collection of 5'-ESTs, together with another set of full-length cDNAs from Salk/Stanford/Plant Gene Expression Center/RIKEN. Examples of alternative splicing are observed for transcripts from 7% of the genes and many of these genes display multiple spliced isoforms. Most splicing variants lie in non-coding regions of the transcripts. Non-canonical splice sites constitute less than 1% of all splice sites. Genes with fewer than four introns display reduced average mRNA levels. Putative alternative transcription start sites were observed in 30% of highly expressed genes and in more than 50% of the genes with low expression. Transcription start sites correlate remarkably well with a CG skew peak in the DNA sequences. The intergenic distances vary considerably, those where genes are transcribed towards one another being significantly shorter. New transcripts, missing in the current TIGR genome annotation and ESTs that are non-coding, including those antisense to known genes, are derived and cataloged in the Supplementary Material. They identify 148 new loci in the Arabidopsis genome. The conclusions drawn provide a better understanding of the Arabidopsis genome and how the gene transcripts are processed. The results also allow better predictions to be made for, as yet, poorly defined genes and provide a reference for comparisons with other plant genomes whose complete sequences are currently being determined. Some comparisons with rice are included in this paper.
|
34
|
Abstract
In 1968, Chargaff and his colleagues discovered a rule in Bacillus subtilis: in single stranded DNA, A=T and C=G. This rule has since been confirmed many times in other bacterial and eukaryotic genomes. To the best of our knowledge, this rule has not been tested before in either single stranded DNA or RNA genomes. Over 3400 genomic sequences were examined here and included for the first time both double and single stranded DNA and RNA genomes. We found that: (1) with the exception of the organellar DNA, this parity rule holds for all types of double stranded DNA genomes and (2) that this rule fails to hold for other types of genomes. The parity rule appears to be a selective force on genome evolution and codon use.
Affiliation(s)
- David Mitchell
- Vice Deanery of Genetics and Microbiology, Trinity College, Dublin, Ireland.
|
35
|
Nikolaou C, Almirantis Y. A study on the correlation of nucleotide skews and the positioning of the origin of replication: different modes of replication in bacterial species. Nucleic Acids Res 2005; 33:6816-22. [PMID: 16321966 PMCID: PMC1301597 DOI: 10.1093/nar/gki988] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022] Open
Abstract
Deviations from Chargaff's 2nd parity rule, according to which A ≈ T and G ≈ C in single stranded DNA, have been associated with replication as well as with transcription in prokaryotes. Based on observations regarding mainly the transcription-replication co-linearity in a large number of prokaryotic species, we formulate the hypothesis that the replication procedure may follow different modes between genomes throughout which the skews clearly follow different patterns. We draw the conclusion that multiple functional sites of origin of replication may exist in the genomes of most archaea and in some exceptional cases of eubacteria, while in the majority of eubacteria, replication occurs through a single fixed origin.
Affiliation(s)
- Christoforos Nikolaou
- Institute of Biology, National Centre of Scientific Research Demokritos, 15310 Athens, Greece.
|
36
|
Das S, Paul S, Dutta C. Synonymous codon usage in adenoviruses: influence of mutation, selection and protein hydropathy. Virus Res 2005; 117:227-36. [PMID: 16307819 DOI: 10.1016/j.virusres.2005.10.007] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.2] [Received: 07/13/2005] [Revised: 10/19/2005] [Accepted: 10/19/2005] [Indexed: 11/23/2022]
Abstract
Trends in synonymous codon usage in adenoviruses have been examined through multivariate statistical analysis of the annotated protein-coding regions of 22 adenoviral species for which complete genome sequences are available. One of the major determinants of such trends is the G+C content at third codon positions of the genes, the average value of which varied from one viral genome to another depending on the overall mutational bias of the species. G3S and C3S interacted synergistically along the first principal axis of correspondence analysis on the Relative Synonymous Codon Usage of adenoviral genes, but antagonistically along the second principal axis. The intra-genomic variation in codon usage pattern in adenoviruses is generally influenced by asymmetrical mutational bias in the two DNA strands. Other major determinants of the trends are natural selection, putatively operative at the level of translation, and, quite interestingly, hydropathy of the encoded proteins. The trends in codon usage, though characterized by distinct virus-specific mutational bias, do not exhibit any sign of host-specificity. Significant variations are observed in synonymous codon choice in structural and nonstructural genes of adenoviruses.
Affiliation(s)
- Sabyasachi Das
- Bioinformatics Centre, Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Kolkata 700032, India
|
37
|
Pyrc K, Jebbink MF, Berkhout B, van der Hoek L. Genome structure and transcriptional regulation of human coronavirus NL63. Virol J 2004; 1:7. [PMID: 15548333 PMCID: PMC538260 DOI: 10.1186/1743-422x-1-7] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.6] [Received: 10/29/2004] [Accepted: 11/17/2004] [Indexed: 11/23/2022] Open
Abstract
Background: Two human coronaviruses have been known since the 1960s: HCoV-229E and HCoV-OC43. SARS-CoV was discovered in the early spring of 2003, followed by the identification of HCoV-NL63, the fourth member of the Coronaviridae family that infects humans. In this study, we describe the genome structure and the transcription strategy of HCoV-NL63 by experimental analysis of the viral subgenomic mRNAs. Results: The genome of HCoV-NL63 has the following gene order: 1a-1b-S-ORF3-E-M-N. The GC content of the HCoV-NL63 genome is extremely low (34%) compared to other coronaviruses, and we therefore performed additional analysis of the nucleotide composition. Overall, the RNA genome is very low in C and high in U, and this is also reflected in the codon usage. Inspection of the nucleotide composition along the genome indicates that the C-count increases significantly in the last one-third of the genome at the expense of U and G. We document the production of subgenomic (sg) mRNAs coding for the S, ORF3, E, M and N proteins. We did not detect any additional sg mRNA. Furthermore, we sequenced the 5' end of all sg mRNAs, confirming the presence of an identical leader sequence in each sg mRNA. Northern blot analysis indicated that the expression level among the sg mRNAs differs significantly, with the sg mRNA encoding nucleocapsid (N) being the most abundant. Conclusions: The presented data give insight into the viral evolution and mutational patterns in the coronaviral genome. Furthermore, our data show that HCoV-NL63 employs the discontinuous replication strategy with generation of subgenomic mRNAs during the (-) strand synthesis. Because HCoV-NL63 has a low pathogenicity and is able to grow easily in cell culture, this virus can be a powerful tool to study SARS coronavirus pathogenesis.
Affiliation(s)
- Krzysztof Pyrc
- Department of Human Retrovirology, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands
- Maarten F Jebbink
- Department of Human Retrovirology, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands
- Ben Berkhout
- Department of Human Retrovirology, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands
- Lia van der Hoek
- Department of Human Retrovirology, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands
|
38
|
Łobocka MB, Rose DJ, Plunkett G, Rusin M, Samojedny A, Lehnherr H, Yarmolinsky MB, Blattner FR. Genome of bacteriophage P1. J Bacteriol 2004; 186:7032-68. [PMID: 15489417 PMCID: PMC523184 DOI: 10.1128/jb.186.21.7032-7068.2004] [Citation(s) in RCA: 193] [Impact Index Per Article: 9.7] [Received: 03/11/2004] [Accepted: 07/09/2004] [Indexed: 11/20/2022] Open
Abstract
P1 is a bacteriophage of Escherichia coli and other enteric bacteria. It lysogenizes its hosts as a circular, low-copy-number plasmid. We have determined the complete nucleotide sequences of two strains of a P1 thermoinducible mutant, P1 c1-100. The P1 genome (93,601 bp) contains at least 117 genes, of which almost two-thirds had not been sequenced previously and 49 have no homologs in other organisms. Protein-coding genes occupy 92% of the genome and are organized in 45 operons, of which four are decisive for the choice between lysis and lysogeny. Four others ensure plasmid maintenance. The majority of the remaining 37 operons are involved in lytic development. Seventeen operons are transcribed from sigma(70) promoters directly controlled by the master phage repressor C1. Late operons are transcribed from promoters recognized by the E. coli RNA polymerase holoenzyme in the presence of the Lpa protein, the product of a C1-controlled P1 gene. Three species of P1-encoded tRNAs provide differential controls of translation, and a P1-encoded DNA methyltransferase with putative bifunctionality influences transcription, replication, and DNA packaging. The genome is particularly rich in Chi recombinogenic sites. The base content and distribution in P1 DNA indicate that replication of P1 from its plasmid origin had more impact on the base compositional asymmetries of the P1 genome than replication from the lytic origin of replication.
Affiliation(s)
- Małgorzata B Łobocka
- Department of Microbial Biochemistry, Institute of Biochemistry and Biophysics of the Polish Academy of Sciences, Ul. Pawinskiego 5A, 02-106 Warsaw, Poland.
|
39
|
Abstract
The replication of the chromosome is among the most essential functions of the bacterial cell and influences many other cellular mechanisms, from gene expression to cell division. Yet the way it impacts on the bacterial chromosome was not fully acknowledged until the availability of complete genomes allowed one to look upon genomes as more than bags of genes. Chromosomal replication includes a set of asymmetric mechanisms, among which are a division in a lagging and a leading strand and a gradient between early and late replicating regions. These differences are the causes of many of the organizational features observed in bacterial genomes, in terms of both gene distribution and sequence composition along the chromosome. When asymmetries or gradients increase in some genomes, e.g. due to a different composition of the DNA polymerase or to a higher growth rate, so do the corresponding biases. As some of the features of the chromosome structure seem to be under strong selection, understanding such biases is important for the understanding of chromosome organization and adaptation. Inversely, understanding chromosome organization may shed further light on questions relating to replication and cell division. Ultimately, the understanding of the interplay between these different elements will allow a better understanding of bacterial genetics and evolution.
Affiliation(s)
- Eduardo P C Rocha
- Atelier de Bioinformatique, Université Pierre et Marie Curie, 12, Rue Cuvier, 75005 Paris, and Unité Génétique des Génomes Bactériens, Institut Pasteur, 28 rue du Dr Roux, 75724 Paris Cedex 15, France
|
40
|
Abstract
Focused efforts by several international laboratories have resulted in the sequencing of the genome of the causative agent of severe acute respiratory syndrome (SARS), novel coronavirus SARS-CoV, in record time. Using cumulative skew diagrams, I found that mutational patterns in the SARS-CoV genome were strikingly different from other coronaviruses in terms of mutation rates, although they were in general agreement with the model of the coronavirus lifecycle. These findings might be relevant for the development of sequence-based diagnostics and the design of agents to treat SARS.
|
41
|
Abstract
The genome of enterobacterial phage T1 has been sequenced, revealing that its 50.7-kb terminally redundant, circularly permuted sequence contains 48,836 bp of nonredundant nucleotides. Seventy-seven open reading frames (ORFs) were identified, with a high percentage of small genes located at the termini of the genomes displaying no homology to existing phage or prophage proteins. Of the genes showing homologs (47%), we identified those involved in host DNA degradation (three endonucleases) and T1 replication (DNA helicase, primase, and single-stranded DNA-binding proteins) and recombination (RecE and Erf homologs). While the tail genes showed homology to those from temperate coliphage N15, the capsid biosynthetic genes were unique. Phage proteins were resolved by 2D gel electrophoresis, and mass spectrometry was used to identify several of the spots including the major head, portal, and tail proteins, thus verifying the annotation.
Affiliation(s)
- Mary D Roberts
- Biology Department, Radford University, Radford, VA 24142, USA
|
42
|
Abstract
BACKGROUND: Chromosomal DNA replication in bacteria starts at the origin (ori) and the two replicores propagate in opposite directions up to the terminus (ter) region. We hypothesize that the two replicores need to reach ter at the same time to maintain a physical balance; DNA insertion would disrupt such a balance, requiring chromosomal rearrangements to restore the balance. To test this hypothesis, we needed to demonstrate that ori and ter are in a physical balance in bacterial chromosomes. Using wavelet analysis, we documented GC skew, AT skew, purine excess and keto excess on the published bacterial genomic sequences to locate the turning (minimum and maximum) points on the curves. Previously, the minimum point had been supposed to correlate with ori and the maximum to correlate with ter. RESULTS: We observed a strong tendency of the bacterial chromosomes towards a physical balance, with the minima and maxima corresponding to the known or putative ori and ter and being about half chromosome separated in most of the bacteria studied. A nonparametric method based on wavelet transformation was employed to perform significance tests for the predicted loci. CONCLUSIONS: The wavelet approach can reliably predict the ori and ter regions and the bacterial chromosomes have a strong tendency towards a physical balance between ori and ter.
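The idea of locating turning points on a skew curve can be sketched as follows. This is a deliberately simplified stand-in, not the wavelet method of the abstract: it takes the raw cumulative GC skew and reports the positions of its global minimum and maximum, plus their separation as a fraction of a circular chromosome. The function names and the use of global extrema without smoothing or significance testing are assumptions for illustration.

```python
from typing import List, Tuple


def cumulative_gc_skew(seq: str) -> List[int]:
    """Running total of (+1 per G, -1 per C) along the sequence."""
    total, curve = 0, []
    for base in seq.upper():
        total += (base == "G") - (base == "C")
        curve.append(total)
    return curve


def skew_turning_points(seq: str) -> Tuple[int, int]:
    """0-based positions of the global minimum and maximum of the curve."""
    curve = cumulative_gc_skew(seq)
    if not curve:
        raise ValueError("sequence must not be empty")
    return curve.index(min(curve)), curve.index(max(curve))


def circular_separation(seq: str) -> float:
    """Distance between the two turning points as a fraction of a circular genome."""
    min_pos, max_pos = skew_turning_points(seq)
    gap = abs(max_pos - min_pos)
    return min(gap, len(seq) - gap) / len(seq)
```

On the toy sequence `CCCCGGGGGGGGCCCC` the minimum falls at position 3 and the maximum at position 11, so the two points sit half a chromosome apart (separation 0.5), which is the balanced arrangement the abstract describes. Real genomes are noisy, which is why the authors use a wavelet-based approach rather than raw extrema.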
Affiliation(s)
- Jiuzhou Song
- Departments of Microbiology and Infectious Diseases, University of Calgary, Calgary, Canada
- Antony Ware
- Mathematics and Statistics, University of Calgary, Calgary, Canada
- Shu-Lin Liu
- Departments of Microbiology and Infectious Diseases, University of Calgary, Calgary, Canada
- Department of Microbiology, Peking University School of Basic Medical Sciences, Beijing, China
|
43
|
Ghosh S, Satish S, Tyagi S, Bhattacharya A, Bhattacharya S. Differential use of multiple replication origins in the ribosomal DNA episome of the protozoan parasite Entamoeba histolytica. Nucleic Acids Res 2003; 31:2035-44. [PMID: 12682354 PMCID: PMC153748 DOI: 10.1093/nar/gkg320] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
The factors that control the initiation of eukaryotic DNA replication from defined origins (oris) on the chromosome remain incompletely resolved. Here we show that the circular rDNA episome of the human pathogen Entamoeba histolytica contains multiple potential oris, which are utilized in a differential manner. The primary ori in exponentially growing cells was mapped close to the promoter of rRNA genes in the upstream intergenic spacer (IGS) by two-dimensional gel electrophoresis. Replication initiated predominantly from the upstream IGS and terminated in the downstream IGS. However, when serum-starved cells were allowed to resume growth, the early oris which became activated were located in other parts of the molecule. Later the ori in the upstream IGS became activated, with concomitant silencing of the early oris. When the upstream IGS was located ectopically in an artificial plasmid, it again lost ori activity, while other parts of the rDNA episome could function as oris in this system. Therefore, the activation or silencing of the ori in this episome is context dependent, as is also the case with many eukaryotic replicons. This is the first replication origin to be mapped in this primitive protozoan and will provide an opportunity to define the factors involved in differential ori activity, and their comparison with metazoans.
Affiliation(s)
- Soma Ghosh
- School of Life Sciences, School of Environmental Sciences, Jawaharlal Nehru University, New Delhi-110067, India
|
44
|
Spencer DH, Kas A, Smith EE, Raymond CK, Sims EH, Hastings M, Burns JL, Kaul R, Olson MV. Whole-genome sequence variation among multiple isolates of Pseudomonas aeruginosa. J Bacteriol 2003; 185:1316-25. [PMID: 12562802 PMCID: PMC142842 DOI: 10.1128/jb.185.4.1316-1325.2003] [Citation(s) in RCA: 143] [Impact Index Per Article: 6.8] [Indexed: 11/20/2022] Open
Abstract
Whole-genome shotgun sequencing was used to study the sequence variation of three Pseudomonas aeruginosa isolates, two from clonal infections of cystic fibrosis patients and one from an aquatic environment, relative to the genomic sequence of reference strain PAO1. The majority of the PAO1 genome is represented in these strains; however, at least three prominent islands of PAO1-specific sequence are apparent. Conversely, approximately 10% of the sequencing reads derived from each isolate fail to align with the PAO1 backbone. While average sequence variation among all strains is roughly 0.5%, regions of pronounced differences were evident in whole-genome scans of nucleotide diversity. We analyzed two such divergent loci, the pyoverdine and O-antigen biosynthesis regions, by complete resequencing. A thorough analysis of isolates collected over time from one of the cystic fibrosis patients revealed independent mutations resulting in the loss of O-antigen synthesis alternating with a mucoid phenotype. Overall, we conclude that most of the PAO1 genome represents a core P. aeruginosa backbone sequence while the strains addressed in this study possess additional genetic material that accounts for at least 10% of their genomes. Approximately half of these additional sequences are novel.
Affiliation(s)
- David H Spencer
- The University of Washington Genome Center, Department of Medicine, University of Washington. Children's Hospital and Regional Medical Center, Seattle, Washington 98195, USA
|
45
|
Abstract
BACKGROUND: When there are no strand-specific biases in mutation and selection rates (that is, in the substitution rates) between the two strands of DNA, the average nucleotide composition is theoretically expected to be A = T and G = C within each strand. Deviations from these equalities are therefore evidence for an asymmetry in selection and/or mutation between the two strands. By focusing on weakly selected regions that could be oriented with respect to replication in 43 out of 51 completely sequenced bacterial chromosomes, we have been able to detect asymmetric directional mutation pressures. RESULTS: Most of the 43 chromosomes were found to be relatively enriched in G over C and T over A, and slightly depleted in G+C, in their weakly selected positions (intergenic regions and third codon positions) in the leading strand compared with the lagging strand. Deviations from A = T and G = C were highly correlated between third codon positions and intergenic regions, with a lower degree of deviation in intergenic regions, and were not correlated with overall genomic G+C content. CONCLUSIONS: During the course of bacterial chromosome evolution, the effects of asymmetric directional mutation pressures are commonly observed in weakly selected positions. The degree of deviation from equality is highly variable among species, and within species is higher in third codon positions than in intergenic regions. The orientation of these effects is almost universal and is compatible in most cases with the hypothesis of an excess of cytosine deamination in the single-stranded state during DNA replication. However, the variation in G+C content between species is influenced by factors other than asymmetric mutation pressure.
Collapse
Affiliation(s)
- Jean R Lobry
- Laboratoire BBE CNRS UMR 5558, Université Claude Bernard, 43 Bd du 11 Novembre 1918, F-69622 Villeurbanne cedex, France.
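The deviations from A = T and G = C described in this abstract are commonly summarized as skew values computed on a single strand. The following is a minimal sketch only, not the authors' pipeline: the function name `strand_skews` and the sign convention (positive values meaning G over C and T over A) are assumptions chosen to match the direction of enrichment reported for the leading strand.

```python
from collections import Counter


def strand_skews(seq: str) -> dict:
    """Return GC skew (G-C)/(G+C) and TA skew (T-A)/(T+A) for one strand.

    Hypothetical helper for illustration; ambiguous bases such as N are ignored.
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    # Guard against division by zero when a base pair class is absent.
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    ta_skew = (t - a) / (t + a) if (t + a) else 0.0
    return {"gc_skew": gc_skew, "ta_skew": ta_skew}


if __name__ == "__main__":
    # Toy sequence only; real analyses would use oriented, weakly selected sites.
    print(strand_skews("GGGCAT"))
```

Under this convention, a strand that obeys A = T and G = C gives both skews equal to zero, so any nonzero value measures the asymmetry the abstract refers to.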
|
46
|
Abstract
The ori locus of the prolate-headed lactococcal bacteriophage c2 supports plasmid replication in Lactococcus lactis in the absence of phage infection. To determine whether phage c2 DNA replication is initiated at the ori locus in vivo and to investigate the mechanism of phage DNA replication, replicating intermediates of phage c2 were analyzed using neutral/neutral two-dimensional agarose gel electrophoresis (2D). The 2D data revealed that c2 replicates via a theta mechanism and localized the initiation of theta replication to the ori region of the c2 genome.
Affiliation(s)
- M J Callanan
- Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand.
|
47
|
|
48
|
Abstract
The human genome, as in other eukaryotes, has a wide heterogeneity in the DNA base composition. The evolutionary basis for this heterogeneity has been unknown. A previous study of the human genome (846 genes analyzed) has shown that, in the major range of the G+C content in the third codon position (0.25-0.75), biases from the Parity Rule 2 (PR2) among the synonymous codons of the four-codon amino acids are similar except in the highest G+C range (Sueoka, N., 1999. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene 238, 53-58.). PR2 is an intra-strand rule where A=T and G=C are expected when there are no biases between the two complementary strands of DNA in mutation and selection rates (substitution rates). In this study, 14,026 human genes were analyzed. In addition, the third codon positions of two-codon amino acids were analyzed. New results show the following: (a) The G+C contents of the third codon position of human genes are scattered in the G+C range of 0.22-0.96 in the third codon position. (b) The PR2 biases are similar in the range of 0.25-0.75, whereas, in the high G+C range (0.75-0.96; 13% of the genes), the PR2-bias fingerprints are different from those of the major range. (c) Unlike the PR2 biases, the G+C contents of the third codon position for both four-codon and two-codon amino acids are all correlated almost perfectly with the G+C content of the third codon position over the total G+C ranges. These results support the notion that the directional mutation pressure, rather than the directional selection pressure, is mainly responsible for the heterogeneity of the G+C content of the third codon position.
Affiliation(s)
- N Sueoka
- University of Colorado, Department of Molecular, Cellular, and Developmental Biology, Boulder, CO 80309-0347, USA.
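The quantities in this abstract, the G+C content of the third codon position and the PR2 biases, can be illustrated with a short calculation. This is a simplified sketch under stated assumptions: the function name `third_position_stats` is hypothetical, the PR2 biases are expressed as G3/(G3+C3) and A3/(A3+T3), and all codons are pooled, whereas the study analyzes four-codon and two-codon amino acids separately.

```python
def third_position_stats(cds: str) -> dict:
    """Compute GC3 and PR2 ratios from the third codon positions of an in-frame CDS.

    Hypothetical helper for illustration; pools all codons together.
    """
    third = cds.upper()[2::3]  # bases at codon position 3 (indices 2, 5, 8, ...)
    a, c, g, t = (third.count(base) for base in "ACGT")
    total = a + c + g + t
    return {
        # Fraction of third positions that are G or C.
        "gc3": (g + c) / total if total else None,
        # Under PR2 (G = C and A = T within a strand) both ratios equal 0.5.
        "pr2_gc": g / (g + c) if (g + c) else None,
        "pr2_at": a / (a + t) if (a + t) else None,
    }


if __name__ == "__main__":
    # Toy coding sequence: codons ATG GCC AAA TTT.
    print(third_position_stats("ATGGCCAAATTT"))
```

Values of `pr2_gc` or `pr2_at` away from 0.5 indicate a departure from PR2 at the third codon position, while `gc3` corresponds to the G+C axis along which the abstract reports its ranges.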
|
49
|
Beletskii A, Grigoriev A, Joyce S, Bhagwat AS. Mutations induced by bacteriophage T7 RNA polymerase and their effects on the composition of the T7 genome. J Mol Biol 2000; 300:1057-65. [PMID: 10903854 DOI: 10.1006/jmbi.2000.3944] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 11/22/2022]
Abstract
We show here that transcription by the bacteriophage T7 RNA polymerase increases the deamination of cytosine bases in the non-transcribed strand to uracil, causing C to T mutations in that strand. Under optimal conditions, the mutation frequency increases about fivefold over background, and is similar to that seen with the Escherichia coli RNA polymerase. Further, we found that a mutant T7 RNA polymerase with a slower rate of elongation caused more cytosine deaminations than its wild-type parent. These results suggest that promoting cytosine deamination in the non-transcribed strand is a general property of transcription in E. coli and is dependent on the length of time the transcription bubble stays open during elongation. To see if transcription-induced mutations have influenced the evolution of bacteriophage T7, we analyzed its genome for a bias in base composition. Our analysis showed a significant excess of thymine over cytosine bases in the highly transcribed regions of the genome. Moreover, the average value of this bias correlated well with the levels of transcription of different genomic regions. Our results indicate that transcription-induced mutations have altered the composition of bacteriophage T7 genome and suggest that this may be a significant force in genome evolution.
Affiliation(s)
- A Beletskii
- Department of Chemistry, Wayne State University, Detroit, MI 48202, USA
|
50
|
Gierlik A, Kowalczuk M, Mackiewicz P, Dudek MR, Cebrat S. Is there replication-associated mutational pressure in the Saccharomyces cerevisiae genome? J Theor Biol 2000; 202:305-14. [PMID: 10666362 DOI: 10.1006/jtbi.1999.1062] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
Compositional bias of yeast chromosomes was analysed using detrended DNA walks. Unlike eubacterial chromosomes, the yeast chromosomes did not show the specific asymmetry correlated with origin and terminus of replication. It is probably a result of a relative excess of autonomously replicating sequences (ARS) and of random choice of these sequences in each replication cycle. Nevertheless, the last ARS from both ends of chromosomes are responsible for unidirectional replication of subtelomeric sequences with pre-established leading/lagging roles of DNA strands. In these sequences a specific asymmetry is observed, resembling the asymmetry introduced by replication-associated mutational pressure into eubacterial chromosomes.
Affiliation(s)
- A Gierlik
- Institute of Microbiology, Wroclaw University, ul. Przybyszewskiego 63/77, Wroclaw, 54-148, Poland
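The abstract names detrended DNA walks without giving their construction. The sketch below shows one possible construction and is an assumption, not the authors' exact procedure: the walk steps up for G and down for C, and the straight-line trend from the start to the final value is subtracted so that the walk ends at zero. The function name `detrended_gc_walk` is hypothetical.

```python
from itertools import accumulate


def detrended_gc_walk(seq: str) -> list:
    """Cumulative G-minus-C walk with its overall linear trend removed.

    Illustrative construction only; other step rules and detrending
    schemes are possible.
    """
    # Step +1 for G, -1 for C, 0 for every other base.
    steps = [1 if b == "G" else -1 if b == "C" else 0 for b in seq.upper()]
    walk = list(accumulate(steps))
    n = len(walk)
    if n == 0:
        return []
    # Average drift per position; subtracting it forces the end point to zero.
    slope = walk[-1] / n
    return [w - slope * (i + 1) for i, w in enumerate(walk)]


if __name__ == "__main__":
    # Toy sequence: G-rich first half, C-rich second half.
    print(detrended_gc_walk("GGCC"))
```

In a walk of this kind, a sequence whose first half is G-rich and second half C-rich produces a single peak, which is the sort of extremum that asymmetry analyses associate with a change in leading and lagging strand roles.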
|