1
Serrano-Solís V, Toscano Soares PE, de Farías ST. Genomic Signatures Among Acanthamoeba polyphaga Entoorganisms Unveil Evidence of Coevolution. J Mol Evol 2018; 87:7-15. [PMID: 30456441 DOI: 10.1007/s00239-018-9877-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/06/2018] [Accepted: 11/09/2018] [Indexed: 11/30/2022]
Abstract
The definition of a genomic signature (GS) is "the total net response to selective pressure". Recent isolation and sequencing of naturally occurring organisms within Acanthamoeba polyphaga, here named entoorganisms, raised the hypothesis of a common genomic signature despite their diverse and unrelated evolutionary origins. Widely accepted and implemented tests for GS detection are oligonucleotide relative frequencies (OnRF) and relative codon usage (RCU) surveys. OnRFs unveiled a common pattern and strong correlations among A. polyphaga's Mimivirus and virophage Sputnik. RCU showed a common A-T bias at the third codon position. We expanded the tests to the amoebal mitochondrial genome and amoeba-resistant bacteria, obtaining results strikingly consistent with the aforementioned viral analyses. The GSs in these entoorganisms of diverse evolutionary origin are coevolutionarily conserved within an intracellular environment that provides sanctuary for species of ecological and biomedical relevance.
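The two kinds of test named in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact statistics used, so the code below assumes one common formulation: an observed/expected dinucleotide ratio for the oligonucleotide test, and the share of codons ending in A or T for the third-position bias. The function names are hypothetical and not taken from the paper.

```python
from collections import Counter


def third_position_at_fraction(cds: str) -> float:
    """Fraction of codons whose third base is A or T.

    Assumes `cds` is an in-frame coding sequence; trailing bases that
    do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "AT" for base in thirds) / len(thirds)


def dinucleotide_relative_abundance(seq: str) -> dict:
    """Observed/expected ratio f(xy) / (f(x) * f(y)) per dinucleotide.

    This is an assumed stand-in for an oligonucleotide relative
    frequency; only dinucleotides present in `seq` are returned.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    return {
        pair: (count / (n - 1)) / ((mono[pair[0]] / n) * (mono[pair[1]] / n))
        for pair, count in di.items()
    }
```

Profiles computed this way for two genomes could then be compared, for example by correlation, which is the kind of comparison the abstract reports.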
Affiliation(s)
- Víctor Serrano-Solís
- Laboratório de Genética Evolutiva Paulo Leminsk, Departamento de Biologia Molecular, Centro de Ciencias Exatas e da Natureza, Universidade Federal da Paraíba, João Pessoa, Brazil.
- Paulo Eduardo Toscano Soares
- Laboratório de Genética Evolutiva Paulo Leminsk, Departamento de Biologia Molecular, Centro de Ciencias Exatas e da Natureza, Universidade Federal da Paraíba, João Pessoa, Brazil
- Sávio T de Farías
- Laboratório de Genética Evolutiva Paulo Leminsk, Departamento de Biologia Molecular, Centro de Ciencias Exatas e da Natureza, Universidade Federal da Paraíba, João Pessoa, Brazil
2
Implications of human genome structural heterogeneity: functionally related genes tend to reside in organizationally similar genomic regions. BMC Genomics 2014; 15:252. [PMID: 24684786 PMCID: PMC4234528 DOI: 10.1186/1471-2164-15-252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/15/2012] [Accepted: 03/21/2014] [Indexed: 01/30/2023] Open
Abstract
Background: In an earlier study, we hypothesized that genomic segments with different sequence organization patterns (OPs) might display functional specificity despite their similar GC content. Here we tested this hypothesis by dividing the human genome into 100 kb segments, classifying these segments into five compositional groups according to GC content, and then characterizing each segment within the five groups by oligonucleotide counting (k-mer analysis; also referred to as compositional spectrum analysis, or CSA), to examine the distribution of sequence OPs in the segments. We performed the CSA on the entire DNA, i.e., its coding and non-coding parts, the latter being much more abundant in the genome than the former.
Results: We identified 38 OP-type clusters of segments that differ in their compositional spectrum (CS) organization. Many of the segments that shared the same OP type were enriched with genes related to the same biological processes (developmental, signaling, etc.), components of biochemical complexes, or organelles. Thirteen OP-type clusters showed significant enrichment in genes connected to specific gene-ontology terms. Some of these clusters seemed to reflect certain events during periods of horizontal gene transfer and genome expansion, and subsequent evolution of genomic regions requiring coordinated regulation.
Conclusions: There may be a tendency for genes that are involved in the same biological process, complex or organelle to use the same OP, even at a distance of ~100 kb from the genes. Although the intergenic DNA is non-coding, the general pattern of sequence organization (e.g., reflected in over-represented oligonucleotide "words") may be important and may have been protected, to some extent, in the course of evolution.
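The pipeline described in this abstract (fixed-size segments, GC-content grouping, then k-mer counting per segment) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the GC boundaries are placeholder values chosen only to produce five groups, and exact k-mer counting stands in for whatever matching rule the compositional spectrum analysis actually uses.

```python
from collections import Counter


def segments(seq, size):
    """Yield consecutive non-overlapping segments of `size` bases.

    A trailing partial segment is dropped.
    """
    for start in range(0, len(seq) - size + 1, size):
        yield seq[start:start + size]


def gc_content(segment):
    """Fraction of G and C bases in an upper-case segment."""
    return sum(base in "GC" for base in segment) / len(segment)


def gc_group(gc, boundaries=(0.37, 0.41, 0.46, 0.53)):
    """Map a GC fraction to a group index 0..4.

    The four boundaries are illustrative assumptions; the abstract
    only states that five GC-based groups were used.
    """
    return sum(gc >= b for b in boundaries)


def kmer_spectrum(segment, k):
    """Relative frequency of every k-mer observed in the segment."""
    counts = Counter(segment[i:i + k] for i in range(len(segment) - k + 1))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}
```

For the setting in the abstract, `size` would be 100000; segments within the same `gc_group` would then be clustered by the similarity of their spectra.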
3
Norberg P, Bergström M, Hermansson M. Complete nucleotide sequence and analysis of two conjugative broad host range plasmids from a marine microbial biofilm. PLoS One 2014; 9:e92321. [PMID: 24647540 PMCID: PMC3960245 DOI: 10.1371/journal.pone.0092321] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/13/2013] [Accepted: 02/20/2014] [Indexed: 11/26/2022] Open
Abstract
The complete nucleotide sequence of plasmids pMCBF1 and pMCBF6 was determined and analyzed. pMCBF1 and pMCBF6 form a novel clade within the IncP-1 plasmid family designated IncP-1 ς. The plasmids were exogenously isolated earlier from a marine biofilm. pMCBF1 (62 689 base pairs; bp) and pMCBF6 (66 729 bp) have identical backbones, but differ in their mercury resistance transposons. pMCBF1 carries Tn5053 and pMCBF6 carries Tn5058. Both are flanked by 5 bp direct repeats, typical of replicative transposition. Both insertions are in the vicinity of a resolvase gene in the backbone, supporting the idea that both transposons are "res-site hunters" that preferentially insert close to and use external resolvase functions. The similarity of the backbones indicates recent insertion of the two transposons and the ongoing dynamics of plasmid evolution in marine biofilms. Both plasmids also carry the insertion sequence ISPst1, albeit without flanking repeats. ISPst1 is located in an unusual site within the control region of the plasmid. In contrast to most known IncP-1 plasmids, the pMCBF1/pMCBF6 backbone has no insert between the replication initiation gene (trfA) and the vegetative replication origin (oriV). One pMCBF1/pMCBF6 block of about 2.5 kilobases (kb) has no similarity with known sequences in the databases. Furthermore, insertion of three genes with similarity to the multidrug efflux pump operon mexEF and a gene from the NodT family of the tripartite multi-drug resistance-nodulation-division (RND) system in Pseudomonas aeruginosa was found. They do not seem to confer antibiotic resistance to the hosts of pMCBF1/pMCBF6, but the presence of RND on promiscuous plasmids may have serious implications for the spread of antibiotic multi-resistance.
Affiliation(s)
- Peter Norberg
- Department of Infectious Diseases, University of Gothenburg, Göteborg, Sweden
- Maria Bergström
- Department of Chemistry and Molecular Biology, Microbiology, University of Gothenburg, Göteborg, Sweden
- Malte Hermansson
- Department of Chemistry and Molecular Biology, Microbiology, University of Gothenburg, Göteborg, Sweden
4
Patil KR, McHardy AC. Alignment-free genome tree inference by learning group-specific distance metrics. Genome Biol Evol 2013; 5:1470-84. [PMID: 23843191 PMCID: PMC3762195 DOI: 10.1093/gbe/evt105] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/22/2022] Open
Abstract
Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods are often used to infer such relationships, but they are not without drawbacks. One can now attempt to use genome-scale information, because of the ever increasing number of genomes available. This opportunity also presents a challenge in terms of computational efficiency. Two fundamentally different methods are often employed for sequence comparisons, namely alignment-based and alignment-free methods. Alignment-free methods rely on the genome signature concept and provide a computationally efficient way that is also applicable to nonhomologous sequences. The genome signature contains evolutionary signal as it is more similar for closely related organisms than for distantly related ones. We used genome-scale sequence information to infer taxonomic distances between organisms without additional information such as gene annotations. We propose a method to improve genome tree inference by learning specific distance metrics over the genome signature for groups of organisms with similar phylogenetic, genomic, or ecological properties. Specifically, our method learns a Mahalanobis metric for a set of genomes and a reference taxonomy to guide the learning process. By applying this method to more than a thousand prokaryotic genomes, we showed that, indeed, better distance metrics could be learned for most of the 18 groups of organisms tested here. Once a group-specific metric is available, it can be used to estimate the taxonomic distances for other sequenced organisms from the group. This study also presents a large scale comparison between 10 methods: 9 alignment-free and 1 alignment-based.
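To make the idea of a Mahalanobis metric over genome signatures concrete, the sketch below evaluates d(x, y) = sqrt((x - y)^T M (x - y)) for two signature vectors and a given matrix M. It covers only the distance computation; how M is learned from a reference taxonomy is not described in the abstract and is not attempted here. The function name and the plain-list representation are assumptions made for illustration.

```python
import math


def mahalanobis_distance(x, y, m):
    """Distance sqrt((x - y)^T M (x - y)) between two signature vectors.

    `x` and `y` are equal-length sequences of numbers (for example,
    oligonucleotide frequencies) and `m` is a square matrix given as a
    list of rows. `m` is assumed to be positive semidefinite; the
    identity matrix reduces this to the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("signature vectors must have the same length")
    if len(m) != len(x) or any(len(row) != len(x) for row in m):
        raise ValueError("metric matrix must be square and match the vectors")
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M * diff, then the quadratic form diff . (M * diff).
    m_diff = [sum(row[j] * diff[j] for j in range(len(diff))) for row in m]
    quad = sum(d * v for d, v in zip(diff, m_diff))
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(quad, 0.0))
```

A learned, group-specific M reweights the signature dimensions, so that differences in some oligonucleotides count more toward the distance than others.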
Affiliation(s)
- Kaustubh R Patil
- Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, Saarbrücken, Germany.
5
Dutta C, Paul S. Microbial lifestyle and genome signatures. Curr Genomics 2012; 13:153-62. [PMID: 23024607 PMCID: PMC3308326 DOI: 10.2174/138920212799860698] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 08/18/2011] [Revised: 09/13/2011] [Accepted: 09/28/2011] [Indexed: 12/29/2022] Open
Abstract
Microbes are known for their unique ability to adapt to varying lifestyles and environments, even extreme or adverse ones. The genomic architecture of a microbe may bear the signatures not only of its phylogenetic position, but also of the kind of lifestyle to which it is adapted. The present review aims to provide an account of the specific genome signatures observed in microbes acclimatized to distinct lifestyles or ecological niches. Niche-specific signatures identified at different levels of microbial genome organization, such as base composition, GC-skew, purine-pyrimidine ratio, dinucleotide abundance, codon bias, and oligonucleotide composition, are discussed. Among the specific cases highlighted in the review are the phenomena of genome shrinkage in obligatory host-restricted microbes, genome expansion in strictly intra-amoebal pathogens, strand-specific codon usage in intracellular species, acquisition of genome islands in pathogenic or symbiotic organisms, discriminatory genomic traits of marine microbes with distinct trophic strategies, and conspicuous sequence features of certain extremophiles like those adapted to high temperature or high salinity.
Affiliation(s)
- Chitra Dutta
- Structural Biology & Bioinformatics Division, CSIR- Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India
6
Frenkel S, Kirzhner V, Korol A. Organizational heterogeneity of vertebrate genomes. PLoS One 2012; 7:e32076. [PMID: 22384143 PMCID: PMC3288070 DOI: 10.1371/journal.pone.0032076] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/15/2011] [Accepted: 01/23/2012] [Indexed: 01/06/2023] Open
Abstract
Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions (coding, repetitive, and the remaining part of the noncoding DNA, termed the genome dark matter or GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
Affiliation(s)
- Abraham Korol
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Mount Carmel, Haifa, Israel
7
Norberg P, Bergström M, Jethava V, Dubhashi D, Hermansson M. The IncP-1 plasmid backbone adapts to different host bacterial species and evolves through homologous recombination. Nat Commun 2011; 2:268. [PMID: 21468020 PMCID: PMC3104523 DOI: 10.1038/ncomms1267] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.6] [Received: 01/25/2011] [Accepted: 03/08/2011] [Indexed: 01/24/2023] Open
Abstract
Plasmids are important members of the bacterial mobile gene pool, and are among the most important contributors to horizontal gene transfer between bacteria. They typically harbour a wide spectrum of host beneficial traits, such as antibiotic resistance, inserted into their backbones. Although these inserted elements have drawn considerable interest, evolutionary information about the plasmid backbones, which encode plasmid related traits, is sparse. Here we analyse 25 complete backbone genomes from the broad-host-range IncP-1 plasmid family. Phylogenetic analysis reveals seven clades, in which two plasmids that we isolated from a marine biofilm represent a novel clade. We also found that homologous recombination is a prominent feature of the plasmid backbone evolution. Analysis of genomic signatures indicates that the plasmids have adapted to different host bacterial species. Globally circulating IncP-1 plasmids hence contain mosaic structures of segments derived from several parental plasmids that have evolved in, and adapted to, different, phylogenetically very distant host bacterial species. Plasmids are present in many bacteria and are often transferred between different species causing horizontal gene transfer. By comparing the sequences of 25 plasmid DNA backbones, the authors show that homologous recombination is prevalent in plasmids and that the plasmids have adapted to persist in different host bacteria.
Affiliation(s)
- Peter Norberg
- Department of Cell and Molecular Biology, Microbiology, University of Gothenburg, Box 462, SE 413 46, Gothenburg, Sweden.
8
Schliep K, Lopez P, Lapointe FJ, Bapteste E. Harvesting evolutionary signals in a forest of prokaryotic gene trees. Mol Biol Evol 2010; 28:1393-405. [PMID: 21172835 DOI: 10.1093/molbev/msq323] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022] Open
Abstract
Phylogenomic studies produce increasingly large phylogenetic forests of trees with patchy taxonomical sampling. Typically, prokaryotic data generate thousands of gene trees of all sizes that are difficult, if not impossible, to root. Their topologies do not match the genealogy of lineages, as they are influenced not only by duplication, losses, and vertical descent but also by lateral gene transfer (LGT) and recombination. Because this complexity in part reflects the diversity of evolutionary processes, the study of phylogenetic forests is thus a great opportunity to improve our understanding of prokaryotic evolution. Here, we show how the rich evolutionary content of such novel phylogenetic objects can be exploited through the development of new approaches designed specifically for extracting the multiple evolutionary signals present in the forest of life, that is, by slicing up trees into remarkable bits and pieces: clans, slices, and clips. We harvested a forest of 6,901 unrooted gene trees comprising up to 100 prokaryotic genomes (41 archaea and 59 bacteria) to search for evolutionary events that a species tree would not account for. We identified 1) trees and partitions of trees that reflected the lifestyle of organisms rather than their taxonomy, 2) candidate lifestyle-specific genetic modules, used by distinct unrelated organisms to adapt to the same environment, 3) gene families, nonrandomly distributed in the functional space, that were frequently exchanged between archaea and bacteria, sometimes without major changes in their sequences. Finally, 4) we reconstructed polarized networks of genetic partnerships between archaea and bacteria to describe some of the rules affecting LGT between these two Domains.
Affiliation(s)
- Klaus Schliep
- UMR CNRS 7138 Systématique, Adaptation, Evolution, Muséum National d'Histoire Naturelle, Paris, France
9
Perry SC, Beiko RG. Distinguishing microbial genome fragments based on their composition: evolutionary and comparative genomic perspectives. Genome Biol Evol 2010; 2:117-31. [PMID: 20333228 PMCID: PMC2839357 DOI: 10.1093/gbe/evq004] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Accepted: 01/19/2010] [Indexed: 01/23/2023] Open
Abstract
It is well known that patterns of nucleotide composition vary within and among genomes, although the reasons why these variations exist are not completely understood. Between-genome compositional variation has been exploited to assign environmental shotgun sequences to their most likely originating genomes, whereas within-genome variation has been used to identify recently acquired genetic material such as pathogenicity islands. Recent sequence assignment techniques have achieved high levels of accuracy on artificial data sets, but the relative difficulty of distinguishing lineages with varying degrees of relatedness, and different types of genomic sequence, has not been examined in depth. We investigated the compositional differences in a set of 774 sequenced microbial genomes, finding rapid divergence among closely related genomes, but also convergence of compositional patterns among genomes with similar habitats. Support vector machines were then used to distinguish all pairs of genomes based on genome fragments 500 nucleotides in length. The nearly 300,000 accuracy scores obtained from these trials were used to construct general models of distinguishability versus taxonomic and compositional indices of genomic divergence. Unusual genome pairs were evident from their large residuals relative to the fitted model, and we identified several factors including genome reduction, putative lateral genetic transfer, and habitat convergence that influence the distinguishability of genomes. The positional, compositional, and functional context of a fragment within a genome has a strong influence on its likelihood of correct classification, but in a way that depends on the taxonomic and ecological similarity of the comparator genome.
Affiliation(s)
- Scott C Perry
- Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
10
Bohlin J, Hardy SP, Ussery DW. Stretches of alternating pyrimidine/purines and purines are respectively linked with pathogenicity and growth temperature in prokaryotes. BMC Genomics 2009; 10:346. [PMID: 19646265 PMCID: PMC2728739 DOI: 10.1186/1471-2164-10-346] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/02/2009] [Accepted: 07/31/2009] [Indexed: 02/02/2023] Open
Abstract
Background: The genomic fractions of purine (RR) and alternating pyrimidine/purine (YR) stretches of 10 base pairs or more have been linked to genomic AT content, the formation of different DNA helices, strand-biased gene distribution, DNA structure, and more. Although some of these factors are a consequence of the chemical properties of purines and pyrimidines, a thorough statistical examination of the distributions of YR/RR stretches in sequenced prokaryotic chromosomes has, to the best of our knowledge, not been undertaken. The aim of this study is to expand upon previous research by using regression analysis to investigate how AT content, habitat, growth temperature, pathogenicity, phyla, oxygen requirement and halotolerance correlate with the distribution of RR and YR stretches in prokaryotes.
Results: Our results indicate that RR and YR-stretches are differently distributed in prokaryotic phyla. RR stretches are overrepresented in all phyla except for the Actinobacteria and β-Proteobacteria. In contrast, YR tracts are underrepresented in all phyla except for the β-Proteobacterial group. YR-stretches are associated with phylum, pathogenicity and habitat, whilst RR-tracts are associated with phylum, AT content, oxygen requirement, growth temperature and halotolerance. All associations described were statistically significant with p < 0.001.
Conclusion: Analysis of chromosomal distributions of RR/YR sequences in prokaryotes reveals a set of associations with environmental factors not observed with mono- and oligonucleotide frequencies. This implies that important information can be found in the distribution of RR/YR stretches that is more difficult to obtain from genomic mono- and oligonucleotide frequencies. The association between pathogenicity and fractions of YR stretches is assumed to be linked to recombination and horizontal transfer.
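The quantity at the centre of this abstract, the genomic fraction covered by RR or YR stretches of at least 10 bases, can be sketched with regular expressions. This is an illustrative reading of the abstract rather than the authors' procedure: it scans a single strand, treats an alternating stretch as starting with either a pyrimidine or a purine, and uses hypothetical names throughout.

```python
import re

# Runs of 10 or more purines (A or G).
PURINE_RUN = re.compile(r"[AG]{10,}")

# Runs of 10 or more bases that strictly alternate between pyrimidine
# (C or T) and purine (A or G), starting with either class.
ALTERNATING_RUN = re.compile(
    r"(?:[CT][AG]){5,}[CT]?|(?:[AG][CT]){5,}[AG]?"
)


def stretch_fraction(seq, pattern):
    """Fraction of `seq` covered by non-overlapping matches of `pattern`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq))
    return covered / len(seq)
```

Per-genome fractions computed in this way would be the response variable in a regression against AT content, habitat, growth temperature and the other factors listed in the abstract.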
Affiliation(s)
- Jon Bohlin
- Norwegian School of Veterinary Science, Oslo, Norway.
11
Mrazek J. Phylogenetic Signals in DNA Composition: Limitations and Prospects. Mol Biol Evol 2009; 26:1163-9. [DOI: 10.1093/molbev/msp032] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 01/11/2023] Open