1
Enav H, Paz I, Ley RE. Strain tracking in complex microbiomes using synteny analysis reveals per-species modes of evolution. Nat Biotechnol 2025; 43:773-783. [PMID: 38898177 DOI: 10.1038/s41587-024-02276-2] [Received: 03/31/2023] [Accepted: 05/10/2024] [Indexed: 06/21/2024]
Abstract
Microbial species diversify into strains through single-nucleotide mutations and structural changes, such as recombination, insertions and deletions. Most strain-comparison methods quantify differences in single-nucleotide polymorphisms (SNPs) and are insensitive to structural changes. However, recombination is an important driver of phenotypic diversification in many species, including human pathogens. We introduce SynTracker, a tool that compares microbial strains using genome synteny (the order of sequence blocks in homologous genomic regions) in pairs of metagenomic assemblies or genomes. Genome synteny is a rich source of genomic information that current strain-comparison tools leave untapped. SynTracker has low sensitivity to SNPs, requires no database and is robust to sequencing errors. It outperforms existing tools when tracking strains in metagenomic data and is particularly suited to phages, plasmids and other low-data contexts. Applied to single-species datasets and human gut metagenomes, SynTracker, combined with an SNP-based tool, detects strains enriched in either point mutations or structural changes, providing insights into microbial evolution in situ.
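The abstract does not describe SynTracker's scoring in detail. As a hedged illustration of why a synteny-based comparison reacts to structural changes but not to point mutations, the sketch below assumes that homologous regions of two assemblies have already been split into ordered blocks (the block labels and function names are hypothetical) and scores a pair by the Jaccard similarity of their neighbouring-block pairs. This is a simple breakpoint-style stand-in, not SynTracker's algorithm.

```python
def block_adjacencies(blocks: list[str]) -> set[frozenset[str]]:
    """Unordered pairs of neighbouring blocks; block orientation is ignored."""
    return {frozenset(pair) for pair in zip(blocks, blocks[1:])}


def synteny_similarity(a: list[str], b: list[str]) -> float:
    """Jaccard similarity of the block adjacencies of two block orders (1.0 = same order)."""
    adj_a, adj_b = block_adjacencies(a), block_adjacencies(b)
    union = adj_a | adj_b
    return len(adj_a & adj_b) / len(union) if union else 1.0


reference = ["A", "B", "C", "D", "E", "F"]
# Assumption: an SNP inside block C leaves every block label in place, so the order is unchanged.
with_snp = list(reference)
# An inversion of the C-D segment replaces two block adjacencies.
with_inversion = ["A", "B", "D", "C", "E", "F"]
# An insertion of a new block X between B and C.
with_insertion = ["A", "B", "X", "C", "D", "E", "F"]

for name, query in [("snp", with_snp), ("inversion", with_inversion), ("insertion", with_insertion)]:
    print(name, round(synteny_similarity(reference, query), 3))
```

Run as is, this prints 1.0 for the SNP case, 0.429 for the inversion and 0.571 for the insertion: the point mutation is invisible to the block order, while both structural changes lower the score.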
Affiliation(s)
- Hagay Enav
- Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
- Inbal Paz
- Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
- Ruth E Ley
- Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
- Cluster of Excellence EXC 2124: Controlling Microbes to Fight Infections (CMFI), University of Tübingen, Tübingen, Germany

2
Bryantseva IA, Kyndt JA, Imhoff JF. First genome sequence of a purple sulphur bacterium of the genus Thioalkalicoccus, its characterization as a new isolate of Thioalkalicoccus limnaeus and proposal of strain Um2 as neotype of this species. Int J Syst Evol Microbiol 2025; 75. [PMID: 39887044 DOI: 10.1099/ijsem.0.006657] [Indexed: 02/01/2025]
Abstract
A new alkaliphilic strain of a purple sulphur bacterium, designated Um2 (=KCTC 25734=VKM B-3893=UQM 41073), containing bacteriochlorophyll b and internal photosynthetic membranes of the tubular type, was isolated from the Umhei hydrothermal system (40 °C, pH 9.3, salinity 0.42 g l⁻¹) in the Baikal rift zone (Russia). Based on morphological and physiological characteristics, this bacterium was classified as Thioalkalicoccus limnaeus. The 16S rRNA gene sequence similarity of strain Um2 was 96.69% to the type strain of Tac. limnaeus, A26T, 95.41% to 'Thioflavicoccus mobilis' 8321T and 95.34% to Thiococcus pfennigii 4250. The similarity of the ribulose 1,5-bisphosphate carboxylase sequences of strain Um2 and known strains of Thioalkalicoccus showed that they belong to the same species. Comparison of genome nucleotide sequences showed that the new isolate is distant at the genus level from all other described Chromatiaceae species, in both digital DNA-DNA hybridization (21.5%) and average nucleotide identity (76.7%). No genome sequence had previously been determined for any of the known Thioalkalicoccus strains; the genome presented here is therefore the first for a member of the genus Thioalkalicoccus. Tac. limnaeus Um2 is proposed as the neotype, as strain A26T has been lost from culture collections.
Affiliation(s)
- Irina A Bryantseva
- Winogradsky Institute of Microbiology, Research Center of Biotechnology, Russian Academy of Sciences, 33, bld. 2 Leninsky Ave., Moscow 119071, Russia
- John A Kyndt
- College of Science and Technology, Bellevue University, 1000 Galvin Rd., Bellevue 68005, Nebraska, USA
- Johannes F Imhoff
- GEOMAR Helmholtz Centre for Ocean Research Kiel, Wischhofstr. 1-3, D-24148 Kiel, Germany

3
Miardan MM, Jamshidpey A, Sankoff D. Escape from Parsimony of a Double-Cut-and-Join Genome Evolution Process. J Comput Biol 2023; 30:118-130. [PMID: 36595359 DOI: 10.1089/cmb.2021.0468] [Indexed: 01/04/2023]
Abstract
We analyze models of genome evolution based on both restricted and unrestricted double-cut-and-join (DCJ) operations. Our models not only allow the different types of operations generated by DCJs (including reversals, translocations, transpositions, fissions, and fusions) to take different weights during the course of evolution, but also let these weights fluctuate over time. We compare the number of operations along the evolutionary trajectory with the DCJ distance of the genome from its ancestor at each step, and determine the point at which the two diverge, that is, where the process escapes from parsimony. Adapting the method developed by Berestycki and Durrett, we approximate the number of cycles in the breakpoint graph of a random genome at time t and its ancestral genome by the number of tree components in a random graph (not necessarily an Erdős-Rényi one) constructed from the model of evolution. In both models, the process on a genome of size n stays bound to its parsimonious estimate up to t ≈ n/2 steps.
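To make "escaping from parsimony" concrete, here is a minimal simulation sketch under simplifying assumptions that are not the paper's full models: genomes consist only of circular chromosomes, every DCJ is drawn uniformly (no operation-specific or time-varying weights), and the DCJ distance is computed with the standard formula d = n - c for circular genomes, where c is the number of cycles in the breakpoint (adjacency) graph of the current genome and its ancestor. All function names are hypothetical.

```python
import random


def ancestral_genome(n: int) -> dict[int, int]:
    """One circular chromosome of n genes as a partner map on extremities.

    Gene g (0-based) has tail extremity 2g and head extremity 2g + 1;
    the head of each gene is adjacent to the tail of the next gene.
    """
    adj = {}
    for g in range(n):
        head, next_tail = 2 * g + 1, 2 * ((g + 1) % n)
        adj[head], adj[next_tail] = next_tail, head
    return adj


def cycle_count(a: dict[int, int], b: dict[int, int]) -> int:
    """Cycles in the union of two perfect matchings, following a- and b-edges alternately."""
    seen, cycles = set(), 0
    for start in a:
        if start in seen:
            continue
        cycles += 1
        x = start
        while True:
            seen.add(x)
            y = a[x]
            seen.add(y)
            x = b[y]
            if x == start:
                break
    return cycles


def dcj_distance(a: dict[int, int], b: dict[int, int]) -> int:
    """DCJ distance for genomes made only of circular chromosomes: n - c."""
    return len(a) // 2 - cycle_count(a, b)


def random_dcj(adj: dict[int, int], rng: random.Random) -> dict[int, int]:
    """Cut two distinct adjacencies {p, q}, {r, s} and rejoin them as {p, r}, {q, s}.

    Because r is drawn from all other extremities, both possible rejoinings
    of the two cut adjacencies are equally likely.
    """
    p = rng.choice(list(adj))
    q = adj[p]
    r = rng.choice([x for x in adj if x not in (p, q)])
    s = adj[r]
    new = dict(adj)
    new[p], new[r] = r, p
    new[q], new[s] = s, q
    return new


def distance_trace(n: int, steps: int, seed: int = 0) -> list[int]:
    """DCJ distance from the ancestor after each of `steps` random DCJs."""
    rng = random.Random(seed)
    ancestor = ancestral_genome(n)
    genome, trace = ancestor, []
    for _ in range(steps):
        genome = random_dcj(genome, rng)
        trace.append(dcj_distance(ancestor, genome))
    return trace


n = 400
trace = distance_trace(n, steps=n, seed=1)
# Compare the distance d_t with the number of operations t at a few fractions of n.
for t in (n // 8, n // 4, n // 2, 3 * n // 4, n):
    print(f"t={t:4d}  d_t={trace[t - 1]:4d}  d_t/t={trace[t - 1] / t:.3f}")
```

Since each DCJ changes the cycle count by at most one, d_t never exceeds t; the ratio d_t/t is the quantity whose behaviour around t ≈ n/2 the abstract describes, and the printout lets one inspect it for a single random run.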
Affiliation(s)
- Arash Jamshidpey
- Department of Mathematics, Columbia University, New York, New York, USA
- David Sankoff
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada

4

5
Avdeyev P, Alexeev N, Rong Y, Alekseyev MA. A unified ILP framework for core ancestral genome reconstruction problems. Bioinformatics 2020; 36:2993-3003. [PMID: 32058559 DOI: 10.1093/bioinformatics/btaa100] [Received: 11/23/2018] [Revised: 12/06/2019] [Accepted: 02/07/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION: One of the key computational problems in comparative genomics is the reconstruction of genomes of ancestral species based on genomes of extant species. Since most dramatic changes in genomic architectures are caused by genome rearrangements, this problem is often posed as minimization of the number of genome rearrangements between extant and ancestral genomes. The basic case of three given genomes is known as the genome median problem. Whole-genome duplications (WGDs) represent yet another type of dramatic evolutionary event and inspire the reconstruction of preduplicated ancestral genomes, referred to as the genome halving problem. Generalization of WGDs to whole-genome multiplication events leads to the genome aliquoting problem.
RESULTS: In this study, we propose polynomial-size integer linear programming (ILP) formulations for the aforementioned problems. We further obtain such formulations for the restricted and conserved versions of the median and halving problems, which have recently been introduced to improve the biological relevance of the solutions. Extensive evaluation of solutions to the different ILP problems demonstrates their good accuracy. Furthermore, since the ILP formulations for the conserved versions have linear size, they provide a novel practical approach to ancestral genome reconstruction, which combines the advantages of homology- and rearrangement-based methods.
AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/AvdeevPavel/ILP-WGD-reconstructor.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
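The ILP formulations themselves are not reproduced here. As a hedged illustration of what the genome median problem asks, the sketch below solves it by exhaustive search for toy genomes, assuming that genomes consist only of circular chromosomes and that rearrangements are counted with the DCJ distance d = n - c. The number of perfect matchings of the 2n gene extremities grows as (2n - 1)!!, so this brute force is only feasible for a handful of genes. All function names are hypothetical.

```python
from typing import Iterator

Genome = dict[int, int]  # partner map on extremities: gene g has tail 2g and head 2g + 1


def circular_genome(order: list[int]) -> Genome:
    """One circular chromosome with all genes in forward orientation."""
    adj: Genome = {}
    for i, g in enumerate(order):
        head, next_tail = 2 * g + 1, 2 * order[(i + 1) % len(order)]
        adj[head], adj[next_tail] = next_tail, head
    return adj


def cycle_count(a: Genome, b: Genome) -> int:
    """Cycles in the union of two perfect matchings, following a- and b-edges alternately."""
    seen, cycles = set(), 0
    for start in a:
        if start in seen:
            continue
        cycles += 1
        x = start
        while True:
            seen.add(x)
            y = a[x]
            seen.add(y)
            x = b[y]
            if x == start:
                break
    return cycles


def dcj_distance(a: Genome, b: Genome) -> int:
    """DCJ distance for genomes made only of circular chromosomes: n - c."""
    return len(a) // 2 - cycle_count(a, b)


def all_circular_genomes(n: int) -> Iterator[Genome]:
    """Every perfect matching of the 2n extremities, i.e. every all-circular genome on n genes."""
    def matchings(items: list[int]) -> Iterator[list[tuple[int, int]]]:
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for i, other in enumerate(rest):
            for m in matchings(rest[:i] + rest[i + 1:]):
                yield [(first, other)] + m

    for pairs in matchings(list(range(2 * n))):
        adj: Genome = {}
        for x, y in pairs:
            adj[x], adj[y] = y, x
        yield adj


def dcj_median(genomes: list[Genome]) -> tuple[int, Genome]:
    """Brute-force median: a genome minimising the total DCJ distance to all inputs."""
    n = len(genomes[0]) // 2
    return min(
        ((sum(dcj_distance(m, g) for g in genomes), m) for m in all_circular_genomes(n)),
        key=lambda scored: scored[0],
    )


a = circular_genome([0, 1, 2, 3])
b = circular_genome([0, 2, 1, 3])
c = circular_genome([0, 1, 3, 2])
score, median = dcj_median([a, b, c])
print("median score:", score)
```

The exhaustive loop makes the combinatorial blow-up explicit; formulations of polynomial (or, for the conserved versions, linear) size, as described in the abstract, are what allow realistic gene counts.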
Affiliation(s)
- Pavel Avdeyev
- Department of Mathematics, The George Washington University, Washington, DC 20052, USA
- Nikita Alexeev
- Computer Technologies Laboratory, ITMO University, Saint Petersburg, 197101, Russia
- Yongwu Rong
- Department of Mathematics, Queens College, City University of New York, Flushing, NY 11367, USA
- Max A Alekseyev
- Department of Mathematics, The George Washington University, Washington, DC 20052, USA
- Department of Biostatistics and Bioinformatics, The George Washington University, Washington, DC 20052, USA

6
Simonaitis P, Chateau A, Swenson KM. A general framework for genome rearrangement with biological constraints. Algorithms Mol Biol 2019; 14:15. [PMID: 31360217 PMCID: PMC6642580 DOI: 10.1186/s13015-019-0149-4] [Received: 04/29/2019] [Accepted: 06/12/2019] [Indexed: 11/25/2022]
Abstract
This paper generalizes previous studies on genome rearrangement under biological constraints, using double cut and join (DCJ). We propose a model for weighted DCJ, along with a family of optimization problems called φ-MCPS (Minimum Cost Parsimonious Scenario), which are based on labeled graphs. We show how to compute solutions to general instances of φ-MCPS, given an algorithm to compute φ-MCPS on a circular genome with exactly one occurrence of each gene. These general instances can have an arbitrary number of circular and linear chromosomes, and arbitrary gene content. The practicality of the framework is demonstrated by polynomial-time algorithms that generalize the results of Bulteau, Fertin, and Tannier on the Sorting by wDCJs and indels in intergenes problem, and that generalize previous results on the Minimum Local Parsimonious Scenario problem.

7
Avdeyev P, Jiang S, Alekseyev MA. Implicit Transpositions in DCJ Scenarios. Front Genet 2018; 8:212. [PMID: 29312438 PMCID: PMC5733028 DOI: 10.3389/fgene.2017.00212] [Received: 07/05/2017] [Accepted: 11/29/2017] [Indexed: 11/13/2022]
Abstract
Genome rearrangements are large-scale evolutionary events that shuffle genomic architectures. The minimal number of such events between two genomes is often used in phylogenomic studies to measure the evolutionary distance between the genomes. Double-Cut-and-Join (DCJ) operations represent a convenient model of the most common genome rearrangements (reversals, translocations, fissions, and fusions), while other genome rearrangements, such as transpositions, can be modeled by pairs of DCJs. Since the DCJ model does not directly account for transpositions, their impact on DCJ scenarios is unclear. In the present work, we study the implicit appearance of transpositions (as pairs of DCJs) in DCJ scenarios. We consider shortest DCJ scenarios satisfying the maximum parsimony assumption, as well as more general DCJ scenarios based on some realistic but less restrictive assumptions. In both cases, we derive a uniform lower bound for the rate of implicit transpositions, which depends only on the genomes and not on a particular DCJ scenario between them. Our results imply that the implicit appearance of transpositions in DCJ scenarios may be unavoidable or even abundant for some pairs of genomes. We estimate that for mammalian genomes implicit transpositions constitute at least 6% of genome rearrangements.
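The statement that a transposition can be modeled by a pair of DCJs can be checked on a toy example. The sketch below uses hypothetical helpers, keeps all genes in forward orientation and writes extremities as strings such as "2t" and "2h" for the tail and head of gene 2. It moves the block (2, 3) of the circular chromosome (1, 2, 3, 4, 5) behind gene 4: the first DCJ excises the block as a circular intermediate and the second reinserts it. It does not reproduce the paper's lower bound on implicit transpositions.

```python
Adjacency = frozenset[str]


def circular_adjacencies(order: list[int]) -> set[Adjacency]:
    """Adjacencies of one circular chromosome with all genes in forward orientation."""
    n = len(order)
    return {frozenset((f"{order[i]}h", f"{order[(i + 1) % n]}t")) for i in range(n)}


def dcj(adjs: set[Adjacency], cut: list[tuple[str, str]], join: list[tuple[str, str]]) -> set[Adjacency]:
    """Replace two existing adjacencies by two new ones over the same four extremities."""
    cut_set = {frozenset(p) for p in cut}
    join_set = {frozenset(p) for p in join}
    if len(cut_set) != 2 or not cut_set <= adjs:
        raise ValueError("cut must name two distinct existing adjacencies")
    if set().union(*cut_set) != set().union(*join_set):
        raise ValueError("a DCJ must rejoin the same four extremities")
    return (adjs - cut_set) | join_set


def dcj_distance(a: set[Adjacency], b: set[Adjacency]) -> int:
    """n - c for genomes made only of circular chromosomes (n adjacencies = n genes)."""
    pa: dict[str, str] = {}
    pb: dict[str, str] = {}
    for adjs, partner in ((a, pa), (b, pb)):
        for adjacency in adjs:
            x, y = tuple(adjacency)
            partner[x], partner[y] = y, x
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        while True:  # alternate a- and b-edges until the cycle closes
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
            if x == start:
                break
    return len(a) - cycles


start = circular_adjacencies([1, 2, 3, 4, 5])
target = circular_adjacencies([1, 4, 2, 3, 5])  # block (2, 3) transposed behind gene 4

# DCJ 1: join 1h-4t and 3h-2t, excising (2, 3) as a separate circular chromosome.
intermediate = dcj(start, cut=[("1h", "2t"), ("3h", "4t")], join=[("1h", "4t"), ("3h", "2t")])
# DCJ 2: open the small circle at 3h-2t and the main chromosome at 4h-5t, then reconnect.
result = dcj(intermediate, cut=[("4h", "5t"), ("3h", "2t")], join=[("4h", "2t"), ("3h", "5t")])

print(result == target, dcj_distance(start, target))  # True 2
```

The distance of 2 confirms that no single DCJ reaches the transposed genome, so any shortest DCJ scenario between these two genomes must realize the transposition implicitly as a pair of DCJs.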
Affiliation(s)
- Pavel Avdeyev
- Department of Mathematics and the Computational Biology Institute, George Washington University, Washington, DC, United States
- Shuai Jiang
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Max A Alekseyev
- Department of Mathematics and the Computational Biology Institute, George Washington University, Washington, DC, United States

8
Biller P, Guéguen L, Knibbe C, Tannier E. Breaking Good: Accounting for Fragility of Genomic Regions in Rearrangement Distance Estimation. Genome Biol Evol 2016; 8:1427-1439. [PMID: 27190002 PMCID: PMC4898800 DOI: 10.1093/gbe/evw083] [Indexed: 12/15/2022]
Abstract
Models of evolution by genome rearrangements are prone to two types of flaws: one is ignoring the variation in susceptibility to breakage across genomic regions, and the other is assuming that susceptibility values are given. Without necessarily assuming their precise localization, we call regions that are unlikely to be broken by rearrangements "solid" and the regions outside solid ones "fragile". We propose a model of evolution by inversions in which breakage probabilities vary across fragile regions and over time. It contains as a particular case the uniform breakage model on the nucleotide sequence, where breakage probabilities are proportional to fragile-region lengths. This is very different from the frequently used pseudouniform model, where all fragile regions have the same probability of breaking. Estimations of rearrangement distances based on the pseudouniform model completely fail on simulations with the truly uniform model. On pairs of amniote genomes, we show that identifying coding genes with solid regions yields incoherent distance estimations, especially with the pseudouniform model and, to a lesser extent, with the truly uniform model. This incoherence is resolved when we coestimate the number of fragile regions with the rearrangement distance. The estimated number of fragile regions is surprisingly small, suggesting that a minority of regions are recurrently used by rearrangements. Estimations for several pairs of genomes at different divergence times are in agreement with a slowly evolvable colocalization of active genomic regions in the cell.
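The contrast between the two breakage models comes down to a few lines of arithmetic. In the hedged sketch below, the fragile-region lengths are made up for illustration: under the uniform model a break falls in region i with probability proportional to its length, under the pseudouniform model every region is equally likely, and the expected number of distinct regions hit by m independent breaks is the sum over i of 1 - (1 - p_i)^m. This only shows why the two models imply different amounts of breakpoint reuse; it is not the paper's estimation procedure, and the function names are hypothetical.

```python
def breakage_probabilities(lengths: list[float], model: str) -> list[float]:
    """Per-region probability that a single break falls in each fragile region."""
    if model == "uniform":  # proportional to fragile-region length (uniform on nucleotides)
        total = sum(lengths)
        return [length / total for length in lengths]
    if model == "pseudouniform":  # every fragile region equally likely
        return [1 / len(lengths)] * len(lengths)
    raise ValueError(f"unknown model: {model}")


def expected_distinct_regions(probabilities: list[float], breaks: int) -> float:
    """Expected number of distinct regions hit by `breaks` independent breaks."""
    return sum(1 - (1 - p) ** breaks for p in probabilities)


# Hypothetical fragile regions: a few long ones and many short ones.
lengths = [50_000] * 5 + [500] * 95
for model in ("uniform", "pseudouniform"):
    probs = breakage_probabilities(lengths, model)
    print(model, round(expected_distinct_regions(probs, breaks=100), 1))
```

With skewed region lengths, the uniform model concentrates breaks in the long regions, so the same number of breaks touches fewer distinct regions, which means more breakpoint reuse than the pseudouniform model would predict.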
Affiliation(s)
- Priscila Biller
- INRIA Grenoble Rhône-Alpes, Montbonnot, France
- University of Campinas, São Paulo, Brazil
- Carole Knibbe
- INRIA Grenoble Rhône-Alpes, Montbonnot, France
- Université Lyon 1, LIRIS, UMR5205, Villeurbanne, France
- Eric Tannier
- INRIA Grenoble Rhône-Alpes, Montbonnot, France
- Université Lyon 1, LBBE, UMR5558, Villeurbanne, France