Reference Citation Analysis

For: Brown T, Didelot X, Wilson DJ, Maio ND. SimBac: simulation of whole bacterial genomes with homologous recombination. Microb Genom 2018;2. [PMID: 27713837 PMCID: PMC5049688 DOI: 10.1099/mgen.0.000044] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 01/03/2023]

Cited by Other Article(s)

Narechania A, Bobo D, DeSalle R, Mathema B, Kreiswirth B, Planet PJ. What Do We Gain When Tolerating Loss? The Information Bottleneck Wrings Out Recombination. Mol Biol Evol 2025;42:msaf029. [PMID: 39899343 PMCID: PMC11890988 DOI: 10.1093/molbev/msaf029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 12/03/2024] [Accepted: 01/14/2025] [Indexed: 02/04/2025]

Wittouck S, Eilers T, van Noort V, Lebeer S. SCARAP: scalable cross-species comparative genomics of prokaryotes. Bioinformatics 2024;41:btae735. [PMID: 39661475 PMCID: PMC11681940 DOI: 10.1093/bioinformatics/btae735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 10/31/2024] [Accepted: 12/10/2024] [Indexed: 12/13/2024]

Moradigaravand D, Li L, Dechesne A, Nesme J, de la Cruz R, Ahmad H, Banzhaf M, Sørensen SJ, Smets BF, Kreft JU. Plasmid permissiveness of wastewater microbiomes can be predicted from 16S rRNA sequences by machine learning. Bioinformatics 2023;39:btad400. [PMID: 37348862 PMCID: PMC10318386 DOI: 10.1093/bioinformatics/btad400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2022] [Revised: 06/13/2023] [Accepted: 06/21/2023] [Indexed: 06/24/2023]

Affiliation(s)

Danesh Moradigaravand: Laboratory of Infectious Disease Epidemiology, KAUST Smart-Health Initiative and Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
Liguan Li: Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs Lyngby, Denmark; Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
Arnaud Dechesne: Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs Lyngby, Denmark
Joseph Nesme: Department of Biology, University of Copenhagen, 2100 Copenhagen, Denmark
Roberto de la Cruz: Center for Computational Biology, University of Birmingham, Birmingham, B15 2TT, United Kingdom; Institute of Microbiology and Infection, University of Birmingham, Birmingham, B15 2TT, United Kingdom; School of Biosciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
Huda Ahmad: Laboratory of Infectious Disease Epidemiology, KAUST Smart-Health Initiative and Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Center for Computational Biology, University of Birmingham, Birmingham, B15 2TT, United Kingdom
Manuel Banzhaf: Institute of Microbiology and Infection, University of Birmingham, Birmingham, B15 2TT, United Kingdom; School of Biosciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
Søren J Sørensen: Department of Biology, University of Copenhagen, 2100 Copenhagen, Denmark
Barth F Smets: Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs Lyngby, Denmark
Jan-Ulrich Kreft: Center for Computational Biology, University of Birmingham, Birmingham, B15 2TT, United Kingdom; Institute of Microbiology and Infection, University of Birmingham, Birmingham, B15 2TT, United Kingdom; School of Biosciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom

Shikov AE, Malovichko YV, Nizhnikov AA, Antonets KS. Current Methods for Recombination Detection in Bacteria. Int J Mol Sci 2022;23:ijms23116257. [PMID: 35682936 PMCID: PMC9181119 DOI: 10.3390/ijms23116257] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/04/2022] [Revised: 05/30/2022] [Accepted: 05/30/2022] [Indexed: 02/05/2023]

De Maio N, Boulton W, Weilguny L, Walker CR, Turakhia Y, Corbett-Detig R, Goldman N. phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets. PLoS Comput Biol 2022;18:e1010056. [PMID: 35486906 PMCID: PMC9094560 DOI: 10.1371/journal.pcbi.1010056] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/24/2021] [Revised: 05/11/2022] [Accepted: 03/25/2022] [Indexed: 11/26/2022]

De Maio N, Boulton W, Weilguny L, Walker CR, Turakhia Y, Corbett-Detig R, Goldman N. phastSim: efficient simulation of sequence evolution for pandemic-scale datasets. bioRxiv 2021:2021.03.15.435416. [PMID: 33758852 PMCID: PMC7987011 DOI: 10.1101/2021.03.15.435416] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/30/2022]

Didelot X. Phylogenetic Methods for Genome-Wide Association Studies in Bacteria. Methods Mol Biol 2021;2242:205-220. [PMID: 33961226 DOI: 10.1007/978-1-0716-1099-2_13] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/26/2023]

Zhou Z, Charlesworth J, Achtman M. Accurate reconstruction of bacterial pan- and core genomes with PEPPAN. Genome Res 2020;30:1667-1679. [PMID: 33055096 PMCID: PMC7605250 DOI: 10.1101/gr.260828.120] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 01/20/2020] [Accepted: 09/01/2020] [Indexed: 12/22/2022]

Bobay LM. CoreSimul: a forward-in-time simulator of genome evolution for prokaryotes modeling homologous recombination. BMC Bioinformatics 2020;21:264. [PMID: 32580695 PMCID: PMC7315543 DOI: 10.1186/s12859-020-03619-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/02/2020] [Accepted: 06/19/2020] [Indexed: 12/26/2022]

Saber MM, Shapiro BJ. Benchmarking bacterial genome-wide association study methods using simulated genomes and phenotypes. Microb Genom 2020;6:e000337. [PMID: 32100713 PMCID: PMC7200059 DOI: 10.1099/mgen.0.000337] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 10/16/2019] [Accepted: 01/23/2020] [Indexed: 11/18/2022]

Abstract

Genome-wide association studies (GWASs) have the potential to reveal the genetics of microbial phenotypes such as antibiotic resistance and virulence. Capitalizing on the growing wealth of bacterial sequence data, microbial GWAS methods aim to identify causal genetic variants while ignoring spurious associations. Bacteria reproduce clonally, leading to strong population structure and genome-wide linkage, making it challenging to separate true 'hits' (i.e. mutations that cause a phenotype) from non-causal linked mutations. GWAS methods attempt to correct for population structure in different ways, but their performance has not yet been systematically and comprehensively evaluated under a range of evolutionary scenarios. Here, we developed a bacterial GWAS simulator (BacGWASim) to generate bacterial genomes with varying rates of mutation, recombination and other evolutionary parameters, along with a subset of causal mutations underlying a phenotype of interest. We assessed the performance (recall and precision) of three widely used single-locus GWAS approaches (cluster-based, dimensionality-reduction and linear mixed models, implemented in plink, pyseer and gemma) and one relatively new multi-locus model implemented in pyseer, across a range of simulated sample sizes, recombination rates and causal mutation effect sizes. As expected, all methods performed better with larger sample sizes and effect sizes. The performance of clustering and dimensionality reduction approaches to correct for population structure was considerably variable according to the choice of parameters. Notably, the multi-locus elastic net (lasso) approach was consistently amongst the highest-performing methods, and had the highest power in detecting causal variants with both low and high effect sizes. Most methods reached the level of good performance (recall >0.75) for identifying causal mutations of strong effect size [log odds ratio (OR) ≥2] with a sample size of 2000 genomes. However, only elastic nets reached the level of reasonable performance (recall=0.35) for detecting markers with weaker effects (log OR ~1) in smaller samples. Elastic nets also showed superior precision and recall in controlling for genome-wide linkage, relative to single-locus models. However, all methods performed relatively poorly on highly clonal (low-recombining) genomes, suggesting room for improvement in method development. These findings show the potential for multi-locus models to improve bacterial GWAS performance. BacGWASim code and simulated data are publicly available to enable further comparisons and benchmarking of new methods.
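
The recall and precision figures quoted in the abstract above can be illustrated with a minimal sketch. It assumes that a GWAS run is summarized as a set of reported variant identifiers and that the simulator supplies the set of truly causal variants; the function name `precision_recall` and the variant labels are hypothetical and are not taken from BacGWASim, plink, pyseer or gemma.

```python
def precision_recall(reported_hits, causal_variants):
    """Return (precision, recall) of reported hits against known causal variants."""
    reported = set(reported_hits)
    causal = set(causal_variants)
    true_positives = len(reported & causal)
    # Precision: fraction of reported hits that are truly causal.
    precision = true_positives / len(reported) if reported else 0.0
    # Recall: fraction of causal variants that were reported.
    recall = true_positives / len(causal) if causal else 0.0
    return precision, recall


# Hypothetical example: four simulated causal variants, five reported hits.
causal = {"snp_12", "snp_40", "snp_77", "snp_90"}
reported = {"snp_12", "snp_40", "snp_41", "snp_77", "snp_13"}
precision, recall = precision_recall(reported, causal)
print(precision, recall)  # precision = 0.6, recall = 0.75
```

In this toy case three of the five reported hits are causal (precision 0.6) and three of the four causal variants are recovered (recall 0.75), which would sit exactly at the abstract's threshold for good performance.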

Ferrés I, Fresia P, Iraola G. simurg: simulate bacterial pangenomes in R. Bioinformatics 2020;36:1273-1274. [PMID: 31584605 DOI: 10.1093/bioinformatics/btz735] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/19/2019] [Revised: 08/06/2019] [Accepted: 09/25/2019] [Indexed: 11/13/2022]

Sipola A, Marttinen P, Corander J. Bacmeta: simulator for genomic evolution in bacterial metapopulations. Bioinformatics 2019;34:2308-2310. [PMID: 29474733 DOI: 10.1093/bioinformatics/bty093] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/10/2017] [Accepted: 02/20/2018] [Indexed: 12/25/2022]

Bonnici V, Giugno R, Manca V. PanDelos: a dictionary-based method for pan-genome content discovery. BMC Bioinformatics 2018;19:437. [PMID: 30497358 PMCID: PMC6266927 DOI: 10.1186/s12859-018-2417-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/04/2022]

Abstract

Background

Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes; in fact, all-against-all gene comparisons are required to completely solve the problem. In the presence of phylogenetically distant genomes, due to the variability introduced in gene duplication and transmission, the task of recognizing homologous genes becomes even more difficult. A challenge in this field is designing fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations.

Results

We present PanDelos, a standalone tool for the discovery of pan-genome contents among phylogenetically distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Homology candidate relations are integrated into a global network and groups of homologous genes are extracted by applying a community detection algorithm.

Conclusions

PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running time and quality of content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and on synthetic benchmarks that represent a fully trusted ground truth. The software is available at https://github.com/GiugnoLab/PanDelos.

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2417-6) contains supplementary material, which is available to authorized users.
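
The alignment-free idea mentioned in the Results above, a similarity measure based on k-mer multiplicity, can be sketched as follows. This is an illustration only: it uses a generalized Jaccard similarity over k-mer counts, which is one common choice and is assumed here rather than confirmed as the exact PanDelos measure. The value of `k` is passed in by hand and the example sequences are made up.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in the sequence (its k-mer multiplicities)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def generalized_jaccard(seq_a, seq_b, k):
    """Similarity in [0, 1]: sum of minimum counts over sum of maximum counts."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    all_kmers = set(counts_a) | set(counts_b)
    # Counter returns 0 for k-mers that are absent from a sequence.
    shared = sum(min(counts_a[kmer], counts_b[kmer]) for kmer in all_kmers)
    total = sum(max(counts_a[kmer], counts_b[kmer]) for kmer in all_kmers)
    return shared / total if total else 0.0


# Hypothetical sequences differing at a single position.
similarity = generalized_jaccard("ACGTACGT", "ACGTTCGT", k=3)
print(similarity)
```

Because the score depends only on k-mer counts, no alignment is computed; a pipeline of this kind would then keep pairs whose score passes a threshold as candidate homologs before building the network.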

Zhou Z, Alikhan NF, Sergeant MJ, Luhmann N, Vaz C, Francisco AP, Carriço JA, Achtman M. GrapeTree: visualization of core genomic relationships among 100,000 bacterial pathogens. Genome Res 2018;28:1395-1404. [PMID: 30049790 PMCID: PMC6120633 DOI: 10.1101/gr.232397.117] [Citation(s) in RCA: 610] [Impact Index Per Article: 87.1] [Received: 11/14/2017] [Accepted: 07/24/2018] [Indexed: 11/24/2022]

Akita T, Takuno S, Innan H. Coalescent framework for prokaryotes undergoing interspecific homologous recombination. Heredity (Edinb) 2018;120:474-484. [PMID: 29358726 PMCID: PMC5889408 DOI: 10.1038/s41437-017-0034-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/05/2017] [Revised: 10/04/2017] [Accepted: 10/23/2017] [Indexed: 12/11/2022]

De Maio N, Worby CJ, Wilson DJ, Stoesser N. Bayesian reconstruction of transmission within outbreaks using genomic variants. PLoS Comput Biol 2018;14:e1006117. [PMID: 29668677 PMCID: PMC5927459 DOI: 10.1371/journal.pcbi.1006117] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 11/09/2017] [Revised: 04/30/2018] [Accepted: 04/03/2018] [Indexed: 01/19/2023]

Abstract

Pathogen genome sequencing can reveal details of transmission histories and is a powerful tool in the fight against infectious disease. In particular, within-host pathogen genomic variants identified through heterozygous nucleotide base calls are a potential source of information to identify linked cases and infer direction and time of transmission. However, using such data effectively to model disease transmission presents a number of challenges, including differentiating genuine variants from those observed due to sequencing error, as well as the specification of a realistic model for within-host pathogen population dynamics. Here we propose a new Bayesian approach to transmission inference, BadTrIP (BAyesian epiDemiological TRansmission Inference from Polymorphisms), that explicitly models evolution of pathogen populations in an outbreak, transmission (including transmission bottlenecks), and sequencing error. BadTrIP enables the inference of host-to-host transmission from pathogen sequencing data and epidemiological data. By assuming that genomic variants are unlinked, our method does not require the computationally intensive and unreliable reconstruction of individual haplotypes. Using simulations we show that BadTrIP is robust in most scenarios and can accurately infer transmission events by efficiently combining information from genetic and epidemiological sources; thanks to its realistic model of pathogen evolution and the inclusion of epidemiological data, BadTrIP is also more accurate than existing approaches. BadTrIP is distributed as an open source package (https://bitbucket.org/nicofmay/badtrip) for the phylogenetic software BEAST2. We apply our method to reconstruct transmission history at the early stages of the 2014 Ebola outbreak, showcasing the power of within-host genomic variants to reconstruct transmission events.

We present a new tool to reconstruct transmission events within outbreaks. Our approach makes use of pathogen genetic information, notably genetic variants at low frequency within host that are usually discarded, and combines it with epidemiological information of host exposure to infection. This leads to accurate reconstruction of transmission even in cases where abundant within-host pathogen genetic variation and weak transmission bottlenecks (multiple pathogen units colonising a new host at transmission) would otherwise make inference difficult due to the transmission history differing from the pathogen evolution history inferred from pathogen isolates. Also, the use of within-host pathogen genomic variants increases the resolution of the reconstruction of the transmission tree even in scenarios with limited within-outbreak pathogen genetic diversity: within-host pathogen populations that appear identical at the level of consensus sequences can be discriminated using within-host variants. Our Bayesian approach provides a measure of the confidence in different possible transmission histories, and is published as open source software. We show with simulations and with an analysis of the beginning of the 2014 Ebola outbreak that our approach is applicable in many scenarios, improves our understanding of transmission dynamics, and will contribute to finding and limiting sources and routes of transmission, and therefore preventing the spread of infectious disease.

Yu X, Reva ON. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees. Evol Bioinform Online 2018;14:1176934318759299. [PMID: 29511354 PMCID: PMC5826093 DOI: 10.1177/1176934318759299] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/14/2017] [Accepted: 01/24/2018] [Indexed: 11/17/2022]

Mortimer TD, Annis DS, O’Neill MB, Bohr LL, Smith TM, Poinar HN, Mosher DF, Pepperell CS. Adaptation in a Fibronectin Binding Autolysin of Staphylococcus saprophyticus. mSphere 2017;2:e00511-17. [PMID: 29202045 PMCID: PMC5705806 DOI: 10.1128/msphere.00511-17] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 10/31/2017] [Accepted: 11/13/2017] [Indexed: 12/18/2022]

Abstract

Human-pathogenic bacteria are found in a variety of niches, including free-living, zoonotic, and microbiome environments. Identifying bacterial adaptations that enable invasive disease is an important means of gaining insight into the molecular basis of pathogenesis and understanding pathogen emergence. Staphylococcus saprophyticus, a leading cause of urinary tract infections, can be found in the environment, food, animals, and the human microbiome. We identified a selective sweep in the gene encoding the Aas adhesin, a key virulence factor that binds host fibronectin. We hypothesize that the mutation under selection (aas_2206A>C) facilitates colonization of the urinary tract, an environment where bacteria are subject to strong shearing forces. The mutation appears to have enabled emergence and expansion of a human-pathogenic lineage of S. saprophyticus. These results demonstrate the power of evolutionary genomic approaches in discovering the genetic basis of virulence and emphasize the pleiotropy and adaptability of bacteria occupying diverse niches.

IMPORTANCE

Staphylococcus saprophyticus is an important cause of urinary tract infections (UTI) in women; such UTI are common, can be severe, and are associated with significant impacts to public health. In addition to being a cause of human UTI, S. saprophyticus can be found in the environment, in food, and associated with animals. After discovering that UTI strains of S. saprophyticus are for the most part closely related to each other, we sought to determine whether these strains are specially adapted to cause disease in humans. We found evidence suggesting that a mutation in the gene aas is advantageous in the context of human infection. We hypothesize that the mutation allows S. saprophyticus to survive better in the human urinary tract. These results show how bacteria found in the environment can evolve to cause disease.

Affiliation(s)

Tatum D. Mortimer: Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, Wisconsin, USA; Microbiology Doctoral Training Program, University of Wisconsin–Madison, Madison, Wisconsin, USA
Douglas S. Annis: Department of Biomolecular Chemistry, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, Wisconsin, USA
Mary B. O’Neill: Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, Wisconsin, USA; Laboratory of Genetics, University of Wisconsin–Madison, Madison, Wisconsin, USA
Lindsey L. Bohr: Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, Wisconsin, USA; Microbiology Doctoral Training Program, University of Wisconsin–Madison, Madison, Wisconsin, USA
Tracy M. Smith: Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, Wisconsin, USA; Department of Medicine, Division of Infectious Diseases, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, Wisconsin, USA
Hendrik N. Poinar: McMaster Ancient DNA Centre, Department of Anthropology, McMaster University, Hamilton, Ontario, Canada; Department of Biology, McMaster University, Hamilton, Ontario, Canada; Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada; Humans and the Microbiome Program, Canadian Institute for Advanced Research, Toronto, Ontario, Canada
Deane F. Mosher: Department of Biomolecular Chemistry, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, Wisconsin, USA
Caitlin S. Pepperell: Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, Wisconsin, USA; Department of Medicine, Division of Infectious Diseases, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, Wisconsin, USA

Mostowy R, Croucher NJ, Andam CP, Corander J, Hanage WP, Marttinen P. Efficient Inference of Recent and Ancestral Recombination within Bacterial Populations. Mol Biol Evol 2017;34:1167-1182. [PMID: 28199698 PMCID: PMC5400400 DOI: 10.1093/molbev/msx066] [Citation(s) in RCA: 114] [Impact Index Per Article: 14.3] [Indexed: 01/25/2023]

De Maio N, Wilson DJ. The Bacterial Sequential Markov Coalescent. Genetics 2017;206:333-343. [PMID: 28258183 PMCID: PMC5419479 DOI: 10.1534/genetics.116.198796] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 12/01/2016] [Accepted: 02/14/2017] [Indexed: 11/30/2022]

Abstract

Bacteria can exchange and acquire new genetic material from other organisms directly and via the environment. This process, known as bacterial recombination, has a strong impact on the evolution of bacteria, for example, leading to the spread of antibiotic resistance across clades and species, and to the avoidance of clonal interference. Recombination hinders phylogenetic and transmission inference because it creates patterns of substitutions (homoplasies) inconsistent with the hypothesis of a single evolutionary tree. Bacterial recombination is typically modeled as statistically akin to gene conversion in eukaryotes, i.e., using the coalescent with gene conversion (CGC). However, this model can be very computationally demanding as it needs to account for the correlations of evolutionary histories of even distant loci. So, with the increasing popularity of whole genome sequencing, the need has emerged for a faster approach to model and simulate bacterial genome evolution. We present a new model that approximates the coalescent with gene conversion: the bacterial sequential Markov coalescent (BSMC). Our approach is based on a similar idea to the sequential Markov coalescent (SMC), an approximation of the coalescent with crossover recombination. However, bacterial recombination poses hurdles to a sequential Markov approximation, as it leads to strong correlations and linkage disequilibrium across very distant sites in the genome. Our BSMC overcomes these difficulties, and shows a considerable reduction in computational demand compared to the exact CGC, and very similar patterns in simulated data. We implemented our BSMC model within new simulation software FastSimBac. In addition to the decreased computational demand compared to previous bacterial genome evolution simulators, FastSimBac provides more general options for evolutionary scenarios, allowing population structure with migration, speciation, population size changes, and recombination hotspots. FastSimBac is available from https://bitbucket.org/nicofmay/fastsimbac, and is distributed as open source under the terms of the GNU General Public License. Lastly, we use the BSMC within an Approximate Bayesian Computation (ABC) inference scheme, and suggest that parameters simulated under the exact CGC can correctly be recovered, further showcasing the accuracy of the BSMC. With this ABC we infer recombination rate, mutation rate, and recombination tract length of Bacillus cereus from a whole genome alignment.
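
The Approximate Bayesian Computation (ABC) scheme mentioned at the end of the abstract above can be illustrated with a minimal rejection-sampling sketch. It is a toy under stated assumptions and does not reproduce the BSMC or FastSimBac: the simulator is a hypothetical stand-in that counts mutated sites under a single per-site rate, the prior is uniform on [0, 0.2], and the summary statistic is simply the number of mutated sites.

```python
import random


def simulate_mutated_sites(rate, n_sites, rng):
    """Toy simulator (hypothetical): each site mutates independently with probability rate."""
    return sum(1 for _ in range(n_sites) if rng.random() < rate)


def abc_rejection(observed, n_sites, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies within tolerance of the observed one."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 0.2)  # draw a candidate rate from the assumed uniform prior
        simulated = simulate_mutated_sites(rate, n_sites, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(rate)
    return accepted


rng = random.Random(1)  # fixed seed so the sketch is reproducible
accepted = abc_rejection(observed=50, n_sites=1000, n_draws=2000, tolerance=5, rng=rng)
posterior_mean = sum(accepted) / len(accepted)
print(len(accepted), posterior_mean)
```

The accepted draws approximate the posterior of the rate given the observed count. In a real analysis the toy simulator would be replaced by a coalescent simulator, and the summary statistics would be chosen to be informative about recombination rate, mutation rate and tract length.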
