1
Horsfield ST, Fok BCT, Fu Y, Turner P, Lees JA, Croucher NJ. Optimizing nanopore adaptive sampling for pneumococcal serotype surveillance in complex samples using the graph-based GNASTy algorithm. Genome Res 2025; 35:1025-1040. [PMID: 40037844 PMCID: PMC12047183 DOI: 10.1101/gr.279435.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Accepted: 01/30/2025] [Indexed: 03/06/2025]
Abstract
Serotype surveillance of Streptococcus pneumoniae (the pneumococcus) is critical for understanding the effectiveness of current vaccination strategies. However, existing methods for serotyping are limited in their ability to identify co-carriage of multiple pneumococci and detect novel serotypes. To develop a scalable and portable serotyping method that overcomes these challenges, we employed nanopore adaptive sampling (NAS), an on-sequencer enrichment method that selects for target DNA in real-time, for direct detection of S. pneumoniae in complex samples. Whereas NAS targeting the whole S. pneumoniae genome was ineffective in the presence of nonpathogenic streptococci, the method was both specific and sensitive when targeting the capsular biosynthetic locus (CBL), the operon that determines S. pneumoniae serotype. NAS significantly improved coverage and yield of the CBL relative to sequencing without NAS and accurately quantified the relative prevalence of serotypes in samples representing co-carriage. To maximize the sensitivity of NAS to detect novel serotypes, we developed and benchmarked a new pangenome-graph algorithm, named GNASTy. We show that GNASTy outperforms the current NAS implementation, which is based on linear genome alignment, when a sample contains a serotype absent from the database of targeted sequences. The methods developed in this work provide an improved approach for novel serotype discovery and routine S. pneumoniae surveillance that is fast, accurate, and feasible in low-resource settings. Although NAS facilitates whole-genome enrichment under ideal circumstances, GNASTy enables targeted enrichment to optimize serotype surveillance in complex samples.
Affiliation(s)
- Samuel T Horsfield
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Basil C T Fok
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
- Yuhan Fu
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
- Paul Turner
  - Centre for Tropical Medicine and Global Health, University of Oxford, Oxford OX3 7LG, United Kingdom
- John A Lees
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Nicholas J Croucher
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
2
Lee IPA, Eldakar OT, Gogarten JP, Andam CP. Recombination as an enforcement mechanism of prosocial behavior in cooperating bacteria. iScience 2023; 26:107344. [PMID: 37554437 PMCID: PMC10405257 DOI: 10.1016/j.isci.2023.107344] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/14/2023] [Revised: 04/11/2023] [Accepted: 07/06/2023] [Indexed: 08/10/2023]
Abstract
Prosocial behavior is ubiquitous in nature despite the relative fitness costs carried by cooperative individuals. However, the stability of cooperation in populations is fragile and often maintained through enforcement. We propose that homologous recombination provides such a mechanism in bacteria. Using an agent-based model of recombination in bacteria playing a public goods game, we demonstrate how changes in recombination rates affect the proportion of cooperating cells. In our model, recombination converts cells to a different strategy, either freeloading (cheaters) or cooperation, based on the strategies of neighboring cells and recombination rate. Increasing the recombination rate expands the parameter space in which cooperators outcompete freeloaders. However, increasing the recombination rate alone is neither sufficient nor necessary. Intermediate benefits of cooperation, lower population viscosity, and greater population size can promote the evolution of cooperation from within populations of cheaters. Our findings demonstrate how recombination influences the persistence of cooperative behavior in bacteria.
Affiliation(s)
- Isaiah Paolo A. Lee
  - Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH 03824, USA
  - National Institute of Molecular Biology and Biotechnology, University of the Philippines–Diliman, Quezon City 1101, Philippines
- Omar Tonsi Eldakar
  - Department of Biological Sciences, Nova Southeastern University, Fort Lauderdale, FL 33328, USA
- J. Peter Gogarten
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
- Cheryl P. Andam
  - Department of Biological Sciences, University at Albany, State University of New York, Albany, NY 12222, USA
3
Azarian T, Martinez PP, Arnold BJ, Qiu X, Grant LR, Corander J, Fraser C, Croucher NJ, Hammitt LL, Reid R, Santosham M, Weatherholtz RC, Bentley SD, O’Brien KL, Lipsitch M, Hanage WP. Frequency-dependent selection can forecast evolution in Streptococcus pneumoniae. PLoS Biol 2020; 18:e3000878. [PMID: 33091022 PMCID: PMC7580979 DOI: 10.1371/journal.pbio.3000878] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/06/2020] [Accepted: 09/18/2020] [Indexed: 11/30/2022]
Abstract
Predicting how pathogen populations will change over time is challenging. Such has been the case with Streptococcus pneumoniae, an important human pathogen, and the pneumococcal conjugate vaccines (PCVs), which target only a fraction of the strains in the population. Here, we use the frequencies of accessory genes to predict changes in the pneumococcal population after vaccination, hypothesizing that these frequencies reflect negative frequency-dependent selection (NFDS) on the gene products. We find that the standardized predicted fitness of a strain, estimated by an NFDS-based model at the time the vaccine is introduced, enables us to predict whether the strain increases or decreases in prevalence following vaccination. Further, we are able to forecast the equilibrium post-vaccine population composition and assess the invasion capacity of emerging lineages. Overall, we provide a method for predicting the impact of an intervention on pneumococcal populations with potential application to other bacterial pathogens in which NFDS is a driving force.
Affiliation(s)
- Taj Azarian
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, Florida, United States of America
  - Center for Communicable Disease Dynamics, Department of Epidemiology, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
- Pamela P. Martinez
  - Center for Communicable Disease Dynamics, Department of Epidemiology, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
- Brian J. Arnold
  - Center for Communicable Disease Dynamics, Department of Epidemiology, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
- Xueting Qiu
  - Center for Communicable Disease Dynamics, Department of Epidemiology, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
- Lindsay R. Grant
  - Center for American Indian Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Jukka Corander
  - Helsinki Institute for Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Infection Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Christophe Fraser
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Nicholas J. Croucher
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
- Laura L. Hammitt
  - Center for American Indian Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Raymond Reid
  - Center for American Indian Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Mathuram Santosham
  - Center for American Indian Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Robert C. Weatherholtz
  - Center for American Indian Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Stephen D. Bentley
  - Infection Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Marc Lipsitch
  - Center for Communicable Disease Dynamics, Department of Epidemiology, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
  - Department of Immunology and Infectious Diseases, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
- William P. Hanage
  - Center for Communicable Disease Dynamics, Department of Epidemiology, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
4
Gardon H, Biderre-Petit C, Jouan-Dufournel I, Bronner G. A drift-barrier model drives the genomic landscape of a structured bacterial population. Mol Ecol 2020; 29:4143-4156. [PMID: 32920913 DOI: 10.1111/mec.15628] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/05/2020] [Revised: 08/26/2020] [Accepted: 08/28/2020] [Indexed: 01/05/2023]
Abstract
Bacterial populations differentiate over time and space to form distinct genetic units. The mechanisms governing this diversification are presumed to result from the ecological context of living units to adapt to specific niches. Recently, a model assuming the acquisition of advantageous genes among populations rather than whole genome sweeps has emerged to explain population differentiation. However, the characteristics of these exchanged, or flexible, genes and whether their evolution is driven by adaptive or neutral processes remain controversial. By analysing the flexible genome of single-amplified genomes of co-occurring populations of the marine Prochlorococcus HLII ecotype, we highlight that genomic compartments - rather than population units - are characterized by different evolutionary trajectories. The dynamics of gene fluxes vary across genomic compartments and therefore the effectiveness of selection depends on the fluctuation of the effective population size along the genome. Taken together, these results support the drift-barrier model of bacterial evolution.
Affiliation(s)
- Hélène Gardon
  - Laboratoire Microorganismes: Génome et Environnement, Université Clermont Auvergne, CNRS, Clermont-Ferrand, France
- Corinne Biderre-Petit
  - Laboratoire Microorganismes: Génome et Environnement, Université Clermont Auvergne, CNRS, Clermont-Ferrand, France
- Isabelle Jouan-Dufournel
  - Laboratoire Microorganismes: Génome et Environnement, Université Clermont Auvergne, CNRS, Clermont-Ferrand, France
- Gisèle Bronner
  - Laboratoire Microorganismes: Génome et Environnement, Université Clermont Auvergne, CNRS, Clermont-Ferrand, France
5
Iranzo J, Wolf YI, Koonin EV, Sela I. Gene gain and loss push prokaryotes beyond the homologous recombination barrier and accelerate genome sequence divergence. Nat Commun 2019; 10:5376. [PMID: 31772262 PMCID: PMC6879757 DOI: 10.1038/s41467-019-13429-2] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 06/17/2019] [Accepted: 11/07/2019] [Indexed: 02/05/2023]
Abstract
Bacterial and archaeal evolution involve extensive gene gain and loss. Thus, phylogenetic trees of prokaryotes can be constructed both by traditional sequence-based methods (gene trees) and by comparison of gene compositions (genome trees). Comparing the branch lengths in gene and genome trees with identical topologies for 34 clusters of closely related bacterial and archaeal genomes, we show here that terminal branches of gene trees are systematically compressed compared to those of genome trees. Thus, sequence evolution is delayed compared to genome evolution by gene gain and loss. The extent of this delay differs widely among bacteria and archaea. Mathematical modeling shows that the divergence delay can result from sequence homogenization by homologous recombination. The model explains how homologous recombination maintains the cohesiveness of the core genome of a species while allowing extensive gene gain and loss within the accessory genome. Once evolving genomes become isolated by barriers impeding homologous recombination, gene and genome evolution processes settle into parallel trajectories, and genomes diverge, resulting in speciation. A significant proportion of the molecular evolution of bacteria and archaea occurs through gene gain and loss. Here Iranzo et al. develop a mathematical model that explains observed differential patterns of sequence evolution vs. gene content evolution as a consequence of homologous recombination.
Affiliation(s)
- Jaime Iranzo
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM)-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, Pozuelo de Alarcón, 28223, Madrid, Spain
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Itamar Sela
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
6
Sipola A, Marttinen P, Corander J. Bacmeta: simulator for genomic evolution in bacterial metapopulations. Bioinformatics 2019; 34:2308-2310. [PMID: 29474733 DOI: 10.1093/bioinformatics/bty093] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/10/2017] [Accepted: 02/20/2018] [Indexed: 12/25/2022]
Abstract
Summary: The advent of genomic data from densely sampled bacterial populations has created a need for flexible simulators by which models and hypotheses can be efficiently investigated in the light of empirical observations. Bacmeta provides fast stochastic simulation of neutral evolution within a large collection of interconnected bacterial populations with completely adjustable connectivity network. Stochastic events of mutations, recombinations, insertions/deletions, migrations and micro-epidemics can be simulated in discrete non-overlapping generations with a Wright-Fisher model that operates on explicit sequence data of any desired genome length. Each model component, including locus, bacterial strain, population and ultimately the whole metapopulation, is efficiently simulated using C++ objects and detailed metadata from each level can be acquired. The software can be executed in a cluster environment using simple textual input files, enabling, e.g. large-scale simulations and likelihood-free inference.
Availability and implementation: Bacmeta is implemented with C++ for Linux, Mac and Windows. It is available at https://bitbucket.org/aleksisipola/bacmeta under the BSD 3-clause license.
Supplementary information: Supplementary data are available at Bioinformatics online.
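To make the phrase "discrete non-overlapping generations with a Wright-Fisher model that operates on explicit sequence data" concrete, the following is a minimal Python sketch of a neutral Wright-Fisher step with point mutations only. It is not Bacmeta's code or interface: the function names, the population size, the genome length and the mutation rate are all illustrative assumptions, and recombination, indels, migration and micro-epidemics are left out.

```python
import random

ALPHABET = "ACGT"


def mutate(seq, mu, rng):
    """Return a copy of seq in which each base mutates with probability mu."""
    out = []
    for base in seq:
        if rng.random() < mu:
            # Substitute with one of the three other bases, chosen uniformly.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def wright_fisher(population, generations, mu, rng):
    """Neutral Wright-Fisher evolution with constant population size.

    Each generation fully replaces the previous one: every offspring picks
    a parent uniformly at random (with replacement) and copies its sequence
    with point mutations.
    """
    for _ in range(generations):
        parents = rng.choices(population, k=len(population))
        population = [mutate(parent, mu, rng) for parent in parents]
    return population


# Hypothetical parameters chosen only to keep the example fast.
rng = random.Random(1)
founder = "".join(rng.choice(ALPHABET) for _ in range(200))
final_population = wright_fisher([founder] * 50, generations=100, mu=1e-3, rng=rng)
print(len(set(final_population)), "distinct sequences after 100 generations")
```

Because selection is absent, any change in sequence frequencies in this sketch comes from sampling (drift) and mutation alone, which is the sense of "neutral evolution" in the abstract.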
Affiliation(s)
- Aleksi Sipola
  - Department of Mathematics and Statistics, University of Helsinki, Finland
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Finland
- Pekka Marttinen
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Finland
- Jukka Corander
  - Department of Mathematics and Statistics, University of Helsinki, Finland
  - Department of Biostatistics, University of Oslo, Norway
7
Tomczak JM, Węglarz‐Tomczak E. Estimating kinetic constants in the Michaelis–Menten model from one enzymatic assay using Approximate Bayesian Computation. FEBS Lett 2019; 593:2742-2750. [DOI: 10.1002/1873-3468.13531] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/01/2019] [Revised: 06/05/2019] [Accepted: 06/27/2019] [Indexed: 01/04/2023]
Affiliation(s)
- Jakub M. Tomczak
  - Institute of Informatics, Faculty of Science, University of Amsterdam, The Netherlands
- Ewelina Węglarz‐Tomczak
  - Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, The Netherlands
8
Ding W, Baumdicker F, Neher RA. panX: pan-genome analysis and exploration. Nucleic Acids Res 2019; 46:e5. [PMID: 29077859 PMCID: PMC5758898 DOI: 10.1093/nar/gkx977] [Citation(s) in RCA: 167] [Impact Index Per Article: 27.8] [Received: 09/14/2016] [Accepted: 10/10/2017] [Indexed: 11/24/2022]
Abstract
Horizontal transfer, gene loss, and duplication result in dynamic bacterial genomes shaped by a complex mixture of different modes of evolution. Closely related strains can differ in the presence or absence of many genes, and the total number of distinct genes found in a set of related isolates—the pan-genome—is often many times larger than the genome of individual isolates. We have developed a pipeline that efficiently identifies orthologous gene clusters in the pan-genome. This pipeline is coupled to a powerful yet easy-to-use web-based visualization for interactive exploration of the pan-genome. The visualization consists of connected components that allow rapid filtering and searching of genes and inspection of their evolutionary history. For each gene cluster, panX displays an alignment, a phylogenetic tree, maps mutations within that cluster to the branches of the tree and infers gain and loss of genes on the core-genome phylogeny. PanX is available at pangenome.de. Custom pan-genomes can be visualized either using a web server or by serving panX locally as a browser-based application.
Affiliation(s)
- Wei Ding
  - Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Franz Baumdicker
  - Mathematisches Institut, Albert-Ludwigs University of Freiburg, 79104 Freiburg, Germany
- Richard A Neher
  - Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
  - Biozentrum and SIB Swiss Institute of Bioinformatics, University of Basel, 4056 Basel, Switzerland
9
Lees JA, Harris SR, Tonkin-Hill G, Gladstone RA, Lo SW, Weiser JN, Corander J, Bentley SD, Croucher NJ. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Res 2019; 29:304-316. [PMID: 30679308 PMCID: PMC6360808 DOI: 10.1101/gr.241455.118] [Citation(s) in RCA: 245] [Impact Index Per Article: 40.8] [Received: 07/05/2018] [Accepted: 12/10/2018] [Indexed: 12/02/2022]
Abstract
The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (Population Partitioning Using Nucleotide K-mers), a software implementing scalable and expandable annotation- and alignment-free methods for population analysis and clustering. Variable-length k-mer comparisons are used to distinguish isolates' divergence in shared sequence and gene content, which we demonstrate to be accurate over multiple orders of magnitude using data from both simulations and genomic collections representing 10 taxonomically widespread species. Connections between closely related isolates of the same strain are robustly identified, despite interspecies variation in the pairwise distance distributions that reflects species' diverse evolutionary patterns. PopPUNK can process 10³ to 10⁴ genomes in a single batch, with minimal memory use and runtimes up to 200-fold faster than existing model-based methods. Clusters of strains remain consistent as new batches of genomes are added, which is achieved without needing to reanalyze all genomes de novo. This facilitates real-time surveillance with consistent cluster naming between studies and allows for outbreak detection using hundreds of genomes in minutes. Interactive visualization and online publication are streamlined through the automatic output of results to multiple platforms. PopPUNK has been designed as a flexible platform that addresses important issues with currently used whole-genome clustering and typing methods, and has potential uses across bacterial genetics and public health research.
Affiliation(s)
- John A Lees
  - Department of Microbiology, New York University School of Medicine, New York, New York 10016, USA
- Simon R Harris
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Gerry Tonkin-Hill
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Rebecca A Gladstone
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Stephanie W Lo
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Jeffrey N Weiser
  - Department of Microbiology, New York University School of Medicine, New York, New York 10016, USA
- Jukka Corander
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
  - Department of Biostatistics, University of Oslo, 0372 Oslo, Norway
  - Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, 00014 Helsinki, Finland
- Stephen D Bentley
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
  - Institute of Infection and Global Health, University of Liverpool, Liverpool L7 3EA, United Kingdom
  - Department of Pathology, University of Cambridge, Cambridge CB2 1QP, United Kingdom
- Nicholas J Croucher
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W2 1PG, United Kingdom
10
Järvenpää M, Gutmann MU, Vehtari A, Marttinen P. Gaussian process modelling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria. Ann Appl Stat 2018. [DOI: 10.1214/18-aoas1150] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 11/19/2022]
11
Abudahab K, Prada JM, Yang Z, Bentley SD, Croucher NJ, Corander J, Aanensen DM. PANINI: Pangenome Neighbour Identification for Bacterial Populations. Microb Genom 2018; 5. [PMID: 30465642 PMCID: PMC6521588 DOI: 10.1099/mgen.0.000220] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022]
Abstract
The standard workhorse for genomic analysis of the evolution of bacterial populations is phylogenetic modelling of mutations in the core genome. However, a notable amount of information about evolutionary and transmission processes in diverse populations can be lost unless the accessory genome is also taken into consideration. Here, we introduce panini (Pangenome Neighbour Identification for Bacterial Populations), a computationally scalable method for identifying the neighbours for each isolate in a data set using unsupervised machine learning with stochastic neighbour embedding based on the t-SNE (t-distributed stochastic neighbour embedding) algorithm. panini is browser-based and integrates with the Microreact platform for rapid online visualization and exploration of both core and accessory genome evolutionary signals, together with relevant epidemiological, geographical, temporal and other metadata. Several case studies with single- and multi-clone pneumococcal populations are presented to demonstrate the ability to identify biologically important signals from gene content data. panini is available at http://panini.pathogen.watch and code at http://gitlab.com/cgps/panini.
Affiliation(s)
- Khalil Abudahab
  - Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, UK
- Joaquín M Prada
  - School of Veterinary Medicine, University of Surrey, Guildford, UK
- Zhirong Yang
  - Department of Mathematics and Statistics, Helsinki Institute of Information Technology, University of Helsinki, FI-00014 Helsinki, Finland
- Nicholas J Croucher
  - Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Jukka Corander
  - Department of Mathematics and Statistics, Helsinki Institute of Information Technology, University of Helsinki, FI-00014 Helsinki, Finland
  - Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, N-0317 Oslo, Norway
- David M Aanensen
  - Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, UK
  - Big Data Institute, Li Ka Shing Centre for Health Informatics, University of Oxford, Oxford, UK
12
Lintusaari J, Gutmann MU, Dutta R, Kaski S, Corander J. Fundamentals and Recent Developments in Approximate Bayesian Computation. Syst Biol 2018; 66:e66-e82. [PMID: 28175922 PMCID: PMC5837704 DOI: 10.1093/sysbio/syw077] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 10/30/2015] [Revised: 08/09/2016] [Accepted: 08/09/2016] [Indexed: 12/16/2022]
Abstract
Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.]
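The idea that ABC only requires that "sampling from a model is possible" can be illustrated with the classical rejection sampler. The Python sketch below is a toy under stated assumptions, not code from the review: the simulator (normal data with unknown mean), the uniform prior, the sample mean as summary statistic and the tolerance are all chosen for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator: n draws from a normal distribution with mean theta, sd 1."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, prior_sampler, n_draws, tolerance, rng):
    """Classical ABC rejection sampling.

    Draw theta from the prior, simulate a data set of the same size as the
    observed one, and keep theta only if the simulated summary statistic
    (here the sample mean) is within `tolerance` of the observed summary.
    No likelihood is ever evaluated.
    """
    obs_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        simulated = simulate(theta, len(observed), rng)
        if abs(statistics.fmean(simulated) - obs_summary) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # pretend this is the real data
posterior = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),  # assumed flat prior
    n_draws=5000,
    tolerance=0.1,
    rng=rng,
)
print(len(posterior), "accepted draws; posterior mean ~", statistics.fmean(posterior))
```

Shrinking the tolerance makes the accepted draws a closer approximation to the true posterior but lowers the acceptance rate, which is the basic trade-off that more elaborate ABC algorithms try to manage.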
Affiliation(s)
- Jarno Lintusaari
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology HIIT, Espoo, Finland
- Michael U Gutmann
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology HIIT, Espoo, Finland
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Ritabrata Dutta
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology HIIT, Espoo, Finland
- Samuel Kaski
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology HIIT, Espoo, Finland
- Jukka Corander
  - Helsinki Institute for Information Technology HIIT, Espoo, Finland
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
  - Department of Biostatistics, University of Oslo, Oslo, Norway
13
Corander J, Fraser C, Gutmann MU, Arnold B, Hanage WP, Bentley SD, Lipsitch M, Croucher NJ. Frequency-dependent selection in vaccine-associated pneumococcal population dynamics. Nat Ecol Evol 2017; 1:1950-1960. [PMID: 29038424 PMCID: PMC5708525 DOI: 10.1038/s41559-017-0337-x] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Received: 04/27/2017] [Accepted: 09/01/2017] [Indexed: 12/21/2022]
Abstract
Many bacterial species are composed of multiple lineages distinguished by extensive variation in gene content. These often cocirculate in the same habitat, but the evolutionary and ecological processes that shape these complex populations are poorly understood. Addressing these questions is particularly important for Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen, because the changes in population structure associated with the recent introduction of partial-coverage vaccines have substantially reduced pneumococcal disease. Here we show that pneumococcal lineages from multiple populations each have a distinct combination of intermediate-frequency genes. Functional analysis suggested that these loci may be subject to negative frequency-dependent selection (NFDS) through interactions with other bacteria, hosts or mobile elements. Correspondingly, these genes had similar frequencies in four populations with dissimilar lineage compositions. These frequencies were maintained following substantial alterations in lineage prevalences once vaccination programmes began. Fitting a multilocus NFDS model of post-vaccine population dynamics to three genomic datasets using Approximate Bayesian Computation generated reproducible estimates of the influence of NFDS on pneumococcal evolution, the strength of which varied between loci. Simulations replicated the stable frequency of lineages unperturbed by vaccination, patterns of serotype switching and clonal replacement. This framework highlights how bacterial ecology affects the impact of clinical interventions.
Affiliation(s)
- Jukka Corander
  - Helsinki Institute for Information Technology, Department of Mathematics and Statistics, University of Helsinki, 00014, Helsinki, Finland
  - Department of Biostatistics, University of Oslo, 0317, Oslo, Norway
  - Infection Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Christophe Fraser
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7LF, UK
- Michael U Gutmann
  - School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK
- Brian Arnold
  - Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- William P Hanage
  - Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Stephen D Bentley
  - Infection Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Marc Lipsitch
  - Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
  - Departments of Epidemiology and Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Nicholas J Croucher
  - MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, W2 1PG, UK
|
14
|
Apagyi KJ, Fraser C, Croucher NJ. Transformation Asymmetry and the Evolution of the Bacterial Accessory Genome. Mol Biol Evol 2017; 35:575-581. [PMID: 29211859 PMCID: PMC5850275 DOI: 10.1093/molbev/msx309] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 12/31/2022]
Abstract
Bacterial transformation can insert or delete genomic islands (GIs), depending on the donor and recipient genotypes, if a homologous recombination spans the GI’s integration site and includes sufficiently long flanking homologous arms. Combining mathematical models of recombination with experiments using pneumococci found GI insertion rates declined geometrically with the GI’s size. The decrease in acquisition frequency with length (1.08 × 10⁻³ bp⁻¹) was higher than a previous estimate of the analogous rate at which core genome recombinations terminated. Although most efficient for shorter GIs, transformation-mediated deletion frequencies did not vary consistently with GI length, with removal of 10-kb GIs ∼50% as efficient as acquisition of base substitutions. Fragments of 2 kb, typical of transformation event sizes, could drive all these deletions independent of island length. The strong asymmetry of transformation, and its capacity to efficiently remove GIs, suggests nonmobile accessory loci will decline in frequency without preservation by selection.
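The geometric decline described in this abstract can be illustrated with a minimal sketch. It assumes one particular reading of the reported figure: that each additional base pair of island length multiplies the insertion frequency by (1 − 1.08 × 10⁻³). The function names and this parameterization are hypothetical and are not taken from the paper.

```python
import math

# Assumed reading of the abstract: each additional base pair of island length
# multiplies the insertion frequency by (1 - DECAY_PER_BP).
DECAY_PER_BP = 1.08e-3  # per-bp decrease quoted in the abstract


def relative_insertion_frequency(gi_length_bp: int, decay_per_bp: float = DECAY_PER_BP) -> float:
    """Insertion frequency of a GI relative to a zero-length insert (hypothetical helper)."""
    if gi_length_bp < 0:
        raise ValueError("gi_length_bp must be non-negative")
    return (1.0 - decay_per_bp) ** gi_length_bp


def half_length_bp(decay_per_bp: float = DECAY_PER_BP) -> float:
    """Island length at which the relative insertion frequency falls to one half."""
    return math.log(0.5) / math.log(1.0 - decay_per_bp)


if __name__ == "__main__":
    # Print the relative frequency for a few illustrative island lengths.
    for length in (0, 1_000, 5_000, 10_000):
        print(length, relative_insertion_frequency(length))
    print("half-length (bp):", half_length_bp())
```

Under this assumption the insertion frequency halves roughly every 640 bp of island length, which conveys why large islands are acquired so much less often than short ones.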
Affiliation(s)
- Katinka J Apagyi
  - MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
- Christophe Fraser
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Nicholas J Croucher
  - MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
|
15
|
Mostowy R, Croucher NJ, Andam CP, Corander J, Hanage WP, Marttinen P. Efficient Inference of Recent and Ancestral Recombination within Bacterial Populations. Mol Biol Evol 2017; 34:1167-1182. [PMID: 28199698 PMCID: PMC5400400 DOI: 10.1093/molbev/msx066] [Citation(s) in RCA: 114] [Impact Index Per Article: 14.3] [Indexed: 01/25/2023]
Abstract
Prokaryotic evolution is affected by horizontal transfer of genetic material through recombination. Inference of an evolutionary tree of bacteria thus relies on accurate identification of the population genetic structure and recombination-derived mosaicism. Rapidly growing databases represent a challenge for computational methods to detect recombinations in bacterial genomes. We introduce a novel algorithm called fastGEAR which identifies lineages in diverse microbial alignments, and recombinations between them and from external origins. The algorithm detects both recent recombinations (affecting a few isolates) and ancestral recombinations between detected lineages (affecting entire lineages), thus providing insight into recombinations affecting deep branches of the phylogenetic tree. In simulations, fastGEAR had comparable power to detect recent recombinations and outstanding power to detect the ancestral ones, compared with state-of-the-art methods, often with a fraction of computational cost. We demonstrate the utility of the method by analyzing a collection of 616 whole-genomes of a recombinogenic pathogen Streptococcus pneumoniae, for which the method provided a high-resolution view of recombination across the genome. We examined in detail the penicillin-binding genes across the Streptococcus genus, demonstrating previously undetected genetic exchanges between different species at these three loci. Hence, fastGEAR can be readily applied to investigate mosaicism in bacterial genes across multiple species. Finally, fastGEAR correctly identified many known recombination hotspots and pointed to potential new ones. Matlab code and Linux/Windows executables are available at https://users.ics.aalto.fi/~pemartti/fastGEAR/ (last accessed February 6, 2017).
Affiliation(s)
- Rafal Mostowy
  - Department of Infectious Disease Epidemiology, St. Mary's Campus, Imperial College London, London, United Kingdom
- Nicholas J Croucher
  - Department of Infectious Disease Epidemiology, St. Mary's Campus, Imperial College London, London, United Kingdom
- Cheryl P Andam
  - Department of Epidemiology, Harvard TH Chan School of Public Health, Center for Communicable Disease Dynamics, Boston, MA
- Jukka Corander
  - Department of Mathematics and Statistics, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- William P Hanage
  - Department of Epidemiology, Harvard TH Chan School of Public Health, Center for Communicable Disease Dynamics, Boston, MA
- Pekka Marttinen
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
|
16
|
Recombination-Driven Genome Evolution and Stability of Bacterial Species. Genetics 2017; 207:281-295. [PMID: 28751420 DOI: 10.1534/genetics.117.300061] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 11/09/2016] [Accepted: 07/18/2017] [Indexed: 01/21/2023]
Abstract
While bacteria divide clonally, horizontal gene transfer followed by homologous recombination is now recognized as an important contributor to their evolution. However, the details of how the competition between clonality and recombination shapes genome diversity remain poorly understood. Using a computational model, we find two principal regimes in bacterial evolution and identify two composite parameters that dictate the evolutionary fate of bacterial species. In the divergent regime, characterized by either a low recombination frequency or strict barriers to recombination, cohesion due to recombination is not sufficient to overcome the mutational drift. As a consequence, the divergence between pairs of genomes in the population steadily increases in the course of their evolution. The species lacks genetic coherence, with sexually isolated clonal subpopulations continuously formed and dissolved. In contrast, in the metastable regime, characterized by a high recombination frequency combined with low barriers to recombination, genomes continuously recombine with the rest of the population. The population remains genetically cohesive and temporally stable. Notably, the transition between these two regimes can be affected by relatively small changes in evolutionary parameters. Using Multi Locus Sequence Typing (MLST) data, we classify a number of bacterial species as either the divergent or the metastable type. Generalizations of our framework to include selection, ecologically structured populations, and horizontal gene transfer of nonhomologous regions are discussed as well.
|
17
|
Marttinen P, Hanage WP. Speciation trajectories in recombining bacterial species. PLoS Comput Biol 2017; 13:e1005640. [PMID: 28671999 PMCID: PMC5542674 DOI: 10.1371/journal.pcbi.1005640] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 02/24/2017] [Revised: 08/03/2017] [Accepted: 06/15/2017] [Indexed: 01/26/2023]
Abstract
It is generally agreed that bacterial diversity can be classified into genetically and ecologically cohesive units, but what produces such variation is a topic of intensive research. Recombination may maintain coherent species of frequently recombining bacteria, but the emergence of distinct clusters within a recombining species, and the impact of habitat structure in this process are not well described, limiting our understanding of how new species are created. Here we present a model of bacterial evolution in overlapping habitat space. We show that the amount of habitat overlap determines the outcome for a pair of clusters, which may range from fast clonal divergence with little interaction between the clusters to a stationary population structure, where different clusters maintain an equilibrium distance between each other for an indefinite time. We fit our model to two data sets. In Streptococcus pneumoniae, we find a genomically and ecologically distinct subset, held at a relatively constant genetic distance from the majority of the population through frequent recombination with it, while in Campylobacter jejuni, we find a minority population we predict will continue to diverge at a higher rate. This approach may predict and define speciation trajectories in multiple bacterial species.
Affiliation(s)
- Pekka Marttinen
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
- William P. Hanage
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
|
18
|
David S, Sánchez-Busó L, Harris SR, Marttinen P, Rusniok C, Buchrieser C, Harrison TG, Parkhill J. Dynamics and impact of homologous recombination on the evolution of Legionella pneumophila. PLoS Genet 2017. [PMID: 28650958 PMCID: PMC5507463 DOI: 10.1371/journal.pgen.1006855] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 12/29/2022]
Abstract
Legionella pneumophila is an environmental bacterium and the causative agent of Legionnaires' disease. Previous genomic studies have shown that recombination accounts for a high proportion (>96%) of diversity within several major disease-associated sequence types (STs) of L. pneumophila. This suggests that recombination represents a potentially important force shaping adaptation and virulence. Despite this, little is known about the biological effects of recombination in L. pneumophila, particularly with regards to homologous recombination (whereby genes are replaced with alternative allelic variants). Using newly available population genomic data, we have disentangled events arising from homologous and non-homologous recombination in six major disease-associated STs of L. pneumophila (subsp. pneumophila), and subsequently performed a detailed characterisation of the dynamics and impact of homologous recombination. We identified genomic "hotspots" of homologous recombination that include regions containing outer membrane proteins, the lipopolysaccharide (LPS) region and Dot/Icm effectors, which provide interesting clues to the selection pressures faced by L. pneumophila. Inference of the origin of the recombined regions showed that isolates have most frequently imported DNA from isolates belonging to their own clade, but also occasionally from other major clades of the same subspecies. This supports the hypothesis that the possibility for horizontal exchange of new adaptations between major clades of the subspecies may have been a critical factor in the recent emergence of several clinically important STs from diverse genomic backgrounds. However, acquisition of recombined regions from another subspecies, L. pneumophila subsp. fraseri, was rarely observed, suggesting the existence of a recombination barrier and/or the possibility of ongoing speciation between the two subspecies. Finally, we suggest that multi-fragment recombination may occur in L. pneumophila, whereby multiple non-contiguous segments that originate from the same molecule of donor DNA are imported into a recipient genome during a single episode of recombination.
Affiliation(s)
- Sophia David
  - Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
  - Respiratory and Vaccine Preventable Bacteria Reference Unit, Public Health England, London, United Kingdom
- Leonor Sánchez-Busó
  - Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Simon R. Harris
  - Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Pekka Marttinen
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Aalto, Espoo, Finland
- Christophe Rusniok
  - Institut Pasteur, Biologie des Bactéries Intracellulaires, Paris, France
  - CNRS UMR 3525, Paris, France
- Carmen Buchrieser
  - Institut Pasteur, Biologie des Bactéries Intracellulaires, Paris, France
  - CNRS UMR 3525, Paris, France
- Timothy G. Harrison
  - Respiratory and Vaccine Preventable Bacteria Reference Unit, Public Health England, London, United Kingdom
- Julian Parkhill
  - Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
|
19
|
Shapiro BJ. How clonal are bacteria over time? Curr Opin Microbiol 2016; 31:116-123. [PMID: 27057964 DOI: 10.1016/j.mib.2016.03.013] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 01/14/2016] [Revised: 03/20/2016] [Accepted: 03/22/2016] [Indexed: 11/15/2022]
Abstract
Bacteria and archaea reproduce clonally (vertical descent), but exchange genes by recombination (horizontal transfer). Recombination allows adaptive mutations or genes to spread rapidly within (or even between) species, and reduces the burden of deleterious mutations. Clonality, defined here as the balance between vertical and horizontal inheritance, is therefore a key microbial trait, determining how quickly a population can adapt and the size of its gene pool. Here, I discuss whether clonality varies over time and if it can be considered a stable trait of a given population. I show that, in some cases, clonality is clearly not static. For example, non-clonal (highly recombining) populations can give rise to clonal expansions, often of pathogens. However, an analysis of time-course metagenomic data from a lake suggests that a bacterial population's past clonality (as measured by its genetic diversity) is a good predictor of its future clonality. Clonality therefore appears to be relatively, but not completely, stable over evolutionary time.
Affiliation(s)
- B Jesse Shapiro
  - Département de sciences biologiques, Université de Montréal, Montréal, QC H3C 3J7, Canada
|