1
Jacobs B, Bogaerts B, Verhaegen M, Vanneste K, De Keersmaecker SCJ, Roosens NHC, Rajkovic A, Mahillon J, Van Nieuwenhuysen T, Van Hoorde K. Whole-genome sequencing of soil- and foodborne Bacillus cereus sensu lato indicates no clear association between their virulence repertoire, genomic diversity and food matrix. Int J Food Microbiol 2025; 439:111266. [PMID: 40378489 DOI: 10.1016/j.ijfoodmicro.2025.111266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2025] [Revised: 04/29/2025] [Accepted: 05/10/2025] [Indexed: 05/19/2025]
Abstract
Bacillus cereus sensu lato is frequently involved in foodborne toxico-infections and is found in various foodstuffs. It is unclear whether certain strains have a higher affinity for specific food matrices, which can be of interest for risk assessment. This study reports the characterization by whole-genome sequencing of 169 B. cereus isolates, isolated from 12 food types and soil over two decades. Any potential links between the food matrix of isolation, the isolate's genetic lineage and/or their (putative) virulence gene reservoir were investigated. More than 20 % of the strains contained the genes for the main potential enterotoxins (nheABC, hblCDA and cytK_2). Cereulide biosynthesis genes and genes encoding hemolysins and phospholipases were detected in multiple isolates. Strain typing revealed a high diversity, as illustrated by 84 distinct sequence types, including 26 not previously described. This diversity was also reflected in the detection of all seven panC types and 71 unique virulence gene profiles. Core-genome MLST was used for phylogenomic investigation of the entire collection and SNP-based clustering was performed on the four most abundant sequence types, which did not reveal a clear affinity for specific B. cereus lineages or (putative) virulence genes for certain food matrices. Additionally, minimal genetic overlap was observed between soil and foodborne isolates. Clusters of closely related isolates with common epidemiological metadata were detected. However, some isolates from different food matrices or collected several years apart were found to be genetically identical. This study provides elements that can be used for risk assessment of B. cereus in food.
Affiliation(s)
- Bram Jacobs
- Foodborne Pathogens, Sciensano, Juliette Wytsmanstraat 14, Brussels, Belgium; Laboratory of Food Microbiology and Food Preservation, Department of Food Technology, Safety and Health, Faculty of Bioscience Engineering, Ghent University, Coupure Links 635, Ghent, Belgium; Laboratory of Food and Environmental Microbiology, Earth and Life Institute, Catholic University of Louvain, Croix du Sud 2, Louvain-la-Neuve, Belgium
- Bert Bogaerts
- Transversal activities in Applied Genomics, Sciensano, Juliette Wytsmanstraat 14, Brussels, Belgium
- Marie Verhaegen
- Laboratory of Food and Environmental Microbiology, Earth and Life Institute, Catholic University of Louvain, Croix du Sud 2, Louvain-la-Neuve, Belgium
- Kevin Vanneste
- Transversal activities in Applied Genomics, Sciensano, Juliette Wytsmanstraat 14, Brussels, Belgium
- Nancy H C Roosens
- Transversal activities in Applied Genomics, Sciensano, Juliette Wytsmanstraat 14, Brussels, Belgium
- Andreja Rajkovic
- Laboratory of Food Microbiology and Food Preservation, Department of Food Technology, Safety and Health, Faculty of Bioscience Engineering, Ghent University, Coupure Links 635, Ghent, Belgium
- Jacques Mahillon
- Laboratory of Food and Environmental Microbiology, Earth and Life Institute, Catholic University of Louvain, Croix du Sud 2, Louvain-la-Neuve, Belgium
- Koenraad Van Hoorde
- Foodborne Pathogens, Sciensano, Juliette Wytsmanstraat 14, Brussels, Belgium
2
Derelle R, von Wachsmann J, Mäklin T, Hellewell J, Russell T, Lalvani A, Chindelevitch L, Croucher NJ, Harris SR, Lees JA. Seamless, rapid, and accurate analyses of outbreak genomic data using split k-mer analysis. Genome Res 2024; 34:1661-1673. [PMID: 39406504 PMCID: PMC11529842 DOI: 10.1101/gr.279449.124] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 04/08/2024] [Accepted: 09/16/2024] [Indexed: 11/01/2024]
Abstract
Sequence variation observed in populations of pathogens can be used for important public health and evolutionary genomic analyses, especially outbreak analysis and transmission reconstruction. Identifying this variation is typically achieved by aligning sequence reads to a reference genome, but this approach is susceptible to reference biases and requires careful filtering of called genotypes. There is a need for tools that can process this growing volume of bacterial genome data, providing rapid results, but that remain simple so they can be used without highly trained bioinformaticians, expensive data analysis, and long-term storage and processing of large files. Here we describe split k-mer analysis (SKA2), a method that supports both reference-free and reference-based mapping to quickly and accurately genotype populations of bacteria using sequencing reads or genome assemblies. SKA2 is highly accurate for closely related samples, and in outbreak simulations, we show superior variant recall compared with reference-based methods, with no false positives. SKA2 can also accurately map variants to a reference and be used with recombination detection methods to rapidly reconstruct vertical evolutionary history. SKA2 is many times faster than comparable methods and can be used to add new genomes to an existing call set, allowing sequential use without the need to reanalyze entire collections. With an inherent absence of reference bias, high accuracy, and a robust implementation, SKA2 has the potential to become the tool of choice for genotyping bacteria. SKA2 is implemented in Rust and is freely available as open-source software.
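As a rough illustration of the idea behind split k-mers (this is not the SKA2 implementation, which is written in Rust), the Python sketch below treats a split k-mer as two flanking k-mers around a single middle base and reports a candidate SNP wherever two samples share the flanks but differ in the middle base. The function names `split_kmers` and `candidate_snps` and the toy sequences are assumptions made for this example, and reverse-complement handling is deliberately left out.

```python
from collections import defaultdict


def split_kmers(seq: str, k: int = 3) -> dict:
    """Map each (left flank, right flank) pair to the set of middle bases seen.

    Simplified illustration: forward strand only, no canonical k-mers.
    """
    table = defaultdict(set)
    span = 2 * k + 1  # left flank + middle base + right flank
    for i in range(len(seq) - span + 1):
        left = seq[i:i + k]
        middle = seq[i + k]
        right = seq[i + k + 1:i + span]
        table[(left, right)].add(middle)
    return dict(table)


def candidate_snps(seq_a: str, seq_b: str, k: int = 3) -> list:
    """Report flanks shared by both samples whose middle bases differ."""
    a = split_kmers(seq_a, k)
    b = split_kmers(seq_b, k)
    snps = []
    for flanks in a.keys() & b.keys():
        # Skip flanks with more than one middle base (ambiguous or repeated).
        if len(a[flanks]) == 1 and len(b[flanks]) == 1 and a[flanks] != b[flanks]:
            snps.append((flanks, next(iter(a[flanks])), next(iter(b[flanks]))))
    return sorted(snps)


if __name__ == "__main__":
    # Two toy sequences that differ by a single base (G versus A).
    print(candidate_snps("AAACCCGTTTGGG", "AAACCCATTTGGG", k=3))
```

No reference genome is needed for this comparison, which is the property the abstract describes as reference-free genotyping.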
Affiliation(s)
- Romain Derelle
- NIHR Health Protection Research Unit in Respiratory Infections, National Heart and Lung Institute, Imperial College London, London W21PG, United Kingdom
- Johanna von Wachsmann
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Tommi Mäklin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Department of Mathematics and Statistics, University of Helsinki, Helsinki 00014, Finland
- Joel Hellewell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Timothy Russell
- Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, London WC1E 7HT, United Kingdom
- Ajit Lalvani
- NIHR Health Protection Research Unit in Respiratory Infections, National Heart and Lung Institute, Imperial College London, London W21PG, United Kingdom
- Leonid Chindelevitch
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W12 0BZ, United Kingdom
- Nicholas J Croucher
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W12 0BZ, United Kingdom
- Simon R Harris
- Bill and Melinda Gates Foundation, Westminster, London SW1E 6AJ, United Kingdom
- John A Lees
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
3
Daugaliyeva A, Daugaliyeva S, Kydyr N, Peletto S. Molecular typing methods to characterize Brucella spp. from animals: A review. Vet World 2024; 17:1778-1788. [PMID: 39328439 PMCID: PMC11422631 DOI: 10.14202/vetworld.2024.1778-1788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2024] [Accepted: 07/18/2024] [Indexed: 09/28/2024] Open
Abstract
Brucellosis is an infectious disease of animals that can infect humans. The disease causes significant economic losses and threatens human health. A timely and accurate disease diagnosis plays a vital role in the identification of brucellosis. In addition to traditional diagnostic methods, molecular methods allow diagnosis and typing of the causative agent of brucellosis. This review will discuss various methods, such as Bruce-ladder, Suiladder, high-resolution melt analysis, restriction fragment length polymorphism, multilocus sequence typing, multilocus variable-number tandem repeat analysis, and whole-genome sequencing single-nucleotide polymorphism, for the molecular typing of Brucella and discuss their advantages and disadvantages.
Affiliation(s)
- Aida Daugaliyeva
- LLP "Kazakh Research Institute for Livestock and Fodder Production," St. Zhandosova 51, Almaty 050035, Kazakhstan
- Saule Daugaliyeva
- LLP "Scientific Production Center of Microbiology and Virology," Bogenbay Batyr Str. 105, Almaty 050010, Kazakhstan
- Nazerke Kydyr
- LLP "Kazakh Research Institute for Livestock and Fodder Production," St. Zhandosova 51, Almaty 050035, Kazakhstan
- Simone Peletto
- Experimental Zooprofilactic Institute of Piedmont, Liguria and Aosta Valley, Via Bologna 148, 10154 Turin, Italy
4
Donkpegan ASL, Bernard A, Barreneche T, Quero-García J, Bonnet H, Fouché M, Le Dantec L, Wenden B, Dirlewanger E. Genome-wide association mapping in a sweet cherry germplasm collection (Prunus avium L.) reveals candidate genes for fruit quality traits. Horticulture Research 2023; 10:uhad191. [PMID: 38239559 PMCID: PMC10794993 DOI: 10.1093/hr/uhad191] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/21/2023] [Accepted: 09/12/2023] [Indexed: 01/22/2024]
Abstract
In sweet cherry (Prunus avium L.), large variability exists for various traits related to fruit quality. There is a need to discover the genetic architecture of these traits in order to enhance the efficiency of breeding strategies for consumer and producer demands. With this objective, a germplasm collection consisting of 116 sweet cherry accessions was evaluated for 23 agronomic fruit quality traits over 2-6 years, and characterized using a genotyping-by-sequencing approach. The SNP coverage collected was used to conduct a genome-wide association study using two multilocus models and three reference genomes. We identified numerous SNP-trait associations for global fruit size (weight, width, and thickness), fruit cracking, fruit firmness, and stone size, and we pinpointed several candidate genes involved in phytohormone, calcium, and cell wall metabolisms. Finally, we conducted a precise literature review focusing on the genetic architecture of fruit quality traits in sweet cherry to compare our results with potential colocalizations of marker-trait associations. This study brings new knowledge of the genetic control of important agronomic traits related to fruit quality, and to the development of marker-assisted selection strategies targeted towards the facilitation of breeding efforts.
Affiliation(s)
- Armel S L Donkpegan
- UMR BFP, INRAE, University of Bordeaux, 71 Avenue Edouard Bourlaux, F-33882 Villenave d’Ornon, France
- UMR BOA, SYSAAF, Centre INRAE Val de Loire, 37380 Nouzilly, France
- Anthony Bernard
- UMR BFP, INRAE, University of Bordeaux, 71 Avenue Edouard Bourlaux, F-33882 Villenave d’Ornon, France
- Teresa Barreneche
- UMR BFP, INRAE, University of Bordeaux, 71 Avenue Edouard Bourlaux, F-33882 Villenave d’Ornon, France
- José Quero-García
- UMR BFP, INRAE, University of Bordeaux, 71 Avenue Edouard Bourlaux, F-33882 Villenave d’Ornon, France
- Hélène Bonnet
- UMR BFP, INRAE, University of Bordeaux, 71 Avenue Edouard Bourlaux, F-33882 Villenave d’Ornon, France
- Mathieu Fouché
- UMR BFP, INRAE, University of Bordeaux, 71 Avenue Edouard Bourlaux, F-33882 Villenave d’Ornon, France
- Loïck Le Dantec
- UMR BFP, INRAE, University of Bordeaux, 71 Avenue Edouard Bourlaux, F-33882 Villenave d’Ornon, France
- Bénédicte Wenden
- UMR BFP, INRAE, University of Bordeaux, 71 Avenue Edouard Bourlaux, F-33882 Villenave d’Ornon, France
- Elisabeth Dirlewanger
- UMR BFP, INRAE, University of Bordeaux, 71 Avenue Edouard Bourlaux, F-33882 Villenave d’Ornon, France
5
Seah YM, Stewart MK, Hoogestraat D, Ryder M, Cookson BT, Salipante SJ, Hoffman NG. In Silico Evaluation of Variant Calling Methods for Bacterial Whole-Genome Sequencing Assays. J Clin Microbiol 2023; 61:e0184222. [PMID: 37428072 PMCID: PMC10446864 DOI: 10.1128/jcm.01842-22] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/14/2022] [Accepted: 06/18/2023] [Indexed: 07/11/2023] Open
Abstract
Identification and analysis of clinically relevant strains of bacteria increasingly relies on whole-genome sequencing. The downstream bioinformatics steps necessary for calling variants from short-read sequences are well-established but seldom validated against haploid genomes. We devised an in silico workflow to introduce single nucleotide polymorphisms (SNP) and indels into bacterial reference genomes, and computationally generate sequencing reads based on the mutated genomes. We then applied the method to Mycobacterium tuberculosis H37Rv, Staphylococcus aureus NCTC 8325, and Klebsiella pneumoniae HS11286, and used the synthetic reads as truth sets for evaluating several popular variant callers. Insertions proved especially challenging for most variant callers to correctly identify, relative to deletions and single nucleotide polymorphisms. With adequate read depth, however, variant callers that use high quality soft-clipped reads and base mismatches to perform local realignment consistently had the highest precision and recall in identifying insertions and deletions ranging from 1 to 50 bp. The remaining variant callers had lower recall values associated with identification of insertions greater than 20 bp.
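The general shape of such an in silico workflow can be sketched in Python under simplifying assumptions: the sketch below introduces SNPs only (no indels), records them as a truth set, and cuts error-free fixed-length reads from the mutated sequence. The function names `introduce_snps` and `simulate_reads`, the toy reference, and the tiling scheme are hypothetical and are not taken from the authors' pipeline, which would rely on dedicated read simulators and real reference genomes.

```python
import random


def introduce_snps(reference: str, n_snps: int, seed: int = 0):
    """Return a mutated copy of the reference and the list of introduced SNPs.

    Each truth entry is (0-based position, reference base, alternate base).
    """
    rng = random.Random(seed)  # seeded so the truth set is reproducible
    positions = sorted(rng.sample(range(len(reference)), n_snps))
    mutated = list(reference)
    truth = []
    for pos in positions:
        ref_base = reference[pos]
        # Pick any base other than the reference base.
        alt_base = rng.choice([b for b in "ACGT" if b != ref_base])
        mutated[pos] = alt_base
        truth.append((pos, ref_base, alt_base))
    return "".join(mutated), truth


def simulate_reads(genome: str, read_length: int, step: int) -> list:
    """Tile error-free reads of fixed length across the genome."""
    return [
        genome[i:i + read_length]
        for i in range(0, len(genome) - read_length + 1, step)
    ]


if __name__ == "__main__":
    reference = "ACGT" * 25  # toy 100 bp reference, not a real genome
    mutated, truth = introduce_snps(reference, n_snps=5, seed=1)
    reads = simulate_reads(mutated, read_length=20, step=10)
    print(len(reads), "reads;", len(truth), "SNPs in the truth set")
```

A variant caller's output on the simulated reads could then be compared against `truth` to compute precision and recall, which is the kind of evaluation the abstract reports.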
Affiliation(s)
- Yee Mey Seah
- Department of Laboratory Medicine and Pathology, University of Washington Medical Center, Seattle, Washington, USA
- Mary K. Stewart
- Department of Laboratory Medicine and Pathology, University of Washington Medical Center, Seattle, Washington, USA
- Daniel Hoogestraat
- Department of Laboratory Medicine and Pathology, University of Washington Medical Center, Seattle, Washington, USA
- Molly Ryder
- Department of Laboratory Medicine and Pathology, University of Washington Medical Center, Seattle, Washington, USA
- Brad T. Cookson
- Department of Laboratory Medicine and Pathology, University of Washington Medical Center, Seattle, Washington, USA
- Department of Microbiology, University of Washington, Seattle, Washington, USA
- Stephen J. Salipante
- Department of Laboratory Medicine and Pathology, University of Washington Medical Center, Seattle, Washington, USA
- Noah G. Hoffman
- Department of Laboratory Medicine and Pathology, University of Washington Medical Center, Seattle, Washington, USA
6
Deblais L, Ranjit S, Vrisman C, Antony L, Scaria J, Miller SA, Rajashekara G. Role of Stress-Induced Proteins RpoS and YicC in the Persistence of Salmonella enterica subsp. enterica Serotype Typhimurium in Tomato Plants. Molecular Plant-Microbe Interactions: MPMI 2023; 36:109-118. [PMID: 36394339 DOI: 10.1094/mpmi-07-22-0152-r] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Understanding the functional role of bacterial genes in the persistence of Salmonella in plant organs can facilitate the development of agricultural practices to mitigate food safety risks associated with the consumption of fresh produce contaminated with Salmonella spp. Our study showed that Salmonella enterica subsp. enterica serotype Typhimurium (strain MDD14) persisted less in inoculated tomato plants than other Salmonella Typhimurium strains tested (JSG210, JSG626, JSG634, JSG637, JSG3444, and EV030415; P < 0.01). In-vitro assays performed in limited-nutrient conditions (growth rate, biofilm production, and motility) were inconclusive in explaining the in-planta phenotype observed with MDD14. Whole-genome sequencing combined with non-synonymous single nucleotide variations analysis was performed to identify genomic differences between MDD14 and the other Salmonella Typhimurium strains. The genome of MDD14 contained a truncated version (123 bp N-terminal) of yicC and a mutated version of rpoS (two non-synonymous substitutions, i.e., G66E and R82C), which are two stress-induced proteins involved in iron acquisition, environmental sensing, and cell envelope integrity. The rpoS and yicC genes were deleted in Salmonella Typhimurium JSG210 with the Lambda Red recombining system. Both mutants had limited persistence in tomato plant organs, similar to that of MDD14. In conclusion, we demonstrated that YicC and RpoS are involved in the persistence of Salmonella in tomato plants in greenhouse conditions and, thus, could represent potential targets to mitigate persistence of Salmonella spp. in planta. Copyright © 2023 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Affiliation(s)
- Loïc Deblais
- Department of Animal Sciences, The Ohio State University, Wooster, OH, U.S.A
- Sochina Ranjit
- Department of Animal Sciences, The Ohio State University, Wooster, OH, U.S.A
- Claudio Vrisman
- Department of Plant Pathology, The Ohio State University, Wooster, OH, U.S.A
- Linto Antony
- Department of Veterinary and Biomedical Sciences, South Dakota State University, Brookings, SD, U.S.A
- Joy Scaria
- Department of Veterinary and Biomedical Sciences, South Dakota State University, Brookings, SD, U.S.A
- Sally A Miller
- Department of Plant Pathology, The Ohio State University, Wooster, OH, U.S.A
- Gireesh Rajashekara
- Department of Animal Sciences, The Ohio State University, Wooster, OH, U.S.A
7
Enam SU, Cherry JL, Leonard SR, Zheludev IN, Lipman DJ, Fire AZ. Restriction Endonuclease-Based Modification-Dependent Enrichment (REMoDE) of DNA for Metagenomic Sequencing. Appl Environ Microbiol 2023; 89:e0167022. [PMID: 36519847 PMCID: PMC9888230 DOI: 10.1128/aem.01670-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/29/2022] [Accepted: 11/29/2022] [Indexed: 12/23/2022] Open
Abstract
Metagenomic sequencing is a swift and powerful tool to ascertain the presence of an organism of interest in a sample. However, sequencing coverage of the organism of interest can be insufficient due to an inundation of reads from irrelevant organisms in the sample. Here, we report a nuclease-based approach to rapidly enrich for DNA from certain organisms, including enterobacteria, based on their differential endogenous modification patterns. We exploit the ability of taxon-specific methylated motifs to resist the action of cognate methylation-sensitive restriction endonucleases that thereby digest unwanted, unmethylated DNA. Subsequently, we use a distributive exonuclease or electrophoretic separation to deplete or exclude the digested fragments, thus enriching for undigested DNA from the organism of interest. As a proof of concept, we apply this method to enrich for the enterobacteria Escherichia coli and Salmonella enterica by 11- to 142-fold from mock metagenomic samples and validate this approach as a versatile means to enrich for genomes of interest in metagenomic samples. IMPORTANCE Pathogens that contaminate the food supply or spread through other means can cause outbreaks that bring devastating repercussions to the health of a populace. Investigations to trace the source of these outbreaks are initiated rapidly but can be drawn out due to the labored methods of pathogen isolation. Metagenomic sequencing can alleviate this hurdle but is often insufficiently sensitive. The approach and implementations detailed here provide a rapid means to enrich for many pathogens involved in foodborne outbreaks, thereby improving the utility of metagenomic sequencing as a tool in outbreak investigations. Additionally, this approach provides a means to broadly enrich for otherwise minute levels of modified DNA, which may escape unnoticed in metagenomic samples.
Affiliation(s)
- Syed Usman Enam
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
- Joshua L. Cherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, Maryland, USA
- Susan R. Leonard
- Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, Maryland, USA
- Ivan N. Zheludev
- Department of Biochemistry, Stanford University School of Medicine, Stanford, California, USA
- David J. Lipman
- Office of the Center Director, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Andrew Z. Fire
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
- Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
8
Adaptation to simulated microgravity in Streptococcus mutans. NPJ Microgravity 2022; 8:17. [PMID: 35654802 PMCID: PMC9163064 DOI: 10.1038/s41526-022-00205-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2021] [Accepted: 05/13/2022] [Indexed: 11/08/2022] Open
Abstract
Long-term space missions have shown an increased incidence of oral disease in astronauts and, as a result, oral disease is one of the top conditions predicted to impact future missions. Here we set out to evaluate the adaptive response of Streptococcus mutans (etiological agent of dental caries) to simulated microgravity. This organism has been well studied on Earth and treatment strategies are more predictable. Despite this, we are unsure how the bacterium will respond to the environmental stressors in space. We used experimental evolution for 100 days in high aspect ratio vessels followed by whole genome resequencing to evaluate this adaptive response. Our data show that planktonic S. mutans did evolve variants in three genes (pknB, SMU_399 and SMU_1307c) that can be uniquely attributed to simulated microgravity populations. In addition, collection of data at multiple time points showed mutations in three additional genes (SMU_399, ptsH and rex) that were detected earlier in simulated microgravity populations than in the normal gravity controls, many of which are consistent with other studies. Comparison of virulence-related phenotypes between biological replicates from simulated microgravity and control orientation cultures generally showed few changes in antibiotic susceptibility, while acid tolerance and adhesion varied significantly between biological replicates and decreased as compared to the ancestral populations. Most importantly, our data show the importance of a parallel normal gravity control, sequencing at multiple time points and the use of biological replicates for appropriate analysis of adaptation in simulated microgravity.
9
Cherchame E, Guillier L, Lailler R, Vignaud ML, Jourdan-Da Silva N, Le Hello S, Weill FX, Cadel-Six S. Salmonella enterica subsp. enterica Welikade: guideline for phylogenetic analysis of serovars rarely involved in foodborne outbreaks. BMC Genomics 2022; 23:217. [PMID: 35303794 PMCID: PMC8933937 DOI: 10.1186/s12864-022-08439-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/21/2021] [Accepted: 02/23/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Salmonella spp. is a major foodborne pathogen with a wide variety of serovars associated with human cases and food sources. Nevertheless, in Europe a panel of ten serovars is responsible for up to 80% of confirmed human cases. Clustering studies by single nucleotide polymorphism (SNP) core-genome phylogenetic analysis of outbreaks due to these major serovars are simplified by the availability of many complete genomes in the free access databases. This is not the case for outbreaks due to less common serovars, such as Welikade, for which no reference genomes are available. In this study, we propose a method to solve this problem. We propose to perform a core genome MLST (cgMLST) analysis based on hierarchical clustering using the free-access EnteroBase to select the most suitable genome to use as a reference for SNP phylogenetic analysis. In this study, we applied this protocol to a retrospective analysis of a Salmonella enterica serovar Welikade (S. Welikade) foodborne outbreak that occurred in France in 2016. Finally, we compared the cgMLST and SNP analyses. SNP phylogenetic reconstruction was carried out considering the effect of recombination events identified by the ClonalFrameML tool. The accessory genome was also explored by phage content and virulome analyses. RESULTS Our findings revealed high clustering concordance using cgMLST and SNP analyses. Nevertheless, SNP analysis allowed for better assessment of the genetic distance among strains. The results revealed epidemic clones of S. Welikade circulating within the poultry and dairy sectors in France, responsible for sporadic and non-sporadic human cases between 2012 and 2019. CONCLUSIONS This study increases knowledge on this poorly described serovar and enriches public genome databases with 42 genomes from human and non-human S. Welikade strains, including the isolate collected in 1956 in Sri Lanka, which gave the name to this serovar. This is the first genomic analysis of an outbreak due to S. Welikade described to date.
Affiliation(s)
- Emeline Cherchame
- Laboratory for Food Safety, French Agency for Food, Environmental and Occupational Health & Safety (ANSES), 94700, Maisons-Alfort, France; Present address: Data Analysis Core, Paris Brain Institute, ICM, Paris, France
- Laurent Guillier
- Laboratory for Food Safety, French Agency for Food, Environmental and Occupational Health & Safety (ANSES), 94700, Maisons-Alfort, France
- Renaud Lailler
- Laboratory for Food Safety, French Agency for Food, Environmental and Occupational Health & Safety (ANSES), 94700, Maisons-Alfort, France
- Marie-Leone Vignaud
- Laboratory for Food Safety, French Agency for Food, Environmental and Occupational Health & Safety (ANSES), 94700, Maisons-Alfort, France
- Simon Le Hello
- Centre National de Référence Des Escherichia Coli, Institut Pasteur, Unité Des Bactéries Pathogènes Entériques, Shigella et Salmonella, 75015, Paris, France; Present address: Groupe de Recherche Sur L'Adaptation Microbienne (GRAM 2.0), Normandie Univ, UNICAEN, Caen, France
- François-Xavier Weill
- Centre National de Référence Des Escherichia Coli, Institut Pasteur, Unité Des Bactéries Pathogènes Entériques, Shigella et Salmonella, 75015, Paris, France
- Sabrina Cadel-Six
- Laboratory for Food Safety, French Agency for Food, Environmental and Occupational Health & Safety (ANSES), 94700, Maisons-Alfort, France
10
Turner D, Adriaenssens EM, Tolstoy I, Kropinski AM. Phage Annotation Guide: Guidelines for Assembly and High-Quality Annotation. Phage (New Rochelle, N.Y.) 2021; 2:170-182. [PMID: 35083439 PMCID: PMC8785237 DOI: 10.1089/phage.2021.0013] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 12/23/2022]
Abstract
All sequencing projects of bacteriophages (phages) should seek to report an accurate and comprehensive annotation of their genomes. This article defines 14 questions for those new to phage genomics that should be addressed before submitting a genome sequence to the International Nucleotide Sequence Database Collaboration or writing a publication.
Affiliation(s)
- Dann Turner
- Department of Applied Sciences, Faculty of Health and Applied Sciences, University of the West of England, Bristol, United Kingdom
- Igor Tolstoy
- Viral Resources, National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, Maryland, USA
- Andrew M. Kropinski
- Department of Food Science, University of Guelph, Guelph, Ontario, Canada
- Department of Pathobiology, University of Guelph, Guelph, Ontario, Canada
11
Kajitani R, Yoshimura D, Ogura Y, Gotoh Y, Hayashi T, Itoh T. Platanus_B: an accurate de novo assembler for bacterial genomes using an iterative error-removal process. DNA Res 2021; 27:5870828. [PMID: 32658266 PMCID: PMC7433917 DOI: 10.1093/dnares/dsaa014] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 04/06/2020] [Accepted: 07/07/2020] [Indexed: 11/14/2022] Open
Abstract
De novo assembly of short DNA reads remains an essential technology, especially for large-scale projects and high-resolution variant analyses in epidemiology. However, the existing tools often lack sufficient accuracy required to compare closely related strains. To facilitate such studies on bacterial genomes, we developed Platanus_B, a de novo assembler that employs iterations of multiple error-removal algorithms. The benchmarks demonstrated the superior accuracy and high contiguity of Platanus_B, in addition to its ability to enhance the hybrid assembly of both short and nanopore long reads. Although the hybrid strategies for short and long reads were effective in achieving near full-length genomes, we found that short-read-only assemblies generated with Platanus_B were sufficient to obtain ≥90% of exact coding sequences in most cases. In addition, while nanopore long-read-only assemblies lacked fine-scale accuracies, inclusion of short reads was effective in improving the accuracies. Platanus_B can, therefore, be used for comprehensive genomic surveillances of bacterial pathogens and high-resolution phylogenomic analyses of a wide range of bacteria.
Affiliation(s)
- Rei Kajitani
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Dai Yoshimura
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Yoshitoshi Ogura
- Division of Microbiology, Department of Infectious Medicine, Kurume University School of Medicine, Kurume, Fukuoka 830-0011, Japan; Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Fukuoka 812-8582, Japan
- Yasuhiro Gotoh
- Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Fukuoka 812-8582, Japan
- Tetsuya Hayashi
- Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Fukuoka 812-8582, Japan
- Takehiko Itoh
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo 152-8550, Japan
12
|
Carroll LM, Cheng RA, Wiedmann M, Kovac J. Keeping up with the Bacillus cereus group: taxonomy through the genomics era and beyond. Crit Rev Food Sci Nutr 2021; 62:7677-7702. [PMID: 33939559 DOI: 10.1080/10408398.2021.1916735] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Indexed: 12/15/2022]
Abstract
The Bacillus cereus group, also known as B. cereus sensu lato (s.l.), is a species complex that contains numerous closely related lineages, which vary in their ability to cause illness in humans and animals. The classification of B. cereus s.l. isolates into species-level taxonomic units is thus essential for informing public health and food safety efforts. However, taxonomic classification of these organisms is challenging. Numerous, often conflicting, taxonomic changes to the group have been proposed over the past two decades, making it difficult to remain up to date. In this review, we discuss the major nomenclatural changes that have accumulated in the B. cereus s.l. taxonomic space prior to 2020, particularly in the genomic sequencing era, and outline the resulting problems. We discuss several contemporary taxonomic frameworks as applied to B. cereus s.l., including (i) phenotypic, (ii) genomic, and (iii) hybrid nomenclatural frameworks, and we discuss the advantages and disadvantages of each. We offer suggestions as to how readers can avoid B. cereus s.l. taxonomic ambiguities, regardless of the nomenclatural framework(s) they choose to employ. Finally, we discuss future directions and open problems in the B. cereus s.l. taxonomic realm, including those that cannot be solved by genomic approaches alone.
Affiliation(s)
- Laura M Carroll
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Rachel A Cheng
  - Department of Food Science, Cornell University, Ithaca, New York, USA
- Martin Wiedmann
  - Department of Food Science, Cornell University, Ithaca, New York, USA
- Jasna Kovac
  - Department of Food Science, The Pennsylvania State University, University Park, Pennsylvania, USA
|
13
|
Qi H, Li L, Zhang G. Construction of a chromosome-level genome and variation map for the Pacific oyster Crassostrea gigas. Mol Ecol Resour 2021; 21:1670-1685. [PMID: 33655634 DOI: 10.1111/1755-0998.13368] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 11/30/2020] [Revised: 02/17/2021] [Accepted: 02/23/2021] [Indexed: 12/11/2022]
Abstract
The Pacific oyster (Crassostrea gigas) is a widely distributed marine bivalve of great ecological and economic importance. In this study, we provide a high-quality chromosome-level genome assembled using Pacific Bioscience long reads and Hi-C-based and linkage-map-based scaffolding technologies and a high-resolution variation map constructed using large-scale resequencing analysis. The 586.8 Mb genome consists of 10 pseudochromosome sequences ranging from 38.6 to 78.9 Mb, containing 301 contigs with an N50 size of 3.1 Mb. A total of 30,078 protein-coding genes were predicted, of which 22,757 (75.7%) were high-reliability annotations supported by a homologous match to a curated protein in the SWISS-PROT database or transcript expression. Although a medium level of repeat components (57.2%) was detected, the genomic content of the segmental duplications reached 26.2%, which is the highest among the reported genomes. By whole genome resequencing analysis of 495 Pacific oysters, a comprehensive variation map was built, comprised of 4.78 million single nucleotide polymorphisms, 0.60 million short insertions and deletions, and 49,333 copy number variation regions. The structural variations can lead to an average interindividual genomic divergence of 0.21, indicating their crucial role in shaping the Pacific oyster genome diversity. The large amount of mosaic distributed repeat elements, small variations, and copy number variations indicate that the Pacific oyster is a diploid organism with an extremely high genomic complexity at the intra- and interindividual level. The genome and variation maps can improve our understanding of oyster genome diversity and enrich the resources for oyster molecular evolution, comparative genomics, and genetic research.
Affiliation(s)
- Haigang Qi
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
  - Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, China
  - National and Local Joint Engineering Laboratory of Ecological Mariculture, Qingdao, China
- Li Li
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
  - Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, China
  - Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
  - National and Local Joint Engineering Laboratory of Ecological Mariculture, Qingdao, China
- Guofan Zhang
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
  - Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, China
  - National and Local Joint Engineering Laboratory of Ecological Mariculture, Qingdao, China
|
14
|
Mohan V, Cruz CD, van Vliet AHM, Pitman AR, Visnovsky SB, Rivas L, Gilpin B, Fletcher GC. Genomic diversity of Listeria monocytogenes isolates from seafood, horticulture and factory environments in New Zealand. Int J Food Microbiol 2021; 347:109166. [PMID: 33838478 DOI: 10.1016/j.ijfoodmicro.2021.109166] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/19/2020] [Revised: 02/28/2021] [Accepted: 03/06/2021] [Indexed: 11/28/2022]
Abstract
Listeria monocytogenes is a foodborne human pathogen that causes systemic infection, fetal-placental infection in pregnant women (causing abortion and stillbirth), and meningoencephalitis in elderly and immunocompromised individuals. This study aimed to analyse L. monocytogenes from different sources from New Zealand (NZ) and to compare them with international strains. We used pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST) and whole-genome single nucleotide polymorphisms (SNP) to study the population structure of the NZ L. monocytogenes isolates and their relationship with the international strains. The NZ isolates formed unique clusters in PFGE, MLST and whole-genome SNP comparisons compared to the international isolates for which data were available. PFGE identified 31 AscI and 29 ApaI PFGE patterns, with indistinguishable pulsotypes being present in seafood, horticultural products and environmental samples. Apart from the Asc0002:Apa0002 pulsotype, which was distributed across different sources, other pulsotypes were site or factory associated. Whole-genome analysis of 200 randomly selected L. monocytogenes isolates revealed that lineage II dominated the NZ L. monocytogenes populations. In the MLST comparison of NZ and international isolates, lineage II accounted for 89% (177 of 200) of the NZ isolates, while its international representation was 45.3% (1674 of 3473). Rarefaction analysis showed that sequence type richness was greater in NZ isolates than in the international isolates; however, it should be noted that NZ isolates predominantly came from seafood, horticulture and their respective processing environments or factories, unlike international isolates, where there was a good mixture of clinical, food and environmental isolates.
Affiliation(s)
- Vathsala Mohan
  - The New Zealand Institute for Plant & Food Research Limited, Auckland, New Zealand
- Cristina D Cruz
  - The New Zealand Institute for Plant & Food Research Limited, Auckland, New Zealand
- Arnoud H M van Vliet
  - Department of Pathology and Infectious Diseases, School of Veterinary Medicine, Faculty of Health and Medical Sciences, University of Surrey, Daphne Jackson Road, Guildford GU2 7AL, Surrey, United Kingdom
- Andrew R Pitman
  - The New Zealand Institute for Plant & Food Research Limited, Lincoln, New Zealand
- Sandra B Visnovsky
  - The New Zealand Institute for Plant & Food Research Limited, Lincoln, New Zealand
- Lucia Rivas
  - Institute of Environmental Science and Research Limited, Christchurch, New Zealand
- Brent Gilpin
  - Institute of Environmental Science and Research Limited, Christchurch, New Zealand
- Graham C Fletcher
  - The New Zealand Institute for Plant & Food Research Limited, Auckland, New Zealand
|
15
|
Whole Genome Sequence Analysis of Brucella abortus Isolates from Various Regions of South Africa. Microorganisms 2021; 9:microorganisms9030570. [PMID: 33799545 PMCID: PMC7998772 DOI: 10.3390/microorganisms9030570] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/30/2020] [Revised: 02/10/2021] [Accepted: 02/15/2021] [Indexed: 11/17/2022]
Abstract
The availability of whole genome sequences in public databases permits genome-wide comparative studies of various bacterial species. Whole genome sequence-single nucleotide polymorphisms (WGS-SNP) analysis has been used in recent studies and allows the discrimination of various Brucella species and strains. In the present study, 13 Brucella spp. strains from cattle of various locations in provinces of South Africa were typed and discriminated. WGS-SNP analysis indicated a maximum pairwise distance ranging from 4 to 77 single nucleotide polymorphisms (SNPs) between the South African Brucella abortus virulent field strains. Moreover, it was shown that the South African B. abortus strains grouped closely to B. abortus strains from Mozambique and Zimbabwe, as well as other Eurasian countries, such as Portugal and India. WGS-SNP analysis of South African B. abortus strains demonstrated that the same genotype circulated in one farm (Farm 1), whereas another farm (Farm 2) in the same province had two different genotypes. This indicated that brucellosis in South Africa spreads within the herd on some farms, whereas the introduction of infected animals is the mode of transmission on other farms. Three B. abortus vaccine S19 strains isolated from tissue and aborted material were identical, even though they originated from different herds and regions of South Africa. This might be due to the incorrect vaccination of animals older than the recommended age of 4-8 months or might be a problem associated with vaccine production.
|
16
|
Liu L, Bosse M, Megens H, de Visser M, Groenen MAM, Madsen O. Genetic consequences of long-term small effective population size in the critically endangered pygmy hog. Evol Appl 2021; 14:710-720. [PMID: 33767746 PMCID: PMC7980308 DOI: 10.1111/eva.13150] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/18/2020] [Revised: 10/11/2020] [Accepted: 10/13/2020] [Indexed: 12/24/2022]
Abstract
Increasing human disturbance and climate change have a major impact on habitat integrity and size, with far-reaching consequences for wild fauna and flora. Specifically, population decline and habitat fragmentation result in small, isolated populations. To what extent different endangered species can cope with small population size is still largely unknown. Studies on the genomic landscape of these species can shed light on past demographic dynamics and current genetic load, thereby also providing guidance for conservation programs. The pygmy hog (Porcula salvania) is the smallest and rarest wild pig in the world, with a current estimate of only a few hundred individuals living in the wild. Here, we analyzed whole-genome sequencing data of six pygmy hogs, three from the wild and three from a captive population, along with 30 pigs representing six other Suidae. First, we show that the pygmy hog had a very small population size with low genetic diversity over the course of the past ~1 million years. One indication of historical small effective population size is the absence of mitochondrial variation in the six sequenced individuals. Second, we evaluated the impact of historical demography. Runs of homozygosity (ROH) analysis suggests that the pygmy hog population has gone through past but not recent inbreeding. Also, the long-term, extremely small population size may have led to the accumulation of harmful mutations, suggesting that the accumulation of deleterious mutations is exceeding purifying selection in this species. Thus, care has to be taken in the conservation program to avoid or minimize the potential for further inbreeding depression, and to guard against environmental changes in the future.
Affiliation(s)
- Langqing Liu
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Mirte Bosse
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Hendrik-Jan Megens
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Manon de Visser
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Martien A. M. Groenen
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Ole Madsen
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
|
17
|
Barretto C, Rincón C, Portmann AC, Ngom-Bru C. Whole Genome Sequencing Applied to Pathogen Source Tracking in Food Industry: Key Considerations for Robust Bioinformatics Data Analysis and Reliable Results Interpretation. Genes (Basel) 2021; 12:275. [PMID: 33671973 PMCID: PMC7919020 DOI: 10.3390/genes12020275] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/18/2021] [Revised: 01/28/2021] [Accepted: 02/08/2021] [Indexed: 12/31/2022]
Abstract
Whole genome sequencing (WGS) has arisen as a powerful tool to perform pathogen source tracking in the food industry thanks to several developments in recent years. However, the cost associated to this technology and the degree of expertise required to accurately process and understand the data has limited its adoption at a wider scale. Additionally, the time needed to obtain actionable information is often seen as an impairment for the application and use of the information generated via WGS. Ongoing work towards standardization of wet lab including sequencing protocols, following guidelines from the regulatory authorities and international standardization efforts make the technology more and more accessible. However, data analysis and results interpretation guidelines are still subject to initiatives coming from distinct groups and institutions. There are multiple bioinformatics software and pipelines developed to handle such information. Nevertheless, little consensus exists on a standard way to process the data and interpret the results. Here, we want to present the constraints we face in an industrial setting and the steps we consider necessary to obtain high quality data, reproducible results and a robust interpretation of the obtained information. All of this, in a time frame allowing for data-driven actions supporting factories and their needs.
Affiliation(s)
- Caroline Barretto
  - Institute of Food Safety and Analytical Sciences, Nestlé Research, 1000 Lausanne 26, Switzerland; (C.R.); (A.-C.P.); (C.N.-B.)
|
18
|
Valiente-Mullor C, Beamud B, Ansari I, Francés-Cuesta C, García-González N, Mejía L, Ruiz-Hueso P, González-Candelas F. One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads. PLoS Comput Biol 2021; 17:e1008678. [PMID: 33503026 PMCID: PMC7870062 DOI: 10.1371/journal.pcbi.1008678] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 04/27/2020] [Revised: 02/08/2021] [Accepted: 01/05/2021] [Indexed: 12/17/2022]
Abstract
Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended. Mapping consists in the alignment of reads (i.e., DNA fragments) obtained through high-throughput genome sequencing to a previously assembled reference sequence. It is a common practice in genomic studies to use a single reference for mapping, usually the ‘reference genome’ of a species—a high-quality assembly. However, the selection of an optimal reference is hindered by intrinsic intra-species genetic variability, particularly in bacteria. 
It is known that genetic differences between the reference genome and the read sequences may produce incorrect alignments during mapping. Eventually, these errors could lead to misidentification of variants and biased reconstruction of phylogenetic trees (which reflect ancestry between different bacterial lineages). To our knowledge, this is the first work to systematically examine the effect of different references for mapping on the inference of tree topology as well as the impact on recombination and natural selection inferences. Furthermore, the novelty of this work relies on a procedure that guarantees that we are evaluating only the effect of the reference. This effect has proved to be pervasive in the five bacterial species that we have studied and, in some cases, alterations in phylogenetic trees could lead to incorrect epidemiological inferences. Hence, the use of different reference genomes may be prescriptive to assess the potential biases of mapping.
Affiliation(s)
- Carlos Valiente-Mullor
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Beatriz Beamud
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
  - * E-mail: (BB); (FG-C)
- Iván Ansari
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Carlos Francés-Cuesta
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Neris García-González
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Lorena Mejía
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
  - Instituto de Microbiología, Colegio de Ciencias Biológicas y Ambientales, Universidad San Francisco de Quito, Quito, Ecuador
- Paula Ruiz-Hueso
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Fernando González-Candelas
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
  - CIBER in Epidemiology and Public Health, Valencia, Spain
  - * E-mail: (BB); (FG-C)
|
19
|
Abstract
Read alignment is the central step of many analytic pipelines that perform variant calling. To reduce error, it is common practice to pre-process raw sequencing reads to remove low-quality bases and residual adapter contamination, a procedure collectively known as ‘trimming’. Trimming is widely assumed to increase the accuracy of variant calling, although there are relatively few systematic evaluations of its effects and no clear consensus on its efficacy. As sequencing datasets increase both in number and size, it is worthwhile reappraising computational operations of ambiguous benefit, particularly when the scope of many analyses now routinely incorporates thousands of samples, increasing the time and cost required. Using a curated set of 17 Gram-negative bacterial genomes, this study initially evaluated the impact of four read-trimming utilities (Atropos, fastp, Trim Galore and Trimmomatic), each used with a range of stringencies, on the accuracy and completeness of three bacterial SNP-calling pipelines. It was found that read trimming made only small, and statistically insignificant, increases in SNP-calling accuracy even when using the highest-performing pre-processor in this study, fastp. To extend these findings, >6500 publicly archived sequencing datasets from Escherichia coli, Mycobacterium tuberculosis and Staphylococcus aureus were re-analysed using a common analytic pipeline. Of the approximately 125 million SNPs and 1.25 million indels called across all samples, the same bases were called in 98.8 and 91.9 % of cases, respectively, irrespective of whether raw reads or trimmed reads were used. Nevertheless, the proportion of mixed calls (i.e. calls where <100 % of the reads support the variant allele; considered a proxy of false positives) was significantly reduced after trimming, which suggests that while trimming rarely alters the set of variant bases, it can affect the proportion of reads supporting each call. 
It was concluded that read quality- and adapter-trimming add relatively little value to a SNP-calling pipeline and may only be necessary if small differences in the absolute number of SNP calls, or the false call rate, are critical. Broadly similar conclusions can be drawn about the utility of trimming to an indel-calling pipeline. Read trimming remains routinely performed prior to variant calling likely out of concern that doing otherwise would typically have negative consequences. While historically this may have been the case, the data in this study suggests that read trimming is not always a practical necessity.
Affiliation(s)
- Stephen J Bush
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
|
20
|
Liang KYH, Orata FD, Islam MT, Nasreen T, Alam M, Tarr CL, Boucher YF. A Vibrio cholerae Core Genome Multilocus Sequence Typing Scheme To Facilitate the Epidemiological Study of Cholera. J Bacteriol 2020; 202:e00086-20. [PMID: 32540931 PMCID: PMC7685551 DOI: 10.1128/jb.00086-20] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 02/18/2020] [Accepted: 06/07/2020] [Indexed: 12/11/2022]
Abstract
Core genome multilocus sequence typing (cgMLST) has gained popularity in recent years in epidemiological research and subspecies-level classification. cgMLST retains the intuitive nature of traditional MLST but offers much greater resolution by utilizing significantly larger portions of the genome. Here, we introduce a cgMLST scheme for Vibrio cholerae, a bacterium abundant in marine and freshwater environments and the etiologic agent of cholera. A set of 2,443 core genes ubiquitous in V. cholerae were used to analyze a comprehensive data set of 1,262 clinical and environmental strains collected from 52 countries, including 65 newly sequenced genomes in this study. We established a sublineage threshold based on 133 allelic differences that creates clusters nearly identical to traditional MLST types, providing backwards compatibility to new cgMLST classifications. We also defined an outbreak threshold based on seven allelic differences that is capable of identifying strains from the same outbreak and closely related isolates that could give clues on outbreak origin. Using cgMLST, we confirmed the South Asian origin of modern epidemics and identified clustering affinity among sublineages of environmental isolates from the same geographic origin. Advantages of this method are highlighted by direct comparison with existing classification methods, such as MLST and single-nucleotide polymorphism-based methods. cgMLST outperforms all existing methods in terms of resolution, standardization, and ease of use. We anticipate this scheme will serve as a basis for a universally applicable and standardized classification system for V. cholerae research and epidemiological surveillance in the future. This cgMLST scheme is publicly available on PubMLST (https://pubmlst.org/vcholerae/).
IMPORTANCE: Toxigenic Vibrio cholerae isolates of the O1 and O139 serogroups are the causative agents of cholera, an acute diarrheal disease that plagued the world for centuries, if not millennia. Here, we introduce a core genome multilocus sequence typing scheme for V. cholerae. Using this scheme, we have standardized the definition for subspecies-level classification, facilitating global collaboration in the surveillance of V. cholerae. In addition, this typing scheme allows for quick identification of outbreak-related isolates that can guide subsequent analyses, serving as an important first step in epidemiological research. This scheme is also easily scalable to analyze thousands of isolates at various levels of resolution, making it an invaluable tool for large-scale ecological and evolutionary analyses.
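Editorial illustration (not part of the cited abstract): the threshold idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy allele profiles, the rule of ignoring missing allele calls, and the use of single-linkage grouping are all assumptions made for illustration; only the idea of grouping isolates by a maximum number of allelic differences (for example 7 or 133) comes from the abstract.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b):
    """Count loci where both profiles have a called allele and the alleles differ.

    Missing calls (None) are skipped; this is an assumed convention.
    """
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def single_linkage_clusters(profiles, threshold):
    """Group isolates connected by chains of pairwise distances <= threshold."""
    parent = {name: name for name in profiles}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy profiles over 10 loci (real schemes use thousands of loci).
profiles = {
    "iso1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "iso2": [1, 1, 1, 1, 1, 1, 1, 1, 2, 2],
    "iso3": [1, 1, 1, 1, 1, 1, 3, 3, 2, 2],
    "iso4": [9, 9, 9, 9, 9, 9, 9, 9, 9, None],
}

print(single_linkage_clusters(profiles, threshold=2))
```

With a threshold of 2, iso1 and iso3 differ at four loci but still end up in the same group because each is within two differences of iso2; that chaining behaviour is the defining property of single-linkage grouping and is worth keeping in mind when interpreting threshold-based clusters.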
Affiliation(s)
- Kevin Y H Liang
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Fabini D Orata
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Tania Nasreen
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Munirul Alam
  - Infectious Diseases Division, International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
- Cheryl L Tarr
  - Enteric Diseases Laboratory Branch, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Yann F Boucher
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
  - Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
  - Singapore Center for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
|
21
|
Carroll LM, Wiedmann M. Cereulide Synthetase Acquisition and Loss Events within the Evolutionary History of Group III Bacillus cereus Sensu Lato Facilitate the Transition between Emetic and Diarrheal Foodborne Pathogens. mBio 2020; 11:e01263-20. [PMID: 32843545 PMCID: PMC7448271 DOI: 10.1128/mbio.01263-20] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/13/2020] [Accepted: 07/17/2020] [Indexed: 11/20/2022]
Abstract
Cereulide-producing members of Bacillus cereus sensu lato group III (also known as emetic B. cereus) possess cereulide synthetase, a plasmid-encoded, nonribosomal peptide synthetase encoded by the ces gene cluster. Despite the documented risks that cereulide-producing strains pose to public health, the level of genomic diversity encompassed by emetic B. cereus has never been evaluated at a whole-genome scale. Here, we employ a phylogenomic approach to characterize group III B. cereus sensu lato genomes which possess ces (ces positive) alongside their closely related, ces-negative counterparts (i) to assess the genomic diversity encompassed by emetic B. cereus and (ii) to identify potential ces loss and/or gain events within the evolutionary history of the high-risk and medically relevant sequence type (ST) 26 lineage often associated with emetic foodborne illness. Using all publicly available ces-positive group III B. cereus sensu lato genomes and the ces-negative genomes interspersed among them (n = 159), we show that emetic B. cereus is not clonal; rather, multiple lineages within group III harbor cereulide-producing strains, all of which share an ancestor incapable of producing cereulide (posterior probability = 0.86 to 0.89). Members of ST 26 share an ancestor that existed circa 1748 (95% highest posterior density [HPD] interval = 1246.89 to 1915.64) and first acquired the ability to produce cereulide before 1876 (95% HPD = 1641.43 to 1946.70). Within ST 26 alone, two subsequent ces gain events were observed, as well as three ces loss events, including among isolates responsible for B. cereus sensu lato toxicoinfection (i.e., "diarrheal" illness).
IMPORTANCE: B. cereus is responsible for thousands of cases of foodborne disease each year worldwide, causing two distinct forms of illness: (i) intoxication via cereulide (i.e., emetic syndrome) or (ii) toxicoinfection via multiple enterotoxins (i.e., diarrheal syndrome). Here, we show that emetic B. cereus is not a clonal, homogenous unit that resulted from a single cereulide synthetase gain event followed by subsequent proliferation; rather, cereulide synthetase acquisition and loss is a dynamic, ongoing process that occurs across lineages, allowing some group III B. cereus sensu lato populations to oscillate between diarrheal and emetic foodborne pathogens over the course of their evolutionary histories. We also highlight the care that must be taken when selecting a reference genome for whole-genome sequencing-based investigation of emetic B. cereus sensu lato outbreaks, since some reference genome selections can lead to a confounding loss of resolution and potentially hinder epidemiological investigations.
Affiliation(s)
- Laura M Carroll
- Department of Food Science, Cornell University, Ithaca, New York, USA
| | - Martin Wiedmann
- Department of Food Science, Cornell University, Ithaca, New York, USA
| |
|
22
|
Frentrup M, Zhou Z, Steglich M, Meier-Kolthoff JP, Göker M, Riedel T, Bunk B, Spröer C, Overmann J, Blaschitz M, Indra A, von Müller L, Kohl TA, Niemann S, Seyboldt C, Klawonn F, Kumar N, Lawley TD, García-Fernández S, Cantón R, del Campo R, Zimmermann O, Groß U, Achtman M, Nübel U. A publicly accessible database for Clostridioides difficile genome sequences supports tracing of transmission chains and epidemics. Microb Genom 2020; 6:mgen000410. [PMID: 32726198 PMCID: PMC7641423 DOI: 10.1099/mgen.0.000410] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/16/2020] [Accepted: 06/30/2020] [Indexed: 01/02/2023] Open
Abstract
Clostridioides difficile is the primary infectious cause of antibiotic-associated diarrhea. Local transmissions and international outbreaks of this pathogen have been previously elucidated by bacterial whole-genome sequencing, but comparative genomic analyses at the global scale were hampered by the lack of specific bioinformatic tools. Here we introduce a publicly accessible database within EnteroBase (http://enterobase.warwick.ac.uk) that automatically retrieves and assembles C. difficile short-reads from the public domain, and calls alleles for core-genome multilocus sequence typing (cgMLST). We demonstrate that comparable levels of resolution and precision are attained by EnteroBase cgMLST and single-nucleotide polymorphism analysis. EnteroBase currently contains 18 254 quality-controlled C. difficile genomes, which have been assigned to hierarchical sets of single-linkage clusters by cgMLST distances. This hierarchical clustering is used to identify and name populations of C. difficile at all epidemiological levels, from recent transmission chains through to epidemic and endemic strains. Moreover, it puts newly collected isolates into phylogenetic and epidemiological context by identifying related strains among all previously published genome data. For example, HC2 clusters (i.e. chains of genomes with pairwise distances of up to two cgMLST alleles) were statistically associated with specific hospitals (P<10⁻⁴) or single wards (P=0.01) within hospitals, indicating they represented local transmission clusters. We also detected several HC2 clusters spanning more than one hospital that by retrospective epidemiological analysis were confirmed to be associated with inter-hospital patient transfers. In contrast, clustering at level HC150 correlated with k-mer-based classification and was largely compatible with PCR ribotyping, thus enabling comparisons to earlier surveillance data. 
EnteroBase enables contextual interpretation of a growing collection of assembled, quality-controlled C. difficile genome sequences and their associated metadata. Hierarchical clustering rapidly identifies database entries that are related at multiple levels of genetic distance, facilitating communication among researchers, clinicians and public-health officials who are combatting disease caused by C. difficile.
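The single-linkage idea behind the HC2 clusters mentioned in this abstract can be illustrated with a small sketch. This is not EnteroBase code: the function names, the toy allele profiles, and the choice to skip loci with missing calls are all assumptions made for illustration only.

```python
from itertools import combinations


def allele_distance(a, b):
    """Count loci at which two cgMLST profiles differ.

    Loci with a missing call (None) in either profile are skipped;
    this handling is an assumption of the sketch.
    """
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def single_linkage_clusters(profiles, threshold):
    """Group isolates into single-linkage clusters.

    Two isolates end up in the same cluster if they are connected by a
    chain of pairwise distances that are each <= threshold.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical six-locus profiles (allele numbers are made up).
toy_profiles = {
    "A": [1, 1, 1, 1, 1, 1],
    "B": [1, 1, 1, 1, 2, 2],  # 2 alleles from A
    "C": [1, 1, 2, 2, 2, 2],  # 2 alleles from B, 4 from A
    "D": [9, 9, 9, 9, 9, 9],  # unrelated
}

hc2 = single_linkage_clusters(toy_profiles, threshold=2)
print(hc2)
```

In the toy data, isolates A and C differ at four loci yet still share a cluster at threshold 2 because both are within two alleles of B, which is the chaining behaviour the phrase "chains of genomes" refers to.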
Affiliation(s)
| | - Zhemin Zhou
- Warwick Medical School, University of Warwick, UK
| | - Matthias Steglich
- Leibniz Institute DSMZ, Braunschweig, Germany
- German Center for Infection Research (DZIF), Partner site Hannover-Braunschweig, Germany
| | | | | | - Thomas Riedel
- Leibniz Institute DSMZ, Braunschweig, Germany
- German Center for Infection Research (DZIF), Partner site Hannover-Braunschweig, Germany
| | - Boyke Bunk
- Leibniz Institute DSMZ, Braunschweig, Germany
| | | | - Jörg Overmann
- Leibniz Institute DSMZ, Braunschweig, Germany
- German Center for Infection Research (DZIF), Partner site Hannover-Braunschweig, Germany
- Braunschweig Integrated Center of Systems Biology (BRICS), Technical University, Braunschweig, Germany
| | - Marion Blaschitz
- AGES-Austrian Agency for Health and Food Safety, Vienna, Austria
| | - Alexander Indra
- AGES-Austrian Agency for Health and Food Safety, Vienna, Austria
| | | | - Thomas A. Kohl
- Research Center Borstel, Germany
- German Center for Infection Research (DZIF), Partner site Hamburg-Lübeck-Borstel, Germany
| | - Stefan Niemann
- Research Center Borstel, Germany
- German Center for Infection Research (DZIF), Partner site Hamburg-Lübeck-Borstel, Germany
| | | | - Frank Klawonn
- Biostatistics, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Institute for Information Engineering, Ostfalia University, Wolfenbüttel, Germany
| | | | | | - Sergio García-Fernández
- Servicio de Microbiología, Hospital Universitario Ramón y Cajal, and Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain
- Red Española de Investigación en Patología Infecciosa (REIPI), Madrid, Spain
| | - Rafael Cantón
- Servicio de Microbiología, Hospital Universitario Ramón y Cajal, and Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain
- Red Española de Investigación en Patología Infecciosa (REIPI), Madrid, Spain
| | - Rosa del Campo
- Servicio de Microbiología, Hospital Universitario Ramón y Cajal, and Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain
- Red Española de Investigación en Patología Infecciosa (REIPI), Madrid, Spain
| | | | - Uwe Groß
- University Medical Center Göttingen, Germany
| | - Mark Achtman
- Warwick Medical School, University of Warwick, UK
| | - Ulrich Nübel
- Leibniz Institute DSMZ, Braunschweig, Germany
- German Center for Infection Research (DZIF), Partner site Hannover-Braunschweig, Germany
- Braunschweig Integrated Center of Systems Biology (BRICS), Technical University, Braunschweig, Germany
| |
|
23
|
Blanc DS, Magalhães B, Koenig I, Senn L, Grandbastien B. Comparison of Whole Genome (wg-) and Core Genome (cg-) MLST (BioNumerics™) Versus SNP Variant Calling for Epidemiological Investigation of Pseudomonas aeruginosa. Front Microbiol 2020; 11:1729. [PMID: 32793169 PMCID: PMC7387498 DOI: 10.3389/fmicb.2020.01729] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 12/10/2019] [Accepted: 07/02/2020] [Indexed: 12/29/2022] Open
Abstract
Whole genome sequencing (WGS) is increasingly used for epidemiological investigations of pathogens. While SNP variant calling is currently considered the most suitable method, the choice of a representative reference genome and the isolate dependency of results limit standardization and affect resolution in an unknown manner. Whole or core genome Multi Locus Sequence Typing (wg-, cg-MLST) represents an attractive alternative. Here, we assess the accuracy of wg- and cg-MLST by comparing results of four Pseudomonas aeruginosa datasets for which epidemiological and genomic data were previously described. Three datasets included 155 isolates from three different sequence types (ST) of P. aeruginosa collected in our ICUs over a 5-year period. The fourth dataset consisted of 10 isolates from an investigation of P. aeruginosa contaminated hand soap. All isolates were previously analyzed by a core SNP approach. In this study, wg- and cg-MLST were performed in BioNumerics™ using a scheme developed by Applied-Maths. Correlations between SNP calling and wg- or cg-MLST results were evaluated by calculating linear regressions and their correlation coefficients (R²) between the number of SNPs and the number of allele differences in pairwise comparisons of isolates. The numbers of SNPs and allele differences between isolates with close epidemiological linkage varied between 0–26 and 0–13, respectively. When compared to core-SNP calling, a higher coefficient of correlation was obtained with cgMLST (R² of 0.92–0.99) than with wgMLST (0.78–0.99). In one dataset, a putative homologous recombination of a large DNA fragment (202 loci) was identified among these isolates, affecting its phylogeny, but with no impact on the epidemiological analysis of outbreak isolates. In conclusion, we showed that the P. aeruginosa wgMLST scheme in BioNumerics™ is as discriminatory as the core-SNP calling approach and apparently useful for outbreak investigations. 
We also showed that epidemiologically linked isolates showed less than 26 SNPs or 13 allele differences. These are important figures for the distinction between outbreak and non-outbreak isolates when interpreting WGS results. However, as P. aeruginosa is highly recombinant, a cgMLST approach is preferable and attention should be paid to possible recombination of large DNA fragments.
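The regression step described in this abstract (pairwise SNP counts against pairwise allele differences, summarized by R²) can be sketched as follows. This is a minimal illustration, not the authors' BioNumerics workflow: the function name and the example numbers are hypothetical, and an ordinary least-squares fit with one predictor is assumed.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared). For a single predictor,
    R^2 equals the squared Pearson correlation between x and y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain some variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical pairwise comparisons: one entry per isolate pair.
snp_counts = [0, 3, 8, 12, 20, 26]
allele_diffs = [0, 2, 4, 7, 10, 13]

slope, intercept, r2 = linear_fit_r2(snp_counts, allele_diffs)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
```

An R² close to 1 would indicate that allele differences track SNP counts almost linearly across isolate pairs, which is the sense in which the abstract compares cgMLST and wgMLST against core-SNP calling.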
Affiliation(s)
- Dominique S Blanc
- Service of Hospital Preventive Medicine, Lausanne University Hospital, University of Lausanne, Lausanne, Switzerland
| | - Bárbara Magalhães
- Service of Hospital Preventive Medicine, Lausanne University Hospital, University of Lausanne, Lausanne, Switzerland
| | - Isabelle Koenig
- Service of Hospital Preventive Medicine, Lausanne University Hospital, University of Lausanne, Lausanne, Switzerland
| | - Laurence Senn
- Service of Hospital Preventive Medicine, Lausanne University Hospital, University of Lausanne, Lausanne, Switzerland
| | - Bruno Grandbastien
- Service of Hospital Preventive Medicine, Lausanne University Hospital, University of Lausanne, Lausanne, Switzerland
| |
|
24
|
Uelze L, Grützke J, Borowiak M, Hammerl JA, Juraschek K, Deneke C, Tausch SH, Malorny B. Typing methods based on whole genome sequencing data. One Health Outlook 2020; 2:3. [PMID: 33829127 PMCID: PMC7993478 DOI: 10.1186/s42522-020-0010-1] [Citation(s) in RCA: 116] [Impact Index Per Article: 23.2] [Received: 10/04/2019] [Accepted: 01/08/2020] [Indexed: 05/12/2023]
Abstract
Whole genome sequencing (WGS) of foodborne pathogens has become an effective method for investigating the information contained in the genome sequence of bacterial pathogens. In addition, its highly discriminative power enables the comparison of genetic relatedness between bacteria even on a sub-species level. For this reason, WGS is being implemented worldwide and across sectors (human, veterinary, food, and environment) for the investigation of disease outbreaks, source attribution, and improved risk characterization models. In order to extract relevant information from the large quantity and complex data produced by WGS, a host of bioinformatics tools has been developed, allowing users to analyze and interpret sequencing data, starting from simple gene-searches to complex phylogenetic studies. Depending on the research question, the complexity of the dataset and their bioinformatics skill set, users can choose between a great variety of tools for the analysis of WGS data. In this review, we describe the relevant approaches for phylogenomic studies for outbreak studies and give an overview of selected tools for the characterization of foodborne pathogens based on WGS data. Despite the efforts of the last years, harmonization and standardization of typing tools are still urgently needed to allow for an easy comparison of data between laboratories, moving towards a one health worldwide surveillance system for foodborne pathogens.
Affiliation(s)
- Laura Uelze
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Josephine Grützke
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Maria Borowiak
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Jens Andre Hammerl
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Katharina Juraschek
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Carlus Deneke
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Simon H. Tausch
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Burkhard Malorny
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| |
|
25
|
Bush SJ, Foster D, Eyre DW, Clark EL, De Maio N, Shaw LP, Stoesser N, Peto TEA, Crook DW, Walker AS. Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines. Gigascience 2020; 9:giaa007. [PMID: 32025702 PMCID: PMC7002876 DOI: 10.1093/gigascience/giaa007] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 05/28/2019] [Revised: 12/02/2019] [Accepted: 01/15/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella.
RESULTS: We evaluated the performance of 209 SNP-calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, there was a strong inverse relationship between pipeline sensitivity and precision, and the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome. The effect was especially pronounced for diverse, recombinogenic bacteria such as Escherichia coli but less dominant for clonal species such as Mycobacterium tuberculosis.
CONCLUSIONS: The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest-performing pipelines was Novoalign/GATK. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often used the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup, or Strelka.
Affiliation(s)
- Stephen J Bush
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
| | - Dona Foster
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
| | - David W Eyre
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
| | - Emily L Clark
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
| | - Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SH, UK
| | - Liam P Shaw
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
| | - Nicole Stoesser
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
| | - Tim E A Peto
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
| | - Derrick W Crook
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
| | - A Sarah Walker
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
| |
|
26
|
Pearce ME, Chattaway MA, Grant K, Maiden MCJ. A proposed core genome scheme for analyses of the Salmonella genus. Genomics 2020; 112:371-378. [PMID: 30905613 PMCID: PMC6978875 DOI: 10.1016/j.ygeno.2019.02.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/03/2018] [Revised: 02/19/2019] [Accepted: 02/22/2019] [Indexed: 12/03/2022]
Abstract
The salmonellae are found in a wide range of animal hosts and many food products for human consumption. Most cases of human disease are caused by S. enterica subspecies I; however, as opportunistic pathogens, the other subspecies (II-VI) and S. bongori are capable of causing disease. Loci that were not consistently present in all of the species and subspecies were removed from a previously proposed core genome scheme (EBcgMLSTv2.0); the removal of these 252 loci resulted in a core genus scheme (SalmcgMLSTv1.0). SalmcgMLSTv1.0 clustered isolates from the same subspecies more rapidly and more accurately grouped isolates from different subspecies when compared with EBcgMLSTv2.0. All loci within the EBcgMLSTv2.0 scheme were present in over 98% of S. enterica subspecies I isolates and should, therefore, continue to be used for subspecies I analyses, while the SalmcgMLSTv1.0 scheme is more appropriate for cross genus investigations.
Affiliation(s)
- Madison E Pearce
- Department of Zoology, University of Oxford, Peter Medawar Building for Pathogen Research, South Parks Road, Oxford OX1 3SY, United Kingdom; National Institute for Health Research, Health Protection Research Unit, Gastrointestinal Infections, University of Oxford, United Kingdom.
| | - Marie A Chattaway
- Public Health England, Gastrointestinal Bacteria Reference Unit, 61 Colindale Avenue, London NW9 5EQ, United Kingdom.
| | - Kathie Grant
- Public Health England, Gastrointestinal Bacteria Reference Unit, 61 Colindale Avenue, London NW9 5EQ, United Kingdom.
| | - Martin C J Maiden
- Department of Zoology, University of Oxford, Peter Medawar Building for Pathogen Research, South Parks Road, Oxford OX1 3SY, United Kingdom; National Institute for Health Research, Health Protection Research Unit, Gastrointestinal Infections, University of Oxford, United Kingdom.
| |
|
27
|
Pightling AW, Pettengill JB, Wang Y, Rand H, Strain E. Within-species contamination of bacterial whole-genome sequence data has a greater influence on clustering analyses than between-species contamination. Genome Biol 2019; 20:286. [PMID: 31849328 PMCID: PMC6918607 DOI: 10.1186/s13059-019-1914-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/22/2019] [Accepted: 12/06/2019] [Indexed: 11/30/2022] Open
Abstract
Although it is assumed that contamination in bacterial whole-genome sequencing causes errors, the influences of contamination on clustering analyses, such as single-nucleotide polymorphism discovery, phylogenetics, and multi-locus sequencing typing, have not been quantified. By developing and analyzing 720 Listeria monocytogenes, Salmonella enterica, and Escherichia coli short-read datasets, we demonstrate that within-species contamination causes errors that confound clustering analyses, while between-species contamination generally does not. Contaminant reads mapping to references or becoming incorporated into chimeric sequences during assembly are the sources of those errors. Contamination sufficient to influence clustering analyses is present in public sequence databases.
Affiliation(s)
- Arthur W Pightling
- Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park, MD, USA.
| | - James B Pettengill
- Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park, MD, USA
| | - Yu Wang
- Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park, MD, USA
| | - Hugh Rand
- Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park, MD, USA
| | - Errol Strain
- Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park, MD, USA
| |
|
28
|
Quijada NM, Hernández M, Rodríguez-Lázaro D. High-throughput sequencing and food microbiology. Advances in Food and Nutrition Research 2019; 91:275-300. [PMID: 32035598 DOI: 10.1016/bs.afnr.2019.10.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
Massive parallel sequencing (High-Throughput Sequencing, HTS) permits the reading of millions to billions of short DNA sequences (reads) in parallel and is revolutionizing microbiology and food safety research, from laboratory methods to computational analysis, with the inevitable use of bioinformatics. The reduction in time and cost of microbiota, microbiome and metagenome studies allows rapid progress in diagnosis, taxonomy, epidemiology, comparative genomics, virulence, discovery of genes or variants of interest and the association of microorganisms with food spoilage and foodborne infections.
Affiliation(s)
- Narciso M Quijada
- Laboratorio de Biología Molecular y Microbiología, Instituto Tecnológico Agrario de Castilla y León (ITACyL), Valladolid, Spain
| | - Marta Hernández
- Laboratorio de Biología Molecular y Microbiología, Instituto Tecnológico Agrario de Castilla y León (ITACyL), Valladolid, Spain; Microbiology Division, Department of Biotechnology and Food Science, Faculty of Sciences, University of Burgos, Burgos, Spain
| | - David Rodríguez-Lázaro
- Microbiology Division, Department of Biotechnology and Food Science, Faculty of Sciences, University of Burgos, Burgos, Spain.
| |
|
29
|
Bhardwaj A, Bag SK. PLANET-SNP pipeline: PLants based ANnotation and Establishment of True SNP pipeline. Genomics 2019; 111:1066-1077. [PMID: 31533899 DOI: 10.1016/j.ygeno.2018.07.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/02/2017] [Revised: 06/10/2018] [Accepted: 07/02/2018] [Indexed: 12/30/2022]
Abstract
Accurate prediction of SNPs (single nucleotide polymorphisms) from high-throughput sequencing data is a challenging problem with the potential to reveal variation within plant species. To extract useful information from bulk data, machine learning (ML) can support the development of accurate models that learn from prior information. We performed state-of-the-art, in-depth learning on six different plant species. A comparative evaluation of five different algorithms showed that Random Forest substantially outperformed the others in selecting potential SNPs, with markedly improved prediction accuracy under 10-fold cross-validation, and it was integrated into a system known as PLANET-SNP. We present an accurate method to extract potential SNPs with user-specific customizable parameters. It will facilitate the identification of efficient and functional SNPs in an easy and intuitive way. The PLANET-SNP pipeline is very flexible in terms of data input and output formats and is available at http://www.ncgd.nbri.res.in/PLANET-SNP-Pipeline.aspx.
Affiliation(s)
- Archana Bhardwaj
- Academy of Scientific and Innovative Research (AcSIR), CSIR-NBRI Campus, Lucknow, India; Computational Biology Lab, Council of Scientific and Industrial Research - National Botanical Research Institute (CSIR-NBRI), Rana Pratap Marg, Lucknow, Uttar Pradesh 226001, India
| | - Sumit K Bag
- Academy of Scientific and Innovative Research (AcSIR), CSIR-NBRI Campus, Lucknow, India; Computational Biology Lab, Council of Scientific and Industrial Research - National Botanical Research Institute (CSIR-NBRI), Rana Pratap Marg, Lucknow, Uttar Pradesh 226001, India.
| |
|
30
|
Petronella N, Kundra P, Auclair O, Hébert K, Rao M, Kingsley K, De Bruyne K, Banerjee S, Gill A, Pagotto F, Tamber S, Ronholm J. Changes detected in the genome sequences of Escherichia coli, Listeria monocytogenes, Vibrio parahaemolyticus, and Salmonella enterica after serial subculturing. Can J Microbiol 2019; 65:842-850. [PMID: 31356758 DOI: 10.1139/cjm-2019-0235] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/26/2023]
Abstract
Whole genome sequencing (WGS) is rapidly replacing other molecular techniques for identifying and subtyping bacterial isolates. The resolution or discrimination offered by WGS is significantly higher than that offered by other molecular techniques, and WGS readily allows infrequent differences that occur between 2 closely related strains to be found. In this investigation, WGS was used to identify the changes that occurred in the genomes of 13 strains of bacterial foodborne pathogens after 100 serial subcultures. Pure cultures of Shiga-toxin-producing Escherichia coli, Salmonella enterica, Listeria monocytogenes, and Vibrio parahaemolyticus were subcultured daily for 100 successive days. The 1st and 100th subcultures were whole-genome sequenced using short-read sequencing. Single nucleotide polymorphisms (SNPs) were identified between the 1st and final culture using 2 different approaches, and multilocus sequence typing of the whole genome was also performed to detect any changes at the allelic level. The number of observed genomic changes varied by strain, species, and the SNP caller used. This study provides insight into the genomic variation that can be detected using next-generation sequencing and analysis methods after repeated subculturing of 4 important bacterial pathogens.
Affiliation(s)
- Nicholas Petronella
- Biostatistics and Modeling Division, Bureau of Food Surveillance and Science Integration, Food Directorate, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, Ontario, Canada
| | - Palni Kundra
- Department of Food Science and Agricultural Chemistry, Faculty of Agricultural and Environmental Sciences, Macdonald Campus, McGill University, 21111 Lakeshore Road, Sainte-Anne-de-Bellevue, Quebec, Canada
| | - Olivia Auclair
- Department of Food Science and Agricultural Chemistry, Faculty of Agricultural and Environmental Sciences, Macdonald Campus, McGill University, 21111 Lakeshore Road, Sainte-Anne-de-Bellevue, Quebec, Canada
| | - Karine Hébert
- Bureau of Microbial Hazards, Food Directorate, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, Ontario, Canada; Listeriosis Reference Service, Bureau of Microbial Hazards, Food Directorate, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, Ontario, Canada
| | - Mary Rao
- Bureau of Microbial Hazards, Food Directorate, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, Ontario, Canada
| | - Kyle Kingsley
- Applied Maths, Data Analytics Unit, bioMérieux, Austin, Texas, USA
| | | | - Swapan Banerjee
- Bureau of Microbial Hazards, Food Directorate, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, Ontario, Canada
| | - Alexander Gill
- Bureau of Microbial Hazards, Food Directorate, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, Ontario, Canada
| | - Franco Pagotto
- Bureau of Microbial Hazards, Food Directorate, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, Ontario, Canada; Listeriosis Reference Service, Bureau of Microbial Hazards, Food Directorate, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, Ontario, Canada
| | - Sandeep Tamber
- Bureau of Microbial Hazards, Food Directorate, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, Ontario, Canada
| | - Jennifer Ronholm
- Department of Food Science and Agricultural Chemistry, Faculty of Agricultural and Environmental Sciences, Macdonald Campus, McGill University, 21111 Lakeshore Road, Sainte-Anne-de-Bellevue, Quebec, Canada; Department of Animal Science, Faculty of Agricultural and Environmental Sciences, Macdonald Campus, McGill University, 21111 Lakeshore Road, Sainte-Anne-de-Bellevue, Quebec, Canada
| |
|
31
|
Tang S, Orsi RH, Luo H, Ge C, Zhang G, Baker RC, Stevenson A, Wiedmann M. Assessment and Comparison of Molecular Subtyping and Characterization Methods for Salmonella. Front Microbiol 2019; 10:1591. [PMID: 31354679 PMCID: PMC6639432 DOI: 10.3389/fmicb.2019.01591] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 03/20/2019] [Accepted: 06/26/2019] [Indexed: 01/26/2023] Open
Abstract
The food industry is facing a major transition regarding methods for confirmation, characterization, and subtyping of Salmonella. Whole-genome sequencing (WGS) is rapidly becoming both the method of choice and the gold standard for Salmonella subtyping; however, routine use of WGS by the food industry is often not feasible due to cost constraints or the need for rapid results. To facilitate selection of subtyping methods by the food industry, we present: (i) a comparison between classical serotyping and selected widely used molecular-based subtyping methods including pulsed-field gel electrophoresis, multilocus sequence typing, and WGS (including WGS-based serovar prediction) and (ii) a scoring system to evaluate and compare Salmonella subtyping assays. This literature-based assessment supports the superior discriminatory power of WGS for source tracking and root cause elimination in food safety incidents; however, circumstances in which use of other subtyping methods may be warranted were also identified. This review provides practical guidance for the food industry and presents a starting point for further comparative evaluation of Salmonella characterization and subtyping methods.
Affiliation(s)
- Silin Tang
- Mars Global Food Safety Center, Beijing, China
- Renato H. Orsi
- Department of Food Science, College of Agriculture and Life Sciences, Cornell University, Ithaca, NY, United States
- Hao Luo
- Mars Global Food Safety Center, Beijing, China
- Chongtao Ge
- Mars Global Food Safety Center, Beijing, China
- Martin Wiedmann
- Department of Food Science, College of Agriculture and Life Sciences, Cornell University, Ithaca, NY, United States
|
32
|
|
33
|
Jagadeesan B, Baert L, Wiedmann M, Orsi RH. Comparative Analysis of Tools and Approaches for Source Tracking Listeria monocytogenes in a Food Facility Using Whole-Genome Sequence Data. Front Microbiol 2019; 10:947. [PMID: 31143162 PMCID: PMC6521219 DOI: 10.3389/fmicb.2019.00947] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 01/03/2019] [Accepted: 04/15/2019] [Indexed: 12/04/2022] Open
Abstract
As WGS is increasingly used by the food industry to characterize pathogen isolates, users are challenged by the variety of analysis approaches available, ranging from methods that require extensive bioinformatics expertise to commercial software packages. This study aimed to assess the impact of analysis pipelines (i.e., different hqSNP pipelines, a cg/wgMLST pipeline) and the reference genome selection on analysis results (i.e., hqSNP and allelic differences as well as tree topologies) and the conclusions drawn. For these comparisons, whole genome sequences were obtained for 40 Listeria monocytogenes isolates collected over 18 years from a cold-smoked salmon facility and 2 other isolates obtained from different facilities as part of academic research activities; WGS data were analyzed with three hqSNP pipelines and two MLST pipelines. After initial clustering using a k-mer based approach, hqSNP pipelines were run using two types of reference genomes: (i) closely related closed genomes (“closed references”) and (ii) high-quality de novo assemblies of the dataset isolates (“draft references”). All hqSNP pipelines identified similar hqSNP difference ranges among isolates in a given cluster; use of different reference genomes showed minimal impacts on hqSNP differences identified between isolate pairs. Allelic differences obtained by wgMLST showed similar ranges as hqSNP differences among isolates in a given cluster; cgMLST consistently showed fewer differences than wgMLST. However, phylogenetic trees and dendrograms, obtained based on hqSNP and cg/wgMLST data, did show some incongruences, typically linked to clades supported by low bootstrap values in the trees. When a hqSNP cutoff was used to classify isolates as “related” or “unrelated,” use of different pipelines yielded a considerable number of discordances; this finding supports that cut-off values are valuable to provide a starting point for an investigation, but supporting and epidemiological evidence should be used to interpret WGS data. Overall, our data suggest that cgMLST-based data analyses provide for appropriate subtype differentiation and can be used without the need for preliminary data analyses (e.g., k-mer based clustering) or external closed reference genomes, simplifying data analysis needs. hqSNP or wgMLST analyses can be performed on the isolate clusters identified by cgMLST to increase the precision in determining the genomic similarity between isolates.
Affiliation(s)
- Balamurugan Jagadeesan
- Nestlé Institute of Food Safety and Analytical Sciences, Nestlé Research, Lausanne, Switzerland
- Leen Baert
- Nestlé Institute of Food Safety and Analytical Sciences, Nestlé Research, Lausanne, Switzerland
- Martin Wiedmann
- Department of Food Science, Cornell University, Ithaca, NY, United States
- Renato H Orsi
- Department of Food Science, Cornell University, Ithaca, NY, United States
|
34
|
Carroll LM, Wiedmann M, Mukherjee M, Nicholas DC, Mingle LA, Dumas NB, Cole JA, Kovac J. Characterization of Emetic and Diarrheal Bacillus cereus Strains From a 2016 Foodborne Outbreak Using Whole-Genome Sequencing: Addressing the Microbiological, Epidemiological, and Bioinformatic Challenges. Front Microbiol 2019; 10:144. [PMID: 30809204 PMCID: PMC6379260 DOI: 10.3389/fmicb.2019.00144] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 08/24/2018] [Accepted: 01/21/2019] [Indexed: 12/21/2022] Open
Abstract
The Bacillus cereus group comprises multiple species capable of causing emetic or diarrheal foodborne illness. Despite being responsible for tens of thousands of illnesses each year in the U.S. alone, whole-genome sequencing (WGS) is not yet routinely employed to characterize B. cereus group isolates from foodborne outbreaks. Here, we describe the first WGS-based characterization of isolates linked to an outbreak caused by members of the B. cereus group. In conjunction with a 2016 outbreak traced to a supplier of refried beans served by a fast food restaurant chain in upstate New York, a total of 33 B. cereus group isolates were obtained from human cases (n = 7) and food samples (n = 26). Emetic (n = 30) and diarrheal (n = 3) isolates were most closely related to B. paranthracis (group III) and B. cereus sensu stricto (group IV), respectively. WGS indicated that the 30 emetic isolates (24 and 6 from food and humans, respectively) were closely related and formed a well-supported clade distinct from publicly available emetic group III genomes with an identical sequence type (ST 26). The 30 emetic group III isolates from this outbreak differed from each other by a mean of 8.3 to 11.9 core single nucleotide polymorphisms (SNPs), while differing from publicly available emetic group III ST 26 B. cereus group genomes by a mean of 301.7-528.0 core SNPs, depending on the SNP calling methodology used. Using a WST-1 cell proliferation assay, the strains isolated from this outbreak had only mild detrimental effects on HeLa cell metabolic activity compared to reference diarrheal strain B. cereus ATCC 14579. We hypothesize that the outbreak was a single source outbreak caused by emetic group III B. cereus belonging to the B. paranthracis species, although food samples were not tested for presence of the emetic toxin cereulide. In addition to showcasing how WGS can be used to characterize B. cereus group strains linked to a foodborne outbreak, we also discuss potential microbiological and epidemiological challenges presented by B. cereus group outbreaks, and we offer recommendations for analyzing WGS data from the isolates associated with them.
Affiliation(s)
- Laura M. Carroll
- Department of Food Science, Cornell University, Ithaca, NY, United States
- Martin Wiedmann
- Department of Food Science, Cornell University, Ithaca, NY, United States
- Manjari Mukherjee
- Department of Food Science, The Pennsylvania State University, University Park, PA, United States
- David C. Nicholas
- New York State Department of Health, Corning Tower, Empire State Plaza, Albany, NY, United States
- Lisa A. Mingle
- New York State Department of Health, Wadsworth Center, Albany, NY, United States
- Nellie B. Dumas
- New York State Department of Health, Wadsworth Center, Albany, NY, United States
- Jocelyn A. Cole
- New York State Department of Health, Wadsworth Center, Albany, NY, United States
- Jasna Kovac
- Department of Food Science, The Pennsylvania State University, University Park, PA, United States
|
35
|
Wang YU, Pettengill JB, Pightling A, Timme R, Allard M, Strain E, Rand H. Genetic Diversity of Salmonella and Listeria Isolates from Food Facilities. J Food Prot 2018; 81:2082-2089. [PMID: 30485763 DOI: 10.4315/0362-028x.jfp-18-093] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 11/11/2022]
Abstract
Food production-related facilities (farms, packing houses, etc.) are monitored for foodborne pathogens, and data from these facilities can provide a rich source of information about the population structure and genetic diversity of Salmonella and Listeria. This information is of both academic interest for understanding the evolutionary forces acting on these organisms and of practical interest to those responsible for controlling pathogens in facilities and to those analyzing data from facilities in the context of public health decision making. We have collected information about all positive isolates from facility inspections performed by the U.S. Food and Drug Administration for which whole genome sequencing data are available. The within- and between-facilities observed genetic diversity of isolates was computed and related to the common origin of isolates (as the common collected facility). This relationship provides quantification for assessing the relationship between isolates based on their genetic similarity quantified by single-nucleotide polymorphisms (SNPs). Our results show that if the genetic distance (D) between two isolates is low, then more likely than not they are from the same facility or have some overlap in their supply chain. For example, if the genetic distance is no more than 20 SNPs, the probability (P) that two isolates come from the same facility = 0.66 for Salmonella and 0.70 for Listeria. However, if two isolates come from different facilities, their genetic distance is likely large (for Salmonella, P(D > 20 SNPs) = 0.99982; for Listeria, P(D > 20 SNPs) = 0.99949); even if two isolates come from the same facility, their genetic distance is also very likely large (for Salmonella, P(D > 20 SNPs) = 0.794; for Listeria, P(D > 20 SNPs) = 0.692). These results provide insight into what SNP thresholds might be appropriate when determining whether two isolates are from the same facility and thus would be of interest to those investigating foodborne outbreaks and conducting traceback investigations.
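As a rough illustration of the kind of pairwise comparison this abstract describes, the following sketch counts SNP differences between pre-aligned sequences and lists the isolate pairs at or below a threshold. It is not the authors' pipeline: the function names, the toy isolate labels, and the choice to skip gaps and ambiguous bases are assumptions made for the example, and the default of 20 SNPs is taken only from the figure quoted in the abstract.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap or ambiguous base
    (anything other than A, C, G, T) are skipped; this is an assumption
    of the sketch, not a rule taken from the cited study.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def pairs_within_threshold(alignment: dict, threshold: int = 20) -> list:
    """Return (isolate_a, isolate_b, distance) for pairs with distance <= threshold."""
    close = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(alignment.items()), 2):
        distance = snp_distance(seq_a, seq_b)
        if distance <= threshold:
            close.append((name_a, name_b, distance))
    return close


if __name__ == "__main__":
    # Hypothetical toy alignment; real inputs would be core-genome alignments.
    toy = {"isolate_1": "ACGTACGT", "isolate_2": "ACGTACGA", "isolate_3": "TTTTTTTT"}
    for name_a, name_b, distance in pairs_within_threshold(toy, threshold=2):
        print(name_a, name_b, distance)
```

A pair flagged this way is only a candidate for further review; as the abstract itself notes, a low distance makes a shared facility more likely than not but does not establish it.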
Affiliation(s)
- Y U Wang
- Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, Maryland 20740, USA
- James B Pettengill
- Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, Maryland 20740, USA
- Arthur Pightling
- Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, Maryland 20740, USA
- Ruth Timme
- Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, Maryland 20740, USA
- Marc Allard
- Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, Maryland 20740, USA
- Errol Strain
- Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, Maryland 20740, USA
- Hugh Rand
- Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, Maryland 20740, USA
|
36
|
Cook PW, Nightingale KK. Use of omics methods for the advancement of food quality and food safety. Anim Front 2018; 8:33-41. [PMID: 32002228 DOI: 10.1093/af/vfy024] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/28/2023] Open
Affiliation(s)
- Peter W Cook
- Center for Food Safety, University of Georgia, Griffin, GA; Influenza Division, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA
|
37
|
Core Genome Multilocus Sequence Typing and Single Nucleotide Polymorphism Analysis in the Epidemiology of Brucella melitensis Infections. J Clin Microbiol 2018; 56:JCM.00517-18. [PMID: 29925641 PMCID: PMC6113479 DOI: 10.1128/jcm.00517-18] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 04/06/2018] [Accepted: 06/09/2018] [Indexed: 02/07/2023] Open
Abstract
The use of whole-genome sequencing (WGS) using next-generation sequencing (NGS) technology has become a widely accepted method for microbiology laboratories in the application of molecular typing for outbreak tracing and genomic epidemiology. Several studies demonstrated the usefulness of WGS data analysis through single-nucleotide polymorphism (SNP) calling from a reference sequence analysis for Brucella melitensis, whereas gene-by-gene comparison through core-genome multilocus sequence typing (cgMLST) has not been explored so far. The current study developed an allele-based cgMLST method and compared its performance to that of the genome-wide SNP approach and the traditional multilocus variable-number tandem repeat analysis (MLVA) on a defined sample collection. The data set was comprised of 37 epidemiologically linked animal cases of brucellosis as well as 71 isolates with unknown epidemiological status, composed of human and animal samples collected in Italy. The cgMLST scheme generated in this study contained 2,704 targets of the B. melitensis 16M reference genome. We established the potential criteria necessary for inclusion of an isolate into a brucellosis outbreak cluster to be ≤6 loci in the cgMLST and ≤7 in WGS SNP analysis. Higher phylogenetic distance resolution was achieved with cgMLST and SNP analysis than with MLVA, particularly for strains belonging to the same lineage, thereby allowing diverse and unrelated genotypes to be identified with greater confidence. The application of a cgMLST scheme to the characterization of B. melitensis strains provided insights into the epidemiology of this pathogen, and it is a candidate to be a benchmark tool for outbreak investigations in human and animal brucellosis.
|
38
|
Deblais L, Lorentz B, Scaria J, Nagaraja KV, Nisar M, Lauer D, Voss S, Rajashekara G. Comparative Genomic Studies of Salmonella Heidelberg Isolated From Chicken- and Turkey-Associated Farm Environmental Samples. Front Microbiol 2018; 9:1841. [PMID: 30147682 PMCID: PMC6097345 DOI: 10.3389/fmicb.2018.01841] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 04/20/2018] [Accepted: 07/23/2018] [Indexed: 11/13/2022] Open
Abstract
Salmonella is one of the leading causes of human foodborne gastroenteritis in the United States. In addition, Salmonella contributes to morbidity and mortality in livestock. The control of Salmonella is an increasingly problematic issue in livestock production due to lack of effective control methods and the constant adaptation of Salmonella to new management practices, which is often related to horizontal acquisition of virulence or antibiotic resistance genes. Salmonella enterica serotype Heidelberg is one of the most commonly isolated serotypes in all poultry production systems in North America. Emergence and persistence of multi-drug resistant Salmonella Heidelberg isolates further impact poultry production and public health. We hypothesized that distinct poultry production environments affect Salmonella genomic content, and by consequence its survival and virulence abilities. This study compared the genomic composition of S. Heidelberg isolated from environmental samples (19 chicken and 12 turkey isolates) of different breeder farms (16 chicken and 8 turkey farms) in the Midwest, United States. Whole genome comparison of 31 genomes using RAST and SEED identified differences in specific sub-systems in isolates between the chicken- and turkey-associated farm environmental samples. Genes associated with the type IV secretion system (n = 12) and conjugative transfer (n = 3) were absent in turkey farm isolates compared to the chicken ones (p-value < 0.01); further, turkey farm isolates were enriched in prophage proteins (n = 53; p-value < 0.01). Complementary studies using PHASTER showed that prophages were all Caudovirales phages and were more represented in turkey environmental isolates than the chicken isolates. This study corroborates that isolates from distinct farm environments show differences in S. Heidelberg genome content related to horizontal transfer between bacteria or through viral infections. Complementary microbiome studies of these samples would provide critical insights on sources of these variations. Overall, our findings enhance the understanding of Salmonella genome plasticity and may aid in the development of future effective management practices to control Salmonella.
Affiliation(s)
- Loïc Deblais
- Food Animal Health Research Program, Department of Veterinary Preventive Medicine, The Ohio State University, OARDC, Wooster, OH, United States; Department of Plant Pathology, The Ohio State University, OARDC, Wooster, OH, United States
- Benjamin Lorentz
- Food Animal Health Research Program, Department of Veterinary Preventive Medicine, The Ohio State University, OARDC, Wooster, OH, United States
- Joy Scaria
- Department of Veterinary and Biomedical Sciences, South Dakota State University, Brookings, SD, United States
- Kakambi V Nagaraja
- Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, Saint Paul, MN, United States
- Muhammad Nisar
- Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, Saint Paul, MN, United States
- Dale Lauer
- Minnesota Poultry Testing Laboratory, University of Minnesota Veterinary Diagnostic Laboratory, Minnesota Board of Animal Health, Willmar, MN, United States
- Shauna Voss
- Minnesota Poultry Testing Laboratory, University of Minnesota Veterinary Diagnostic Laboratory, Minnesota Board of Animal Health, Willmar, MN, United States
- Gireesh Rajashekara
- Food Animal Health Research Program, Department of Veterinary Preventive Medicine, The Ohio State University, OARDC, Wooster, OH, United States
|
39
|
Pightling AW, Pettengill JB, Luo Y, Baugher JD, Rand H, Strain E. Interpreting Whole-Genome Sequence Analyses of Foodborne Bacteria for Regulatory Applications and Outbreak Investigations. Front Microbiol 2018; 9:1482. [PMID: 30042741 PMCID: PMC6048267 DOI: 10.3389/fmicb.2018.01482] [Citation(s) in RCA: 176] [Impact Index Per Article: 25.1] [Received: 02/27/2018] [Accepted: 06/13/2018] [Indexed: 12/05/2022] Open
Abstract
Whole-genome sequence (WGS) analysis has revolutionized the food safety industry by enabling high-resolution typing of foodborne bacteria. Higher resolving power allows investigators to identify origins of contamination during illness outbreaks and regulatory activities quickly and accurately. Government agencies and industry stakeholders worldwide are now analyzing WGS data routinely. Although researchers have published many studies that assess the efficacy of WGS data analysis for source attribution, guidance for interpreting WGS analyses is lacking. Here, we provide the framework for interpreting WGS analyses used by the Food and Drug Administration's Center for Food Safety and Applied Nutrition (CFSAN). We based this framework on the experiences of CFSAN investigators, collaborations and interactions with government and industry partners, and evaluation of the published literature. A fundamental question for investigators is whether two or more bacteria arose from the same source of contamination. Analysts often count the numbers of nucleotide differences [single-nucleotide polymorphisms (SNPs)] between two or more genome sequences to measure genetic distances. However, using SNP thresholds alone to assess whether bacteria originated from the same source can be misleading. Bacteria that are isolated from food, environmental, or clinical samples are representatives of bacterial populations. These populations are subject to evolutionary forces that can change genome sequences. Therefore, interpreting WGS analyses of foodborne bacteria requires a more sophisticated approach. Here, we present a framework for interpreting WGS analyses that combines SNP counts with phylogenetic tree topologies and bootstrap support. We also clarify the roles of WGS, epidemiological, traceback, and other evidence in forming the conclusions of investigations. Finally, we present examples that illustrate the application of this framework to real-world situations.
Affiliation(s)
- Arthur W. Pightling
- Biostatistics and Bioinformatics, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, MD, United States
|
40
|
Pearce ME, Alikhan NF, Dallman TJ, Zhou Z, Grant K, Maiden MCJ. Comparative analysis of core genome MLST and SNP typing within a European Salmonella serovar Enteritidis outbreak. Int J Food Microbiol 2018; 274:1-11. [PMID: 29574242 PMCID: PMC5899760 DOI: 10.1016/j.ijfoodmicro.2018.02.023] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Received: 10/27/2017] [Revised: 02/23/2018] [Accepted: 02/27/2018] [Indexed: 01/10/2023]
Abstract
Multi-country outbreaks of foodborne bacterial disease present challenges in their detection, tracking, and notification. As food is increasingly distributed across borders, such outbreaks are becoming more common. This increases the need for high-resolution, accessible, and replicable isolate typing schemes. Here we evaluate a core genome multilocus typing (cgMLST) scheme for the high-resolution reproducible typing of Salmonella enterica (S. enterica) isolates, by its application to a large European outbreak of S. enterica serovar Enteritidis. This outbreak had been extensively characterised using single nucleotide polymorphism (SNP)-based approaches. The cgMLST analysis was congruent with the original SNP-based analysis, the epidemiological data, and whole genome MLST (wgMLST) analysis. Combination of the cgMLST and epidemiological data confirmed that the genetic diversity among the isolates predated the outbreak, and was likely present at the infection source. There was consequently no link between country of isolation and genetic diversity, but the cgMLST clusters were congruent with date of isolation. Furthermore, comparison with publicly available Enteritidis isolate data demonstrated that the cgMLST scheme presented is highly scalable, enabling outbreaks to be contextualised within the Salmonella genus. The cgMLST scheme is therefore shown to be a standardised and scalable typing method, which allows Salmonella outbreaks to be analysed and compared across laboratories and jurisdictions. cgMLST is proposed as a universal typing scheme for Salmonella. cgMLST is congruent with SNP analyses and easier to implement across laboratories. Genomic data are consistent with the epidemiology of the outbreak.
Affiliation(s)
- Madison E Pearce
- Department of Zoology, University of Oxford, Peter Medawar Building for Pathogen Research, South Parks Road, Oxford OX1 3SY, United Kingdom; National Institute for Health Research, Health Protection Research Unit, Gastrointestinal Infections, University of Oxford, United Kingdom.
- Nabil-Fareed Alikhan
- Warwick Medical School, University of Warwick, Coventry CV4 7AL, United Kingdom.
- Timothy J Dallman
- Public Health England, Gastrointestinal Bacteria Reference Unit, 61 Colindale Avenue, London NW9 5EQ, United Kingdom.
- Zhemin Zhou
- Warwick Medical School, University of Warwick, Coventry CV4 7AL, United Kingdom.
- Kathie Grant
- Public Health England, Gastrointestinal Bacteria Reference Unit, 61 Colindale Avenue, London NW9 5EQ, United Kingdom.
- Martin C J Maiden
- Department of Zoology, University of Oxford, Peter Medawar Building for Pathogen Research, South Parks Road, Oxford OX1 3SY, United Kingdom; National Institute for Health Research, Health Protection Research Unit, Gastrointestinal Infections, University of Oxford, United Kingdom.
|
41
|
Abdelbary MMH, Senn L, Moulin E, Prod'hom G, Croxatto A, Greub G, Blanc DS. Evaluating the use of whole-genome sequencing for outbreak investigations in the lack of closely related reference genome. Infect Genet Evol 2018; 59:1-6. [PMID: 29367013 DOI: 10.1016/j.meegid.2018.01.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/30/2017] [Revised: 10/10/2017] [Accepted: 01/18/2018] [Indexed: 12/01/2022]
Abstract
Whole-genome sequencing (WGS) has emerged as a powerful molecular typing method for outbreak analysis enabling the rapid discrimination between outbreak and non-outbreak isolates. However, such analysis can be challenging in the absence of closely related reference genomes. In this study, we assessed the use of WGS in investigating an outbreak of a relatively understudied bacterial pathogen with no publicly available closely related reference genome. Eleven Burkholderia cepacia complex (Bcc) isolates (seven from patients and four from disposable dermal glove packages) that were collected during an outbreak were sequenced using the Illumina MiSeq platform. Our results showed that mapping the 11 sequenced Bcc outbreak isolates against a genetically distant reference genome yielded low coverage (31.6-48.3%) and a high number of falsely detected single-nucleotide polymorphisms (SNPs) (1123-2139). Therefore, a reference genome consensus from an outbreak clinical isolate was generated by combining both de novo assembly and mapping approaches. Based on this approach, we were able to demonstrate that the Bcc outbreak isolates were closely related and were phylogenetically distinct from the 11 publicly available Bcc genomes. In addition, the pairwise SNP distance analysis detected only 1 to 6 SNP differences among the outbreak isolates, confirming that contaminated disposable dermal gloves were the cause of the outbreak.
Affiliation(s)
- Mohamed M H Abdelbary
- Service of Hospital Preventive Medicine, Lausanne University Hospital, Lausanne, Switzerland.
- Laurence Senn
- Service of Hospital Preventive Medicine, Lausanne University Hospital, Lausanne, Switzerland
- Estelle Moulin
- Service of Hospital Preventive Medicine, Lausanne University Hospital, Lausanne, Switzerland
- Guy Prod'hom
- Institute of Microbiology, Lausanne University Hospital, Lausanne, Switzerland
- Antony Croxatto
- Institute of Microbiology, Lausanne University Hospital, Lausanne, Switzerland
- Gilbert Greub
- Institute of Microbiology, Lausanne University Hospital, Lausanne, Switzerland
- Dominique S Blanc
- Service of Hospital Preventive Medicine, Lausanne University Hospital, Lausanne, Switzerland; Institute of Microbiology, Lausanne University Hospital, Lausanne, Switzerland
|
42
|
Comparison of advanced whole genome sequence-based methods to distinguish strains of Salmonella enterica serovar Heidelberg involved in foodborne outbreaks in Québec. Food Microbiol 2018. [PMID: 29526232 DOI: 10.1016/j.fm.2018.01.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 11/22/2022]
Abstract
Salmonella enterica serovar Heidelberg (S. Heidelberg) is one of the top serovars causing human salmonellosis. This serovar ranks second and third among serovars that cause human infections in Québec and Canada, respectively, and has been associated with severe infections. Traditional typing methods such as PFGE do not display adequate discrimination required to resolve outbreak investigations due to the low level of genetic diversity of isolates belonging to this serovar. This study evaluates the ability of four whole genome sequence (WGS)-based typing methods to differentiate among 145 S. Heidelberg strains involved in four distinct outbreak events and sporadic cases of salmonellosis that occurred in Québec between 2007 and 2016. Isolates from all outbreaks were indistinguishable by PFGE. The core genome single nucleotide variant (SNV), core genome multilocus sequence typing (MLST) and whole genome MLST approaches were highly discriminatory and separated outbreak strains into four distinct phylogenetic clusters that were concordant with the epidemiological data. The clustered regularly interspaced short palindromic repeats (CRISPR) typing method was less discriminatory. However, CRISPR typing may be used as a secondary method to differentiate isolates of S. Heidelberg that are genetically similar but epidemiologically unrelated to outbreak events. WGS-based typing methods provide a highly discriminatory alternative to PFGE for the laboratory investigation of foodborne outbreaks.
|
43
|
Henri C, Leekitcharoenphon P, Carleton HA, Radomski N, Kaas RS, Mariet JF, Felten A, Aarestrup FM, Gerner Smidt P, Roussel S, Guillier L, Mistou MY, Hendriksen RS. An Assessment of Different Genomic Approaches for Inferring Phylogeny of Listeria monocytogenes. Front Microbiol 2017; 8:2351. [PMID: 29238330 PMCID: PMC5712588 DOI: 10.3389/fmicb.2017.02351] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 09/07/2017] [Accepted: 11/15/2017] [Indexed: 11/13/2022] Open
Abstract
Background/objectives: Whole genome sequencing (WGS) has proven to be a powerful subtyping tool for foodborne pathogenic bacteria like L. monocytogenes. The value of genome-scale analysis for national surveillance, outbreak detection or source tracking has been widely documented. The genomic data however can be exploited with many different bioinformatics methods like single nucleotide polymorphism (SNP), core-genome multi locus sequence typing (cgMLST), whole-genome multi locus sequence typing (wgMLST) or multi locus predicted protein sequence typing (MLPPST) on either core-genome (cgMLPPST) or pan-genome (wgMLPPST). Currently, there are few comparative studies of these different analytical approaches. Our objective was to assess and compare different genomic methods that can be implemented in order to cluster isolates of L. monocytogenes. Methods: The clustering methods were evaluated on a collection of 207 L. monocytogenes genomes of food origin representative of the genetic diversity of the Anses collection. The trees were then compared using robust statistical analyses. Results: The backward comparability between conventional typing methods and genomic methods revealed a near-perfect concordance. The importance of selecting a proper reference when calling SNPs was highlighted, although distances between strains remained identical. The analysis also revealed that the topologies of the wgMLST and cgMLST phylogenetic trees were remarkably similar. The comparison between SNP and cgMLST or SNP and wgMLST approaches showed that the topologies of phylogenetic trees were statistically similar with an almost equivalent clustering. Conclusion: Our study revealed high concordance between wgMLST, cgMLST, and SNP approaches which are all suitable for typing of L. monocytogenes. The comparable clustering is an important observation considering that the two approaches have been variously implemented among reference laboratories.
Affiliation(s)
- Clémentine Henri
- Agence Nationale de Sécurité Sanitaire de l'Alimentation, Maisons-Alfort Laboratory for Food Safety, University Paris-Est, Maisons-Alfort, France
- Pimlapas Leekitcharoenphon
- European Union Reference Laboratory for Antimicrobial Resistance, National Food Institute, WHO Collaborating Center for Antimicrobial Resistance in Food Borne Pathogens and Genomics, Technical University of Denmark, Kongens Lyngby, Denmark
- Heather A Carleton
- National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Nicolas Radomski
- Agence Nationale de Sécurité Sanitaire de l'Alimentation, Maisons-Alfort Laboratory for Food Safety, University Paris-Est, Maisons-Alfort, France
- Rolf S Kaas
- European Union Reference Laboratory for Antimicrobial Resistance, National Food Institute, WHO Collaborating Center for Antimicrobial Resistance in Food Borne Pathogens and Genomics, Technical University of Denmark, Kongens Lyngby, Denmark
- Jean-François Mariet
- Agence Nationale de Sécurité Sanitaire de l'Alimentation, Maisons-Alfort Laboratory for Food Safety, University Paris-Est, Maisons-Alfort, France
- Arnaud Felten
- Agence Nationale de Sécurité Sanitaire de l'Alimentation, Maisons-Alfort Laboratory for Food Safety, University Paris-Est, Maisons-Alfort, France
- Frank M Aarestrup
- European Union Reference Laboratory for Antimicrobial Resistance, National Food Institute, WHO Collaborating Center for Antimicrobial Resistance in Food Borne Pathogens and Genomics, Technical University of Denmark, Kongens Lyngby, Denmark
- Peter Gerner Smidt
- National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Sophie Roussel
- Agence Nationale de Sécurité Sanitaire de l'Alimentation, Maisons-Alfort Laboratory for Food Safety, University Paris-Est, Maisons-Alfort, France
- Laurent Guillier
- Agence Nationale de Sécurité Sanitaire de l'Alimentation, Maisons-Alfort Laboratory for Food Safety, University Paris-Est, Maisons-Alfort, France
- Michel-Yves Mistou
- Agence Nationale de Sécurité Sanitaire de l'Alimentation, Maisons-Alfort Laboratory for Food Safety, University Paris-Est, Maisons-Alfort, France
- René S Hendriksen
- European Union Reference Laboratory for Antimicrobial Resistance, National Food Institute, WHO Collaborating Center for Antimicrobial Resistance in Food Borne Pathogens and Genomics, Technical University of Denmark, Kongens Lyngby, Denmark
|
44
|
Datta AR, Burall LS. Serotype to genotype: The changing landscape of listeriosis outbreak investigations. Food Microbiol 2017; 75:18-27. [PMID: 30056958 DOI: 10.1016/j.fm.2017.06.013] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 03/31/2017] [Revised: 06/08/2017] [Accepted: 06/15/2017] [Indexed: 02/07/2023]
Abstract
The classical definition of a disease outbreak is the occurrence of cases of disease in excess of what would normally be expected in a community, geographical area or time period. The establishment of an outbreak then starts with the identification of an incidence of cases above the normally expected threshold during a given time period. Subsequently, the cases are examined using a variety of subtyping methods to identify potential linkages. As listeriosis has a long incubation period, relating a single source or multiple sources of contaminated food to clinical disease is challenging and time consuming. The vast majority of human listeriosis cases are caused by three serotypes, 1/2a, 1/2b, and 4b. Thus serotyping of isolates from suspected foods and clinical samples, although useful for eliminating some food sources, has very limited discriminatory power. The advent of faster and more affordable sequencing technology, coupled with increased computational power, has permitted comparisons of whole Listeria genome sequences from isolates recovered from clinical, food, and environmental sources. These analyses made it possible to identify outbreaks and their sources much more accurately and quickly, thus leading to a reduction in the number of illnesses as well as a reduction in economic losses. Initial DNA sequence information also facilitated the development of a simple molecular serotyping protocol which allowed for the identification of the major disease-causing serotypes of L. monocytogenes, including a clade of 4b variant (4bV) strains of L. monocytogenes involved in at least 3 more recent listeriosis outbreaks in the US. Furthermore, data generated using whole genome sequence (WGS) analyses were successfully utilized to develop a pan-genomic DNA microarray as well as a single nucleotide polymorphism (SNP) based analysis.
Herein, we present and compare the two recently developed subtyping technologies and discuss how these methods are not only important in outbreak investigations, but could also shed light on possible adaptations to different foods and environments.
Affiliation(s)
- Atin R Datta
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, Laurel, MD, 20708, USA.
- Laurel S Burall
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, Laurel, MD, 20708, USA
|
45
|
Burall LS, Grim CJ, Datta AR. A clade of Listeria monocytogenes serotype 4b variant strains linked to recent listeriosis outbreaks associated with produce from a defined geographic region in the US. PLoS One 2017; 12:e0176912. [PMID: 28464038 PMCID: PMC5413027 DOI: 10.1371/journal.pone.0176912] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 01/30/2017] [Accepted: 04/19/2017] [Indexed: 11/19/2022]
Abstract
Four listeriosis incidents/outbreaks, spanning 19 months, have been linked to Listeria monocytogenes serotype 4b variant (4bV) strains. Three of these incidents can be linked to a defined geographical region, while the fourth is likely to be linked. In this study, whole genome sequencing (WGS) of strains from these incidents was used for genomic comparisons using two approaches. The first was JSpecies tetramer, which analyzed tetranucleotide frequency to assess relatedness. The second, the CFSAN SNP Pipeline, was used to perform WGS SNP analyses against three different reference genomes to evaluate relatedness by SNP distances. In each case, unrelated strains were included as controls. The analyses showed that strains from these incidents form a highly related clade, with SNP differences of ≤101 within the clade and >9000 against other strains. Multi-Virulence-Locus Sequence Typing, a third standardized approach for evaluating relatedness, was used to assess the genetic drift in six conserved, known virulence loci and showed a different clustering pattern, indicating possible differences in the selection pressure experienced by these genes. These data suggest a high degree of relatedness among these 4bV strains linked to a defined geographic region and also highlight the possibility of alterations related to adaptation and virulence.
Affiliation(s)
- Laurel S. Burall
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, Laurel, Maryland, United States of America
- Christopher J. Grim
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, Laurel, Maryland, United States of America
- Atin R. Datta
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, Laurel, Maryland, United States of America
|
46
|
TreeToReads - a pipeline for simulating raw reads from phylogenies. BMC Bioinformatics 2017; 18:178. [PMID: 28320310 PMCID: PMC5359950 DOI: 10.1186/s12859-017-1592-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 06/11/2016] [Accepted: 03/10/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Using phylogenomic analysis tools for tracking pathogens has become standard practice in academia, public health agencies, and large industries. Using the same raw read genomic data as input, several different approaches are being used to infer phylogenetic trees. These include many different SNP pipelines, wgMLST approaches, k-mer algorithms, whole genome alignment and others; each of these has advantages and disadvantages: some have been extensively validated, some are faster, some have higher resolution. A few of these analysis approaches are well-integrated into the regulatory process of US Federal agencies (e.g. the FDA's SNP pipeline for tracking foodborne pathogens). However, despite extensive validation on benchmark datasets and comparison with other pipelines, we lack methods for fully exploring the effects of multiple parameter values in each pipeline that can potentially have an effect on whether the correct phylogenetic tree is recovered. RESULTS: To resolve this problem, we offer a program, TreeToReads, which can generate raw read data from mutated genomes simulated under a known phylogeny. This simulation pipeline allows direct comparisons of simulated and observed data in a controlled environment. At each step of these simulations, researchers can vary parameters of interest (e.g., input tree topology, amount of sequence divergence, rate of indels, read coverage, distance of reference genome, etc.) to assess the effects of various parameter values on correctly calling SNPs and reconstructing an accurate tree. CONCLUSIONS: Such critical assessments of the accuracy and robustness of analytical pipelines are essential to progress in both research and applied settings.
|
47
|
Nasheri N, Petronella N, Ronholm J, Bidawid S, Corneau N. Characterization of the Genomic Diversity of Norovirus in Linked Patients Using a Metagenomic Deep Sequencing Approach. Front Microbiol 2017; 8:73. [PMID: 28197136 PMCID: PMC5282449 DOI: 10.3389/fmicb.2017.00073] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 10/11/2016] [Accepted: 01/11/2017] [Indexed: 01/14/2023]
Abstract
Norovirus (NoV) is the leading cause of gastroenteritis worldwide. A robust cell culture system does not exist for NoV, and therefore detailed characterization of outbreak and sporadic strains relies on molecular techniques. In this study, we employed a metagenomic approach that uses non-specific amplification followed by next-generation sequencing to sequence whole NoV genomes directly from clinical samples obtained from 8 linked patients. Sufficient sequencing depth was obtained for each sample to allow de novo assembly of near-complete genome sequences. The resultant consensus sequences were then used to identify inter-host nucleotide variations that occur after direct transmission, analyze amino acid variations in the major capsid protein, and provide evidence of recombination events. The analysis of intra-host quasispecies diversity was possible due to the high coverage depth. We also observed a linear relationship between NoV viral load in the clinical sample and the number of sequence reads that could be attributed to NoV. The method demonstrated here has the potential for future use in whole genome sequence analyses of other RNA viruses isolated from clinical, environmental, and food specimens.
Affiliation(s)
- Neda Nasheri
- National Food Virology Reference Centre, Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada
- Nicholas Petronella
- Biostatistics and Modeling Division, Bureau of Food Surveillance and Science Integration, Food Directorate, Health Canada, Ottawa, ON, Canada
- Jennifer Ronholm
- Department of Food Science and Agricultural Chemistry, Faculty of Agricultural and Environmental Sciences, Macdonald Campus, McGill University, Montreal, QC, Canada; Department of Animal Science, Faculty of Agricultural and Environmental Sciences, Macdonald Campus, McGill University, Montreal, QC, Canada
- Sabah Bidawid
- National Food Virology Reference Centre, Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada
- Nathalie Corneau
- National Food Virology Reference Centre, Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada
|
48
|
Chan CH, Octavia S, Sintchenko V, Lan R. SnpFilt: A pipeline for reference-free assembly-based identification of SNPs in bacterial genomes. Comput Biol Chem 2016; 65:178-184. [DOI: 10.1016/j.compbiolchem.2016.09.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 08/31/2016] [Accepted: 09/07/2016] [Indexed: 10/21/2022]
|
49
|
Moura A, Criscuolo A, Pouseele H, Maury MM, Leclercq A, Tarr C, Björkman JT, Dallman T, Reimer A, Enouf V, Larsonneur E, Carleton H, Bracq-Dieye H, Katz LS, Jones L, Touchon M, Tourdjman M, Walker M, Stroika S, Cantinelli T, Chenal-Francisque V, Kucerova Z, Rocha EPC, Nadon C, Grant K, Nielsen EM, Pot B, Gerner-Smidt P, Lecuit M, Brisse S. Whole genome-based population biology and epidemiological surveillance of Listeria monocytogenes. Nat Microbiol 2016; 2:16185. [PMID: 27723724 DOI: 10.1038/nmicrobiol.2016.185] [Citation(s) in RCA: 455] [Impact Index Per Article: 50.6] [Received: 03/19/2016] [Accepted: 08/30/2016] [Indexed: 01/31/2023]
Abstract
Listeria monocytogenes (Lm) is a major human foodborne pathogen. Numerous Lm outbreaks have been reported worldwide and associated with a high case fatality rate, reinforcing the need for strongly coordinated surveillance and outbreak control. We developed a universally applicable genome-wide strain genotyping approach and investigated the population diversity of Lm using 1,696 isolates from diverse sources and geographical locations. We define, with unprecedented precision, the population structure of Lm, demonstrate the occurrence of international circulation of strains and reveal the extent of heterogeneity in virulence and stress resistance genomic features among clinical and food isolates. Using historical isolates, we show that the evolutionary rate of Lm from lineage I and lineage II is low (∼2.5 × 10⁻⁷ substitutions per site per year, as inferred from the core genome) and that major sublineages (corresponding to so-called 'epidemic clones') are estimated to be at least 50-150 years old. This work demonstrates the urgent need to monitor Lm strains at the global level and provides the unified approach needed for global harmonization of Lm genome-based typing and population biology.
Affiliation(s)
- Alexandra Moura
- National Reference Centre and World Health Organization Collaborating Center for Listeria, Institut Pasteur, 75724 Paris, France; Biology of Infection Unit, Institut Pasteur, 75724 Paris, France; Inserm U1117, 75015 Paris, France; Microbial Evolutionary Genomics Unit, Institut Pasteur, 75724 Paris, France; CNRS, UMR 3525, 75015 Paris, France
- Alexis Criscuolo
- Institut Pasteur-Hub Bioinformatique et Biostatistique-C3BI, USR 3756 IP CNRS, 75724 Paris, France
- Mylène M Maury
- National Reference Centre and World Health Organization Collaborating Center for Listeria, Institut Pasteur, 75724 Paris, France; Biology of Infection Unit, Institut Pasteur, 75724 Paris, France; Inserm U1117, 75015 Paris, France; Microbial Evolutionary Genomics Unit, Institut Pasteur, 75724 Paris, France; CNRS, UMR 3525, 75015 Paris, France; Sorbonne Paris Cité, Cellule Pasteur, Paris Diderot University, 75013 Paris, France
- Alexandre Leclercq
- National Reference Centre and World Health Organization Collaborating Center for Listeria, Institut Pasteur, 75724 Paris, France; Biology of Infection Unit, Institut Pasteur, 75724 Paris, France
- Cheryl Tarr
- Centers for Disease Control and Prevention, Atlanta, Georgia 30333, USA
- Aleisha Reimer
- Public Health Agency of Canada, Winnipeg, Manitoba R3E 3R2, Canada
- Vincent Enouf
- Pasteur International Bioresources network (PIBnet), Mutualized Microbiology Platform (P2M), Institut Pasteur, 75724 Paris, France
- Elise Larsonneur
- Microbial Evolutionary Genomics Unit, Institut Pasteur, 75724 Paris, France; Institut Pasteur-Hub Bioinformatique et Biostatistique-C3BI, USR 3756 IP CNRS, 75724 Paris, France; CNRS, UMS 3601 IFB-Core, 91198 Gif-sur-Yvette, France
- Heather Carleton
- Centers for Disease Control and Prevention, Atlanta, Georgia 30333, USA
- Hélène Bracq-Dieye
- National Reference Centre and World Health Organization Collaborating Center for Listeria, Institut Pasteur, 75724 Paris, France; Biology of Infection Unit, Institut Pasteur, 75724 Paris, France
- Lee S Katz
- Centers for Disease Control and Prevention, Atlanta, Georgia 30333, USA
- Louis Jones
- Institut Pasteur-Hub Bioinformatique et Biostatistique-C3BI, USR 3756 IP CNRS, 75724 Paris, France
- Marie Touchon
- Microbial Evolutionary Genomics Unit, Institut Pasteur, 75724 Paris, France; CNRS, UMR 3525, 75015 Paris, France
- Matthew Walker
- Public Health Agency of Canada, Winnipeg, Manitoba R3E 3R2, Canada
- Steven Stroika
- Centers for Disease Control and Prevention, Atlanta, Georgia 30333, USA
- Thomas Cantinelli
- National Reference Centre and World Health Organization Collaborating Center for Listeria, Institut Pasteur, 75724 Paris, France
- Viviane Chenal-Francisque
- National Reference Centre and World Health Organization Collaborating Center for Listeria, Institut Pasteur, 75724 Paris, France
- Zuzana Kucerova
- Centers for Disease Control and Prevention, Atlanta, Georgia 30333, USA
- Eduardo P C Rocha
- Microbial Evolutionary Genomics Unit, Institut Pasteur, 75724 Paris, France; CNRS, UMR 3525, 75015 Paris, France
- Celine Nadon
- Public Health Agency of Canada, Winnipeg, Manitoba R3E 3R2, Canada
- Bruno Pot
- Applied-Maths, 9830 Sint-Martens-Latem, Belgium
- Marc Lecuit
- National Reference Centre and World Health Organization Collaborating Center for Listeria, Institut Pasteur, 75724 Paris, France; Biology of Infection Unit, Institut Pasteur, 75724 Paris, France; Inserm U1117, 75015 Paris, France; Sorbonne Paris Cité, Institut Imagine, 75006 Paris, Necker-Enfants Malades University Hospital, Division of Infectious Diseases and Tropical Medicine, APHP, Paris Descartes University, 75015 Paris, France
- Sylvain Brisse
- Microbial Evolutionary Genomics Unit, Institut Pasteur, 75724 Paris, France; CNRS, UMR 3525, 75015 Paris, France
|
50
|
Ronholm J, Nasheri N, Petronella N, Pagotto F. Navigating Microbiological Food Safety in the Era of Whole-Genome Sequencing. Clin Microbiol Rev 2016; 29:837-57. [PMID: 27559074 PMCID: PMC5010751 DOI: 10.1128/cmr.00056-16] [Citation(s) in RCA: 96] [Impact Index Per Article: 10.7] [Indexed: 12/22/2022]
Abstract
The epidemiological investigation of a foodborne outbreak, including identification of related cases, source attribution, and development of intervention strategies, relies heavily on the ability to subtype the etiological agent at a high enough resolution to differentiate related from nonrelated cases. Historically, several different molecular subtyping methods have been used for this purpose; however, emerging techniques, such as single nucleotide polymorphism (SNP)-based techniques, that use whole-genome sequencing (WGS) offer a resolution that was previously not possible. With WGS, unlike traditional subtyping methods that lack complete information, data can be used to elucidate phylogenetic relationships and disease-causing lineages can be tracked and monitored over time. The subtyping resolution and evolutionary context provided by WGS data allow investigators to connect related illnesses that would be missed by traditional techniques. The added advantage of data generated by WGS is that these data can also be used for secondary analyses, such as virulence gene detection, antibiotic resistance gene profiling, synteny comparisons, mobile genetic element identification, and geographic attribution. In addition, several software packages are now available to generate in silico results for traditional molecular subtyping methods from the whole-genome sequence, allowing for efficient comparison with historical databases. Metagenomic approaches using next-generation sequencing have also been successful in the detection of nonculturable foodborne pathogens. This review addresses state-of-the-art techniques in microbial WGS and analysis and then discusses how this technology can be used to help support food safety investigations. Retrospective outbreak investigations using WGS are presented to provide organism-specific examples of the benefits, and challenges, associated with WGS in comparison to traditional molecular subtyping techniques.
Affiliation(s)
- J Ronholm
- Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada
- Neda Nasheri
- Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada
- Nicholas Petronella
- Biostatistics and Modelling Division, Bureau of Food Surveillance and Science Integration, Food Directorate, Health Canada, Ottawa, ON, Canada
- Franco Pagotto
- Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada; Listeriosis Reference Centre, Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada
|