1
Castelli P, De Ruvo A, Bucciacchio A, D'Alterio N, Cammà C, Di Pasquale A, Radomski N. Harmonization of supervised machine learning practices for efficient source attribution of Listeria monocytogenes based on genomic data. BMC Genomics 2023; 24:560. [PMID: 37736708 PMCID: PMC10515079 DOI: 10.1186/s12864-023-09667-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 09/10/2023] [Indexed: 09/23/2023] Open
Abstract
BACKGROUND Genomic data-based machine learning tools are promising for real-time surveillance activities that perform source attribution of foodborne bacteria such as Listeria monocytogenes. Given the heterogeneity of machine learning practices, our aim was to identify those influencing the source prediction performance of the usual holdout method combined with the repeated k-fold cross-validation method. METHODS A large collection of 1,100 L. monocytogenes genomes with known sources was built according to several genomic metrics to ensure authenticity and completeness of genomic profiles. Based on these genomic profiles (i.e. 7-locus alleles, core alleles, accessory genes, core SNPs and pan kmers), we developed a versatile workflow assessing the prediction performance of different combinations of training dataset splitting (i.e. 50, 60, 70, 80 and 90%), data preprocessing (i.e. with or without near-zero variance removal), and learning models (i.e. BLR, ERT, RF, SGB, SVM and XGB). The performance metrics included accuracy, Cohen's kappa, F1-score, area under the receiver operating characteristic, precision-recall or precision-recall gain curves, and execution time. RESULTS The average testing accuracies from accessory genes and pan kmers were significantly higher than those from core alleles or SNPs. While the accuracies from 70 and 80% of training dataset splitting were not significantly different, those from 80% were significantly higher than those from the other tested proportions. Near-zero variance removal did not allow results to be produced for 7-locus alleles, did not significantly affect accuracy for core alleles, accessory genes and pan kmers, and significantly decreased accuracy for core SNPs. The SVM and XGB models did not differ significantly from each other in accuracy and reached significantly higher accuracies than BLR, SGB, ERT and RF, in that order. However, the SVM model required more computing power than the XGB model, especially for large numbers of descriptors such as core SNPs and pan kmers. CONCLUSIONS In addition to recommendations about machine learning practices for L. monocytogenes source attribution based on genomic data, the present study also provides a freely available workflow to solve other balanced or unbalanced multiclass phenotypes from binary and categorical genomic profiles of other microorganisms without source code modifications.
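The holdout method combined with repeated k-fold cross-validation, as named in the abstract, can be sketched as follows. This is a minimal standard-library illustration and not the authors' workflow: the function names, the choice of k = 5 with 3 repeats, the seeds, and the plain (non-stratified) shuffling are all assumptions; only the 1,100 genomes and the 80% training proportion come from the abstract.

```python
import random


def holdout_split(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def repeated_kfold(train_indices, k=5, repeats=3, seed=0):
    """Yield (fit, validate) index lists for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(train_indices)
        rng.shuffle(shuffled)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for i, validate in enumerate(folds):
            fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield fit, validate


# Hypothetical usage: 1,100 genomes, 80% kept for training (values from the abstract).
train, test = holdout_split(1100, 0.8)
splits = list(repeated_kfold(train, k=5, repeats=3))
print(len(train), len(test), len(splits))
```

The testing set is never seen during cross-validation, so it gives an independent estimate of performance once a model and its settings have been chosen on the training folds. For unbalanced source classes, a stratified split that preserves class proportions would normally be preferred over the plain shuffle used here.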
Affiliation(s)
- Pierluigi Castelli
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea De Ruvo
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea Bucciacchio
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicola D'Alterio
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Cesare Cammà
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Adriano Di Pasquale
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicolas Radomski
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy.
2
Maguire M, Ramachandran P, Tallent S, Mammel MK, Brown EW, Allard MW, Musser SM, González-Escalona N. Precision metagenomics sequencing for food safety: hybrid assembly of Shiga toxin-producing Escherichia coli in enriched agricultural water. Front Microbiol 2023; 14:1221668. [PMID: 37720160 PMCID: PMC10500926 DOI: 10.3389/fmicb.2023.1221668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 08/04/2023] [Indexed: 09/19/2023] Open
Abstract
Culture-independent metagenomic sequencing of enriched agricultural water could expedite the detection and virulotyping of Shiga toxin-producing Escherichia coli (STEC). We previously determined the limits of a complete, closed metagenome-assembled genome (MAG) assembly and of a complete, fragmented MAG assembly for O157:H7 in enriched agricultural water using long reads (Oxford Nanopore Technologies, Oxford), which were 10⁷ and 10⁵ CFU/ml, respectively. However, the nanopore assemblies were not accurate enough to be used in single nucleotide polymorphism (SNP) phylogenies and could not be used for the precise identification of an outbreak STEC strain. The present study aimed to determine the limits of detection and assembly for STECs in enriched agricultural water by Illumina MiSeq sequencing technology alone, followed by establishing the limit of hybrid assembly with nanopore long-read sequencing using three different hybrid assemblers (SPAdes, Unicycler, and OPERA-MS). We also aimed to generate a genome accurate enough to be used in a SNP phylogeny. Classification of the MiSeq and nanopore sequencing data identified the same highly abundant species. Using the totality of the MiSeq output and a precision metagenomics approach in which the E. coli reads are binned before assembly, the limits of detection and assembly of STECs by MiSeq were determined to be 10⁵ and 10⁷ CFU/ml, respectively. While a complete, closed MAG could not be generated at any concentration, a complete, fragmented MAG was produced using the SPAdes assembler with an STEC concentration of at least 10⁷ CFU/ml. At this concentration, hybrid assembled contigs aligned to the nanopore-assembled genome could be accurately placed in a neighbor-joining tree. The MiSeq limits of detection and assembly were less sensitive than those of nanopore sequencing, which was likely due to factors including the small starting material (50 vs. 1 μg) and the dilution of the library loaded on the cartridge. This pilot study demonstrates that MiSeq sequencing requires higher coverage in precision metagenomic samples; however, with sufficient concentration, STECs can be characterized and phylogeny can be accurately determined.
Affiliation(s)
- Meghan Maguire
- Center for Food Safety and Applied Nutrition, Office of Regulatory Science, College Park, MD, United States
- Padmini Ramachandran
- Center for Food Safety and Applied Nutrition, Office of Regulatory Science, College Park, MD, United States
- Sandra Tallent
- Center for Food Safety and Applied Nutrition, Office of Regulatory Science, College Park, MD, United States
- Mark K. Mammel
- Office of Applied Research and Safety Assessment, Food and Drug Administration, College Park, MD, United States
- Eric W. Brown
- Center for Food Safety and Applied Nutrition, Office of Regulatory Science, College Park, MD, United States
- Marc W. Allard
- Center for Food Safety and Applied Nutrition, Office of Regulatory Science, College Park, MD, United States
- Steven M. Musser
- Center for Food Safety and Applied Nutrition, Office of Regulatory Science, College Park, MD, United States
- Narjol González-Escalona
- Center for Food Safety and Applied Nutrition, Office of Regulatory Science, College Park, MD, United States
3
Bağcı C, Albrecht B, Huson DH. MAIRA: Protein-based Analysis of MinION Reads on a Laptop. Methods Mol Biol 2023; 2649:223-234. [PMID: 37258865 DOI: 10.1007/978-1-0716-3072-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Third-generation sequencing technologies are being increasingly used in microbiome research and this has given rise to new challenges in computational microbiome analysis. Oxford Nanopore's MinION is a portable sequencer that streams data that can be basecalled on-the-fly. Here we give an introduction to the MAIRA software, which is designed to analyze MinION sequencing reads from a microbiome sample, as they are produced in real-time, on a laptop. The software processes reads in batches and updates the presented analysis after each batch. There are two analysis steps: First, protein alignments are calculated to determine which genera might be present in a sample. When strong evidence for a genus is found, then, in a second step, a more detailed analysis is performed by aligning the reads against the proteins of all species in the detected genus. The program presents a detailed analysis of species, antibiotic resistance genes, and virulence factors.
Affiliation(s)
- Caner Bağcı
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany.
- International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Developmental Biology, Tübingen, Germany.
- Daniel H Huson
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany.
4
Banerjee G, Agarwal S, Marshall A, Jones DH, Sulaiman IM, Sur S, Banerjee P. Application of advanced genomic tools in food safety rapid diagnostics: challenges and opportunities. Curr Opin Food Sci 2022. [DOI: 10.1016/j.cofs.2022.100886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
5
Azinheiro S, Roumani F, Costa-Ribeiro A, Prado M, Garrido-Maestu A. Application of MinION sequencing as a tool for the rapid detection and characterization of Listeria monocytogenes in smoked salmon. Front Microbiol 2022; 13:931810. [PMID: 36033887 PMCID: PMC9399719 DOI: 10.3389/fmicb.2022.931810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Accepted: 07/15/2022] [Indexed: 11/13/2022] Open
Abstract
Microbial pathogens may be present in different types of foods, and hence the development of novel methods to assure consumer safety is of great interest. Molecular methods are known to provide sensitive and rapid results; however, they are typically targeted approaches. In recent years, non-targeted approaches based on next-generation sequencing (NGS) have emerged as a rational way to proceed. This technology allows for the detection of several pathogens simultaneously. Furthermore, with the same set of data, it is possible to characterize the microorganisms in terms of serotype, virulence, and/or resistance genes, among other molecular features. In the current study, a novel method for the detection of Listeria monocytogenes based on the "quasimetagenomics" approach was developed. Different enrichment media and immunomagnetic separation (IMS) strategies were compared to determine the best approach in terms of L. monocytogenes sequences generated from smoked salmon samples. Finally, the data generated were analyzed with a user-friendly workflow that simultaneously provided the species identification, serotype, and antimicrobial resistance genes. The new method was thoroughly evaluated against a culture-based approach, using smoked salmon inoculated with L. monocytogenes as the matrix of choice. The sequencing method reached a very low limit of detection (LOD50, 1.2 CFU/25 g) along with high diagnostic sensitivity and specificity (100%), and perfect agreement with the culture-based method (Cohen's κ = 1.00). Overall, the proposed method overcomes all the major limitations reported for the implementation of NGS as a routine food testing technology and paves the way for future developments that take advantage of it.
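The agreement statistic reported in the abstract, Cohen's kappa, compares the observed agreement between two methods with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal standard-library illustration; the function name and the sample results are hypothetical and are not data from the study.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical results."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both sequences must be non-empty and of equal length")
    n = len(ratings_a)
    labels = set(ratings_a) | set(ratings_b)
    # Observed agreement: fraction of samples on which both methods agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each method's marginal label frequencies.
    p_e = sum((ratings_a.count(l) / n) * (ratings_b.count(l) / n) for l in labels)
    if p_e == 1.0:
        return 1.0  # both methods always give the same single label
    return (p_o - p_e) / (1 - p_e)


# Hypothetical results: 1 = L. monocytogenes detected, 0 = not detected.
culture = [1, 1, 1, 0, 0, 0]
sequencing_same = [1, 1, 1, 0, 0, 0]
sequencing_one_miss = [1, 1, 0, 0, 0, 0]

kappa_perfect = cohens_kappa(culture, sequencing_same)
kappa_partial = cohens_kappa(culture, sequencing_one_miss)
print(kappa_perfect, round(kappa_partial, 3))
```

A kappa of 1.00, as reported, means the two methods agreed on every sample; a single discordant sample in this small hypothetical set already lowers kappa to about 0.67.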
Affiliation(s)
- Sarah Azinheiro
- Food Quality and Safety Research Group, International Iberian Nanotechnology Laboratory, Braga, Portugal
- Department of Analytical Chemistry, Nutrition and Food Science, Faculty of Veterinary Science, University of Santiago de Compostela, Lugo, Spain
- Foteini Roumani
- Food Quality and Safety Research Group, International Iberian Nanotechnology Laboratory, Braga, Portugal
- Department of Analytical Chemistry, Nutrition and Food Science, Faculty of Veterinary Science, University of Santiago de Compostela, Lugo, Spain
- Ana Costa-Ribeiro
- Food Quality and Safety Research Group, International Iberian Nanotechnology Laboratory, Braga, Portugal
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Marta Prado
- Food Quality and Safety Research Group, International Iberian Nanotechnology Laboratory, Braga, Portugal
- Alejandro Garrido-Maestu
- Food Quality and Safety Research Group, International Iberian Nanotechnology Laboratory, Braga, Portugal
6
Lourenco A, Linke K, Wagner M, Stessl B. The Saprophytic Lifestyle of Listeria monocytogenes and Entry Into the Food-Processing Environment. Front Microbiol 2022; 13:789801. [PMID: 35350628 PMCID: PMC8957868 DOI: 10.3389/fmicb.2022.789801] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/05/2021] [Accepted: 02/03/2022] [Indexed: 11/13/2022] Open
Abstract
Listeria monocytogenes is an environmentally adapted saprophyte that can change into a human and animal bacterial pathogen with zoonotic potential through several regulatory systems. In this review, the focus is on the occurrence of Listeria sensu stricto and sensu lato in different ecological niches, the detection methods, and their analytical limitations. It also highlights the occurrence of L. monocytogenes genotypes in the environment (soil, water, and wildlife) and reflects on the molecular determinants of L. monocytogenes for the saprophytic lifestyle and the potential for antibiotic resistance. The strain-specific properties with which some genotypes circulate in wastewater, surface water, soil, wildlife, and agricultural environments are of particular interest for the continuously updated risk analysis.
Affiliation(s)
- Antonio Lourenco
- Department of Food Biosciences, Teagasc Food Research Centre, Co. Cork, Ireland
- Unit for Food Microbiology, Institute for Food Safety, Food Technology and Veterinary Public Health, University of Veterinary Medicine, Vienna, Austria
- Kristina Linke
- Unit for Food Microbiology, Institute for Food Safety, Food Technology and Veterinary Public Health, University of Veterinary Medicine, Vienna, Austria
- Martin Wagner
- Unit for Food Microbiology, Institute for Food Safety, Food Technology and Veterinary Public Health, University of Veterinary Medicine, Vienna, Austria
- Austrian Competence Center for Feed and Food Quality, Safety and Innovation, Tulln, Austria
- Beatrix Stessl
- Unit for Food Microbiology, Institute for Food Safety, Food Technology and Veterinary Public Health, University of Veterinary Medicine, Vienna, Austria
7
Leonard SR, Simko I, Mammel MK, Richter TKS, Brandl MT. Seasonality, shelf life and storage atmosphere are main drivers of the microbiome and E. coli O157:H7 colonization of post-harvest lettuce cultivated in a major production area in California. Environmental Microbiome 2021; 16:25. [PMID: 34930479 PMCID: PMC8686551 DOI: 10.1186/s40793-021-00393-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/17/2021] [Accepted: 11/30/2021] [Indexed: 05/10/2023]
Abstract
BACKGROUND Lettuce is linked to recurrent outbreaks of Shiga toxin-producing Escherichia coli (STEC) infections, the seasonality of which remains unresolved. Infections have occurred largely from processed lettuce, which undergoes substantial physiological changes during storage. We investigated the microbiome and STEC O157:H7 (EcO157) colonization of fresh-cut lettuce of two cultivars with long and short shelf life harvested in the spring and fall in California and stored in modified atmosphere packaging (MAP) at cold and warm temperatures. RESULTS Inoculated EcO157 declined significantly less on the cold-stored cultivar with short shelf life, while multiplying rapidly at 24 °C independently of cultivar. Metagenomic sequencing of the lettuce microbiome revealed that the pre-storage bacterial community was variable but dominated by species in the Erwiniaceae and Pseudomonadaceae. After cold storage, the microbiome composition differed between cultivars, with a greater relative abundance (RA) of Erwiniaceae and Yersiniaceae on the cultivar with short shelf life. Storage at 24 °C shifted the microbiome to higher RAs of Erwiniaceae and Enterobacteriaceae and lower RA of Pseudomonadaceae compared with 6 °C. Fall harvest followed by lettuce deterioration were identified by recursive partitioning as important factors associated with high EcO157 survival at 6 °C, whereas elevated package CO₂ levels correlated with high EcO157 multiplication at 24 °C. EcO157 population change correlated with the lettuce microbiome during 6 °C storage, with fall microbiomes supporting the greatest EcO157 survival on both cultivars. Fall and spring microbiomes differed before and during storage at both temperatures. High representation of Pantoea agglomerans was a predictor of fall microbiomes, lettuce deterioration, and enhanced EcO157 survival at 6 °C. In contrast, higher RAs of Erwinia persicina, Rahnella aquatilis, and Serratia liquefaciens were biomarkers of spring microbiomes and lower EcO157 survival. CONCLUSIONS The microbiome of processed MAP lettuce evolves extensively during storage. Under temperature abuse, high CO₂ promotes a lettuce microbiome enriched in taxa with anaerobic capability and EcO157 multiplication. In cold storage, our results strongly support a role for season and lettuce deterioration in EcO157 survival and microbiome composition, suggesting that the physiology and microbiomes of fall- and spring-harvested lettuce may contribute to the seasonality of STEC outbreaks associated with lettuce grown in coastal California.
Affiliation(s)
- Susan R Leonard
- Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, MD, USA
- Ivan Simko
- Crop Improvement and Protection Research Unit, US Department of Agriculture, Agricultural Research Service, Salinas, CA, USA
- Mark K Mammel
- Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, MD, USA
- Taylor K S Richter
- Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, MD, USA
- Maria T Brandl
- Produce Safety and Microbiology Research Unit, US Department of Agriculture, Agricultural Research Service, Albany, CA, USA.
8
Surveillance of Listeria monocytogenes: Early Detection, Population Dynamics, and Quasimetagenomic Sequencing during Selective Enrichment. Appl Environ Microbiol 2021; 87:e0177421. [PMID: 34613762 PMCID: PMC8612253 DOI: 10.1128/aem.01774-21] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022] Open
Abstract
In this study, we addressed different aspects regarding the implementation of quasimetagenomic sequencing as a hybrid surveillance method in combination with enrichment for early detection of Listeria monocytogenes in the food industry. Different experimental enrichment cultures were used, comprising seven L. monocytogenes strains of different sequence types (STs), with and without a background microbiota community. To assess whether the proportions of the different STs changed over time during enrichment, the growth and population dynamics were assessed using dapE colony sequencing and dapE and 16S rRNA amplicon sequencing. Some STs tended to have a higher relative abundance during the late stage of enrichment when L. monocytogenes was enriched without background microbiota. When coenriched with background microbiota, the population dynamics of the different STs were more consistent over time. To evaluate the earliest time point during enrichment that allows both the detection of L. monocytogenes and the generation of genetic information for estimating the strain diversity in a sample, quasimetagenomic sequencing was performed early during enrichment in the presence of the background microbiota using Oxford Nanopore Technologies Flongle and Illumina MiSeq sequencing. The application of multiple displacement amplification (MDA) enabled detection of L. monocytogenes (and the background microbiota) after only 4 h of enrichment using both applied sequencing approaches. The MiSeq sequencing data additionally enabled the prediction of cooccurring L. monocytogenes strains in the samples. IMPORTANCE We showed that a short primary enrichment combined with MDA and Nanopore sequencing can accelerate the traditional process of cultivation and identification of L. monocytogenes. The use of Illumina MiSeq sequencing additionally allowed us to predict the presence of cooccurring L. monocytogenes strains. Our results suggest quasimetagenomic sequencing is a valuable and promising hybrid surveillance tool for the food industry that enables faster identification of L. monocytogenes during early enrichment. Routine application of this approach could lead to more efficient and proactive actions in the food industry that prevent contamination and subsequent product recalls and food destruction, economic and reputational losses, and human listeriosis cases.