51
|
Abstract
Given the popularity and elegance of k-mer-based tools, finding a space-efficient way to represent a set of k-mers is important for improving the scalability of bioinformatics analyses. One popular approach is to convert the set of k-mers into the more compact set of unitigs. We generalize this approach and formulate it as the problem of finding a smallest spectrum-preserving string set (SPSS) representation. We show that this problem is equivalent to finding a smallest path cover in a compacted de Bruijn graph. Using this reduction, we prove a lower bound on the size of the optimal SPSS and propose a greedy method called UST (Unitig-STitch) that results in a smaller representation than unitigs and is nearly optimal with respect to our lower bound. We demonstrate the usefulness of the SPSS formulation with two applications of UST. The first one is a compression algorithm, UST-Compress, which, we show, can store a set of k-mers by using an order-of-magnitude less disk space than other lossless compression tools. The second one is an exact static k-mer membership index, UST-FM, which, we show, improves index size by 10%-44% compared with other state-of-the-art low-memory indices.
Affiliation(s)
- Amatur Rahman
- Department of Computer Science and Engineering, Penn State, University Park, State College, PA, USA
| | - Paul Medvedev
- Department of Computer Science and Engineering, Penn State, University Park, State College, PA, USA
- Department of Biochemistry and Molecular Biology, Penn State, University Park, State College, PA, USA
- Center for Computational Biology and Bioinformatics, Penn State, University Park, State College, PA, USA
| |
|
52
|
Yi H, Lin Y, Lin C, Jin W. Kssd: sequence dimensionality reduction by k-mer substring space sampling enables real-time large-scale datasets analysis. Genome Biol 2021; 22:84. [PMID: 33726811 PMCID: PMC7962209 DOI: 10.1186/s13059-021-02303-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/26/2020] [Accepted: 02/23/2021] [Indexed: 11/10/2022] Open
Abstract
Here, we develop k-mer substring space decomposition (Kssd), a sketching technique which is significantly faster and more accurate than current sketching methods. We show that it is the only method that can be used for large-scale dataset comparisons at population resolution on simulated and real data. Using Kssd, we prioritize references for all 1,019,179 bacteria whole genome sequencing (WGS) runs from NCBI Sequence Read Archive and find misidentification or contamination in 6164 of these. Additionally, we analyze WGS and exome runs of samples from the 1000 Genomes Project.
Affiliation(s)
- Huiguang Yi
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, 518055 Guangdong China
- Institute of Life Sciences, Southeast University, Nanjing, 210096 Jiangsu China
| | - Yanling Lin
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, 518055 Guangdong China
| | - Chengqi Lin
- Institute of Life Sciences, Southeast University, Nanjing, 210096 Jiangsu China
| | - Wenfei Jin
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, 518055 Guangdong China
| |
|
53
|
Li W, Wang A. Genomic islands mediate environmental adaptation and the spread of antibiotic resistance in multiresistant Enterococci - evidence from genomic sequences. BMC Microbiol 2021; 21:55. [PMID: 33602143 PMCID: PMC7893910 DOI: 10.1186/s12866-021-02114-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/06/2020] [Accepted: 02/02/2021] [Indexed: 11/10/2022] Open
Abstract
Background Genomic islands (GIs) play an important role in the chromosome diversity of Enterococcus. In the current study, we aimed to investigate the spread of GIs between Enterococcus strains and their correlation with antibiotic resistance genes (ARGs). Bitsliced Genomic Signature Indexes (BIGSI) were used to screen the NCBI Sequence Read Archive (SRA) for multiple resistant Enterococcus. A total of 37 pairs of raw reads were screened from 457,000 whole-genome sequences (WGS) in the SRA database, which come from 37 Enterococci distributed in eight countries. These raw reads were assembled for the prediction and analysis of GIs, ARGs, plasmids and prophages. Results The results showed that GIs were universal in Enterococcus, with an average of 3.2 GIs in each strain. Network analysis showed that frequent genetic information exchanges mediated by GIs occurred between Enterococcus strains. Seven antibiotic-resistant genomic islands (ARGIs) were found to carry one to three ARGs, mdtG, tetM, dfrG, lnuG, and fexA, in six strains. These ARGIs were involved in the spread of antibiotic resistance in 45.9% of the 37 strains, although there was no significant positive correlation between the frequency of GI exchanges and the number of ARGs each strain harboured (r = 0.287, p = 0.085). After comprehensively analysing the genome data, we found that partial GIs were associated with multiple mobile genetic elements (transposons, integrons, prophages and plasmids) and had potential natural transformation characteristics. Conclusions All of these results based on genomic sequencing suggest that GIs might mediate the acquisition of some ARGs and might be involved in the high genome plasticity of Enterococcus through transformation, transduction and conjugation, thus providing a fitness advantage for Enterococcus hosts under complex environmental factors. Supplementary Information The online version contains supplementary material available at 10.1186/s12866-021-02114-4.
Affiliation(s)
- Weiwei Li
- School of Life Science, Ludong University, Yantai, 264025, China
| | - Ailan Wang
- School of Life Science, Ludong University, Yantai, 264025, China
| |
|
54
|
Horesh G, Blackwell GA, Tonkin-Hill G, Corander J, Heinz E, Thomson NR. A comprehensive and high-quality collection of Escherichia coli genomes and their genes. Microb Genom 2021; 7:000499. [PMID: 33417534 PMCID: PMC8208696 DOI: 10.1099/mgen.0.000499] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/21/2020] [Accepted: 12/07/2020] [Indexed: 01/25/2023] Open
Abstract
Escherichia coli is a highly diverse organism that includes a range of commensal and pathogenic variants found across a range of niches and worldwide. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug resistance. The diversity in the E. coli population is driven by high genome plasticity and a very large gene pool. All these have made E. coli one of the most well-studied organisms, as well as a commonly used laboratory strain. Today, there are thousands of sequenced E. coli genomes stored in public databases. While data is widely available, accessing the information in order to perform analyses can still be a challenge. Collecting relevant available data requires accessing different sources, where data may be stored in a range of formats, and often requires further manipulation and processing to apply various analyses and extract useful information. In this study, we collated and intensely curated a collection of over 10 000 E. coli and Shigella genomes to provide a single, uniform, high-quality dataset. Shigella were included as they are considered specialized pathovars of E. coli. We provide these data in a number of easily accessible formats that can be used as the foundation for future studies addressing the biological differences between E. coli lineages and the distribution and flow of genes in the E. coli population at a high resolution. The analysis we present emphasizes our lack of understanding of the true diversity of the E. coli species, and the biased nature of our current understanding of the genetic diversity of such a key pathogen.
Affiliation(s)
- Gal Horesh
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1RQ, UK
| | - Grace A. Blackwell
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1RQ, UK
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
| | - Gerry Tonkin-Hill
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1RQ, UK
| | - Jukka Corander
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1RQ, UK
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Department of Mathematics and Statistics, Helsinki Institute for Information Technology (HIIT), University of Helsinki, Helsinki, Finland
| | - Eva Heinz
- Department of Vector Biology and Clinical Sciences, Liverpool School of Tropical Medicine, Liverpool L3 5QA, UK
| | - Nicholas R. Thomson
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1RQ, UK
- Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
| |
|
55
|
Perrin A, Rocha EPC. PanACoTA: a modular tool for massive microbial comparative genomics. NAR Genom Bioinform 2021; 3:lqaa106. [PMID: 33575648 PMCID: PMC7803007 DOI: 10.1093/nargab/lqaa106] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/16/2020] [Revised: 11/10/2020] [Accepted: 12/01/2020] [Indexed: 02/06/2023] Open
Abstract
The study of the gene repertoires of microbial species, their pangenomes, has become a key part of microbial evolution and functional genomics. Yet, the increasing number of genomes available complicates the establishment of the basic building blocks of comparative genomics. Here, we present PanACoTA (https://github.com/gem-pasteur/PanACoTA), a tool that downloads all genomes of a species, builds a database with those passing quality and redundancy controls, uniformly annotates them, and then builds their pangenome, several variants of core genomes, their alignments and a rapid but accurate phylogenetic tree. While many programs building pangenomes have become available in the last few years, we have focused on a modular method that tackles all the key steps of the process, from download to phylogenetic inference. While all steps are integrated, they can also be run separately and multiple times to allow rapid and extensive exploration of the parameters of interest. PanACoTA is built in Python3, includes a singularity container and features to facilitate its future development. We believe PanACoTA is an interesting addition to the current set of comparative genomics tools, since it will accelerate and standardize the more routine parts of the work, allowing microbial genomicists to more quickly tackle their specific questions.
Affiliation(s)
- Amandine Perrin
- Microbial Evolutionary Genomics, CNRS, UMR3525, Institut Pasteur, 28, rue Dr Roux, Paris 75015, France
| | - Eduardo P C Rocha
- Microbial Evolutionary Genomics, CNRS, UMR3525, Institut Pasteur, 28, rue Dr Roux, Paris 75015, France
| |
|
56
|
Luhmann N, Holley G, Achtman M. BlastFrost: fast querying of 100,000s of bacterial genomes in Bifrost graphs. Genome Biol 2021; 22:30. [PMID: 33430919 PMCID: PMC7798312 DOI: 10.1186/s13059-020-02237-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/10/2020] [Accepted: 12/11/2020] [Indexed: 12/30/2022] Open
Abstract
BlastFrost is a highly efficient method for querying 100,000s of genome assemblies, building on Bifrost, a dynamic data structure for compacted and colored de Bruijn graphs. BlastFrost queries a Bifrost data structure for sequences of interest and extracts local subgraphs, enabling the identification of the presence or absence of individual genes or single nucleotide sequence variants. We show two examples using Salmonella genomes: finding within minutes the presence of genes in the SPI-2 pathogenicity island in a collection of 926 genomes and identifying single nucleotide polymorphisms associated with fluoroquinolone resistance in three genes among 190,209 genomes. BlastFrost is available at https://github.com/nluhmann/BlastFrost/tree/master/data.
Affiliation(s)
- Nina Luhmann
- Warwick Medical School, University of Warwick, Coventry, UK
| | - Guillaume Holley
- Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland
| | - Mark Achtman
- Warwick Medical School, University of Warwick, Coventry, UK
| |
|
57
|
Marchet C, Boucher C, Puglisi SJ, Medvedev P, Salson M, Chikhi R. Data structures based on k-mers for querying large collections of sequencing data sets. Genome Res 2021; 31:1-12. [PMID: 33328168 PMCID: PMC7849385 DOI: 10.1101/gr.260604.119] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 12/29/2019] [Accepted: 09/14/2020] [Indexed: 12/19/2022]
Abstract
High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches, yet, such a feature would be highly useful to investigators. Toward this goal, in the last few years several computational approaches have been introduced to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.
Affiliation(s)
- Camille Marchet
- Université de Lille, CNRS, CRIStAL UMR 9189, F-59000 Lille, France
| | - Christina Boucher
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida 32611, USA
| | - Simon J Puglisi
- Department of Computer Science, University of Helsinki, FI-00014, Helsinki, Finland
| | - Paul Medvedev
- Department of Computer Science, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Mikaël Salson
- Université de Lille, CNRS, CRIStAL UMR 9189, F-59000 Lille, France
| | - Rayan Chikhi
- Institut Pasteur & CNRS, C3BI USR 3756, F-75015 Paris, France
| |
|
58
|
Fowler PW. How quickly can we predict trimethoprim resistance using alchemical free energy methods? Interface Focus 2020; 10:20190141. [PMID: 33178416 PMCID: PMC7653339 DOI: 10.1098/rsfs.2019.0141] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Accepted: 09/11/2020] [Indexed: 12/15/2022] Open
Abstract
The emergence of antimicrobial resistance threatens modern medicine and necessitates more personalized treatment of bacterial infections. Sequencing the whole genome of the pathogen(s) in a clinical sample offers one way to improve clinical microbiology diagnostic services, and has already been adopted for tuberculosis in some countries. A key weakness of genetics-based clinical microbiology is that it cannot return a result for rare or novel genetic variants, and therefore predictive methods are required. Non-synonymous mutations in the S. aureus dfrB gene can be successfully classified as conferring resistance or not by calculating their effect on the binding free energy of the antibiotic, trimethoprim. The underlying approach, alchemical free energy methods, requires large numbers of molecular dynamics simulations to be run. We show that a large number (N = 15) of binding free energies calculated from a series of very short (50 ps) molecular dynamics simulations are able to satisfactorily classify all seven mutations in our clinically derived test set. A result for a single mutation could therefore be returned in less than an hour, thereby demonstrating that this or similar methods are now sufficiently fast and reproducible for clinical use.
Affiliation(s)
- Philip W. Fowler
- Nuffield Department of Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DU, UK
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
| |
|
59
|
Holley G, Melsted P. Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs. Genome Biol 2020; 21:249. [PMID: 32943081 PMCID: PMC7499882 DOI: 10.1186/s13059-020-02135-8] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 08/13/2019] [Accepted: 08/06/2020] [Indexed: 02/07/2023] Open
Abstract
Memory consumption of de Bruijn graphs is often prohibitive. Most de Bruijn graph-based assemblers reduce the complexity by compacting paths into single vertices, but this is challenging as it requires the uncompacted de Bruijn graph to be available in memory. We present a parallel and memory-efficient algorithm enabling the direct construction of the compacted de Bruijn graph without producing the intermediate uncompacted graph. Bifrost features a broad range of functions, such as indexing, editing, and querying the graph, and includes a graph coloring method that maps each k-mer of the graph to the genomes it occurs in. Availability: https://github.com/pmelsted/bifrost.
Affiliation(s)
- Guillaume Holley
- Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland
| | - Páll Melsted
- Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland
| |
|
60
|
Nimmo C, Millard J, van Dorp L, Brien K, Moodley S, Wolf A, Grant AD, Padayatchi N, Pym AS, Balloux F, O'Donnell M. Population-level emergence of bedaquiline and clofazimine resistance-associated variants among patients with drug-resistant tuberculosis in southern Africa: a phenotypic and phylogenetic analysis. Lancet Microbe 2020; 1:e165-e174. [PMID: 32803174 PMCID: PMC7416634 DOI: 10.1016/s2666-5247(20)30031-8] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Indexed: 01/30/2023]
Abstract
BACKGROUND Bedaquiline and clofazimine are important drugs in the treatment of drug-resistant tuberculosis and are commonly used across southern Africa, although drug susceptibility testing is not routinely performed. In this study, we did a genotypic and phenotypic analysis of drug-resistant Mycobacterium tuberculosis isolates from cohort studies in hospitals in KwaZulu-Natal, South Africa, to identify resistance-associated variants (RAVs) and assess the extent of clofazimine and bedaquiline cross-resistance. We also used a comprehensive dataset of whole-genome sequences to investigate the phylogenetic and geographical distribution of bedaquiline and clofazimine RAVs in southern Africa. METHODS In this study, we included M tuberculosis isolates reported from the PRAXIS study of patients with drug-resistant tuberculosis treated with bedaquiline (King Dinuzulu Hospital, Durban) and three other cohort studies of drug-resistant tuberculosis in other KwaZulu-Natal hospitals, and sequential isolates from six persistently culture-positive patients with extensively drug-resistant tuberculosis at the KwaZulu-Natal provincial referral laboratory. Samples were collected between 2013 and 2019. Microbiological cultures were done as part of all parent studies. We sequenced whole genomes of included isolates and measured bedaquiline and clofazimine minimum inhibitory concentrations (MICs) for isolates identified as carrying any Rv0678 variant or previously published atpE, pepQ, and Rv1979c RAVs, which were the subject of the phenotypic study. We combined all whole-genome sequences of M tuberculosis obtained in this study with publicly available sequence data from other tuberculosis studies in southern Africa (defined as the countries of the Southern African Development Community), including isolates with Rv0678 variants identified by screening public genomic databases. 
We used this extended dataset to reconstruct phylogenetic relationships across lineage 2 and 4 M tuberculosis isolates. FINDINGS We sequenced the whole genome of 648 isolates from 385 patients with drug-resistant tuberculosis recruited into cohort studies in KwaZulu-Natal, and 28 isolates from six patients from the KwaZulu-Natal referral laboratory. We identified 30 isolates with Rv0678 RAVs from 16 (4%) of 391 patients. We did not identify any atpE, pepQ, or Rv1979c RAVs. MICs were measured for 21 isolates with Rv0678 RAVs. MICs were above the critical concentration for bedaquiline resistance in nine (43%) of 21 isolates, in the intermediate category in nine (43%) isolates, and within the wild-type range in three (14%) isolates. Clofazimine MICs in genetically wild-type isolates ranged from 0·12-0·5 μg/mL, and in isolates with RAVs from 0·25-4·0 μg/mL. Phylogenetic analysis of the extended dataset including M tuberculosis isolates from southern Africa resolved multiple emergences of Rv0678 variants in lineages 2 and 4, documented two likely nosocomial transmission events, and identified the spread of a possibly bedaquiline and clofazimine cross-resistant clone in eSwatini. We also identified four patients with pepQ frameshift mutations that may confer resistance. INTERPRETATION Bedaquiline and clofazimine cross-resistance in southern Africa is emerging repeatedly, with evidence of onward transmission largely due to Rv0678 mutations in M tuberculosis. Roll-out of bedaquiline and clofazimine treatment in the setting of limited drug susceptibility testing could allow further spread of resistance. Designing strong regimens would help reduce the emergence of resistance. Drug susceptibility testing is required to identify where resistance does emerge. FUNDING Wellcome Trust, National Institute of Allergy and Infectious Diseases and National Center for Advancing Translational Sciences of the National Institutes of Health.
Affiliation(s)
- Camus Nimmo
- Division of Infection and Immunity, University College London, London, UK
- UCL Genetics Institute, University College London, London, UK
- Africa Health Research Institute, Durban, South Africa
| | - James Millard
- Africa Health Research Institute, Durban, South Africa
- Wellcome Trust Liverpool Glasgow Centre for Global Health Research, Liverpool, UK
- Institute of Infection and Global Health, University of Liverpool, Liverpool, UK
| | - Lucy van Dorp
- UCL Genetics Institute, University College London, London, UK
| | - Kayleen Brien
- Africa Health Research Institute, Durban, South Africa
| | | | - Allison Wolf
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
| | - Alison D Grant
- Africa Health Research Institute, Durban, South Africa
- TB Centre, London School of Hygiene & Tropical Medicine, London, UK
- School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Nesri Padayatchi
- CAPRISA-MRC HIV-TB Pathogenesis and Treatment Research Unit, Centre for the Aids Programme of Research in South Africa (CAPRISA), Durban, KwaZulu-Natal, South Africa
| | | | | | - Max O'Donnell
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
- Department of Epidemiology, Columbia University Medical Center, New York, NY, USA
- CAPRISA-MRC HIV-TB Pathogenesis and Treatment Research Unit, Centre for the Aids Programme of Research in South Africa (CAPRISA), Durban, KwaZulu-Natal, South Africa
| |
|
61
|
A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat Biotechnol 2020; 39:105-114. [PMID: 32690973 PMCID: PMC7801254 DOI: 10.1038/s41587-020-0603-3] [Citation(s) in RCA: 687] [Impact Index Per Article: 137.4] [Received: 09/18/2019] [Accepted: 05/31/2020] [Indexed: 01/08/2023]
Abstract
Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP lack functional annotations. Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome. More than 200,000 gut prokaryotic reference genomes and the proteins they encode are collated, providing comprehensive resources for microbiome researchers.
|
62
|
Listeria monocytogenes is prevalent in retail produce environments but Salmonella enterica is rare. Food Control 2020. [DOI: 10.1016/j.foodcont.2020.107173] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/16/2022]
|
63
|
Marchet C, Iqbal Z, Gautheret D, Salson M, Chikhi R. REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets. Bioinformatics 2020; 36:i177-i185. [PMID: 32657392 PMCID: PMC7355249 DOI: 10.1093/bioinformatics/btaa487] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/24/2022] Open
Abstract
MOTIVATION In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets. RESULTS We used REINDEER to index the abundances of sequences within 2585 human RNA-seq experiments in 45 h using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ∼4 billion distinct k-mers across 2585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph of each dataset, then conceptually merges those de Bruijn graphs into a single global one. Then, REINDEER constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances. AVAILABILITY AND IMPLEMENTATION https://github.com/kamimrcht/REINDEER. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Camille Marchet
- CNRS, UMR 9189 – CRIStAL, Université de Lille, F-59000 Lille, France
| | - Zamin Iqbal
- European Bioinformatics Institute, Cambridge CB10 1SD, UK
| | - Daniel Gautheret
- CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette 91190, France
| | - Mikaël Salson
- CNRS, UMR 9189 – CRIStAL, Université de Lille, F-59000 Lille, France
| | - Rayan Chikhi
- Institut Pasteur, CNRS, C3BI – USR 3756, 75015 Paris, France
| |
|
64
|
Elworth RAL, Wang Q, Kota PK, Barberan CJ, Coleman B, Balaji A, Gupta G, Baraniuk RG, Shrivastava A, Treangen T. To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics. Nucleic Acids Res 2020; 48:5217-5234. [PMID: 32338745 PMCID: PMC7261164 DOI: 10.1093/nar/gkaa265] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/18/2019] [Revised: 03/20/2020] [Accepted: 04/04/2020] [Indexed: 02/01/2023] Open
Abstract
As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
Affiliation(s)
- Qi Wang
- Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
- Pavan K Kota
- Department of Bioengineering, Houston, TX 77005, USA
- C J Barberan
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Benjamin Coleman
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Advait Balaji
- Department of Computer Science, Houston, TX 77005, USA
- Gaurav Gupta
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Richard G Baraniuk
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Anshumali Shrivastava
- Department of Computer Science, Houston, TX 77005, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Todd J Treangen
- Department of Computer Science, Houston, TX 77005, USA
- Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
65
Miller EA, Elnekave E, Flores-Figueroa C, Johnson A, Kearney A, Munoz-Aguayo J, Tagg KA, Tschetter L, Weber BP, Nadon CA, Boxrud D, Singer RS, Folster JP, Johnson TJ. Emergence of a Novel Salmonella enterica Serotype Reading Clonal Group Is Linked to Its Expansion in Commercial Turkey Production, Resulting in Unanticipated Human Illness in North America. mSphere 2020; 5:e00056-20. [PMID: 32295868 PMCID: PMC7160679 DOI: 10.1128/msphere.00056-20] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/17/2020] [Accepted: 03/26/2020] [Indexed: 01/09/2023]
Abstract
Two separate human outbreaks of Salmonella enterica serotype Reading occurred between 2017 and 2019 in the United States and Canada, and both outbreaks were linked to the consumption of raw turkey products. In this study, a comprehensive genomic investigation was conducted to reconstruct the evolutionary history of S. Reading from turkeys and to determine the genomic context of outbreaks involving this infrequently isolated Salmonella serotype. A total of 988 isolates of U.S. origin were examined using whole-genome-based approaches, including current and historical isolates from humans, meat, and live food animals. Broadly, isolates clustered into three major clades, with one apparently highly adapted turkey clade. Within the turkey clade, isolates clustered into three subclades, including an "emergent" clade that contained only isolates dated 2016 or later, with many of the isolates from these outbreaks. Genomic differences were identified between emergent and other turkey subclades, suggesting that the apparent success of currently circulating subclades is, in part, attributable to plasmid acquisitions conferring antimicrobial resistance, gain of phage-like sequences with cargo virulence factors, and mutations in systems that may be involved in beta-glucuronidase activity and resistance towards colicins. U.S. and Canadian outbreak isolates were found interspersed throughout the emergent subclade and the other circulating subclade. The emergence of a novel S. Reading turkey subclade, coinciding temporally with expansion in commercial turkey production and with U.S. and Canadian human outbreaks, indicates that emergent strains with higher potential for niche success were likely vertically transferred and rapidly disseminated from a common source.
IMPORTANCE Increasingly, outbreak investigations involving foodborne pathogens are difficult due to the interconnectedness of food animal production and distribution, and homogeneous nature of industry integration, necessitating high-resolution genomic investigations to determine their basis. Fortunately, surveillance and whole-genome sequencing, combined with the public availability of these data, enable comprehensive queries to determine underlying causes of such outbreaks. Utilizing this pipeline, it was determined that a novel clone of Salmonella Reading has emerged that coincided with increased abundance in raw turkey products and two outbreaks of human illness in North America. The rapid dissemination of this highly adapted and conserved clone indicates that it was likely obtained from a common source and rapidly disseminated across turkey production. Key genomic changes may have contributed to its apparent continued success in commercial turkeys and ability to cause illness in humans.
Affiliation(s)
- Elizabeth A Miller
- Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, Saint Paul, Minnesota, USA
- Ehud Elnekave
- Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, Saint Paul, Minnesota, USA
- Abigail Johnson
- Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, Saint Paul, Minnesota, USA
- Ashley Kearney
- Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, Canada
- Jeannette Munoz-Aguayo
- Mid-Central Research and Outreach Center, University of Minnesota, Willmar, Minnesota, USA
- Lorelee Tschetter
- Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, Canada
- Bonnie P Weber
- Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, Saint Paul, Minnesota, USA
- Celine A Nadon
- Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, Canada
- Dave Boxrud
- Minnesota Department of Health, Saint Paul, Minnesota, USA
- Randall S Singer
- Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, Saint Paul, Minnesota, USA
- Jason P Folster
- Division of Foodborne, Waterborne, and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Timothy J Johnson
- Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, Saint Paul, Minnesota, USA
- Mid-Central Research and Outreach Center, University of Minnesota, Willmar, Minnesota, USA
66
Cummins ML, Hamidian M, Djordjevic SP. Salmonella Genomic Island 1 is Broadly Disseminated within Gammaproteobacteriaceae. Microorganisms 2020; 8:microorganisms8020161. [PMID: 31979280 PMCID: PMC7074787 DOI: 10.3390/microorganisms8020161] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/06/2020] [Revised: 01/20/2020] [Accepted: 01/20/2020] [Indexed: 12/28/2022]
Abstract
Salmonella genomic island 1 (SGI1) is an integrative mobilisable element that plays an important role in the capture and spread of multiple drug resistance. To date, SGI1 has been found in clinical isolates of Salmonella enterica serovars, Proteus mirabilis, Morganella morganii, Acinetobacter baumannii, Providencia stuartii, Enterobacter spp, and recently in Escherichia coli. SGI1 preferentially targets the 3´-end of trmE, a conserved gene found in the Enterobacteriaceae and among members of the Gammaproteobacteria. It is, therefore, hypothesised that SGI1 and SGI1-related elements (SGI1-REs) may have been acquired by diverse bacterial genera. Here, Bitsliced Genomic Signature Indexes (BIGSI) was used to screen the NCBI Sequence Read Archive (SRA) for putative SGI1-REs in Gammaproteobacteria. Novel SGI-REs were identified in diverse genera including Cronobacter spp, Klebsiella spp, and Vibrio spp and in two additional isolates of Escherichia coli. An extensively drug-resistant human clonal lineage of Klebsiella pneumoniae carrying an SGI1-RE in the United Kingdom and an SGI1-RE that lacks a class 1 integron were also identified. These findings provide insight into the origins of this diverse family of clinically important genomic islands and expand the knowledge of the potential host range of SGI1-REs within the Gammaproteobacteria.
Affiliation(s)
- Max Laurence Cummins
- The ithree institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
- Mohammad Hamidian
- The ithree institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Steven Philip Djordjevic
- The ithree institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
67
Giulieri SG, Tong SYC, Williamson DA. Using genomics to understand meticillin- and vancomycin-resistant Staphylococcus aureus infections. Microb Genom 2020; 6:e000324. [PMID: 31913111 PMCID: PMC7067033 DOI: 10.1099/mgen.0.000324] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/12/2019] [Accepted: 12/12/2019] [Indexed: 12/15/2022]
Abstract
Resistance to meticillin and vancomycin in Staphylococcus aureus significantly complicates the management of severe infections like bacteraemia, endocarditis or osteomyelitis. Here, we review the molecular mechanisms and genomic epidemiology of resistance to these agents, with a focus on how genomics has provided insights into the emergence and evolution of major meticillin-resistant S. aureus clones. We also provide insights on the use of bacterial whole-genome sequencing to inform management of S. aureus infections and for control of transmission at the hospital and in the community.
Affiliation(s)
- Stefano G. Giulieri
- Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Infectious Disease Department, Austin Health, Melbourne, Australia
- Steven Y. C. Tong
- Victorian Infectious Disease Service, Royal Melbourne Hospital, and Doherty Department University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Victoria, Australia
- Menzies School of Health Research, Darwin, Australia
- Deborah A. Williamson
- Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Microbiological Diagnostic Unit Public Health Laboratory, University of Melbourne at the Peter Doherty Institute of Infection and Immunity, Melbourne, Australia
- Microbiology, Royal Melbourne Hospital, Melbourne, Australia
68
PRIMEval: Optimization and screening of multiplex oligonucleotide assays. Sci Rep 2019; 9:19286. [PMID: 31848453 PMCID: PMC6917790 DOI: 10.1038/s41598-019-55883-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/28/2019] [Accepted: 12/03/2019] [Indexed: 11/08/2022]
Abstract
The development of multiplex polymerase chain reaction and microarray assays is challenging due to primer dimer formation, unspecific hybridization events, the generation of unspecific by-products, primer depletion, and thus lower amplification efficiencies. We have developed a software workflow with three underlying algorithms that differ in their use case and specificity, allowing the complete in silico evaluation of such assays on user-derived data sets. We experimentally evaluated the method for the prediction of oligonucleotide hybridization events including resulting products and probes, self-dimers, cross-dimers and hairpins at different experimental conditions. The developed method allows explaining the observed artefacts through in silico WGS data and thermodynamic predictions. PRIMEval is available publicly at https://primeval.ait.ac.at.
69
Ondov BD, Starrett GJ, Sappington A, Kostic A, Koren S, Buck CB, Phillippy AM. Mash Screen: high-throughput sequence containment estimation for genome discovery. Genome Biol 2019; 20:232. [PMID: 31690338 PMCID: PMC6833257 DOI: 10.1186/s13059-019-1841-x] [Citation(s) in RCA: 159] [Impact Index Per Article: 26.5] [Received: 02/27/2019] [Accepted: 09/27/2019] [Indexed: 11/17/2022]
Abstract
The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.
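The distinction this abstract draws between resemblance and containment can be illustrated with a small sketch. The code below is a simplified illustration only, not the Mash Screen implementation: the function names, the choice of BLAKE2b as the hash, the k-mer length, the sketch size, and the toy sequences are all assumptions made for the example, and reverse-complement canonicalization is omitted.

```python
import hashlib


def kmer_hashes(seq, k):
    """Hash every k-mer of seq to a 64-bit integer and return the set of hashes."""
    hashes = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode("ascii"), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def bottom_sketch(hashes, size):
    """Keep the `size` smallest hashes (a bottom-s MinHash sketch)."""
    return set(sorted(hashes)[:size])


def resemblance(sketch_a, sketch_b, size):
    """Estimate Jaccard resemblance from two bottom-s sketches."""
    merged = sorted(sketch_a | sketch_b)[:size]
    shared = sum(1 for h in merged if h in sketch_a and h in sketch_b)
    return shared / len(merged)


def containment(genome_sketch, mixture_hashes):
    """Fraction of the genome's sketch hashes that also occur in the mixture."""
    hits = sum(1 for h in genome_sketch if h in mixture_hashes)
    return hits / len(genome_sketch)


# Hypothetical toy data: a short "genome" embedded in a larger "metagenome".
K, SIZE = 5, 1000
genome = "ACGTTGCAAGGCTTACCGATAGGCTA"
mixture = "GATTACAGATTACA" + genome + "CCCCCCCCGGGGGGGG"

genome_hashes = kmer_hashes(genome, K)
mixture_hashes = kmer_hashes(mixture, K)
genome_sketch = bottom_sketch(genome_hashes, SIZE)
mixture_sketch = bottom_sketch(mixture_hashes, SIZE)

j = resemblance(genome_sketch, mixture_sketch, SIZE)
c = containment(genome_sketch, mixture_hashes)
print(f"resemblance={j:.2f} containment={c:.2f}")
```

Because the toy genome is a substring of the mixture, every genome k-mer is found and the containment is 1.0, while the resemblance is pulled below 1 by the mixture's additional k-mers. This is the situation the abstract describes: a genome fully present in a metagenome can still score a low resemblance.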
Affiliation(s)
- Brian D. Ondov
- Genome Informatics section, National Human Genome Research Institute, Bethesda, MD USA
- Department of Computer Science, University of Maryland College Park, College Park, MD USA
- Gabriel J. Starrett
- Tumor Virus Molecular Biology section, National Cancer Institute, Bethesda, MD USA
- Anna Sappington
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA USA
- Aleksandra Kostic
- Department of Computer Science, Princeton University, Princeton, NJ USA
- Sergey Koren
- Genome Informatics section, National Human Genome Research Institute, Bethesda, MD USA
- Christopher B. Buck
- Tumor Virus Molecular Biology section, National Cancer Institute, Bethesda, MD USA
- Adam M. Phillippy
- Genome Informatics section, National Human Genome Research Institute, Bethesda, MD USA
70
Rowe WPM. When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data. Genome Biol 2019; 20:199. [PMID: 31519212 PMCID: PMC6744645 DOI: 10.1186/s13059-019-1809-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 04/12/2019] [Accepted: 09/02/2019] [Indexed: 01/21/2023]
Abstract
Considerable advances in genomics over the past decade have resulted in vast amounts of data being generated and deposited in global archives. The growth of these archives exceeds our ability to process their content, leading to significant analysis bottlenecks. Sketching algorithms produce small, approximate summaries of data and have shown great utility in tackling this flood of genomic data, while using minimal compute resources. This article reviews the current state of the field, focusing on how the algorithms work and how genomicists can utilize them effectively. References to interactive workbooks for explaining concepts and demonstrating workflows are included at https://github.com/will-rowe/genome-sketching.
Affiliation(s)
- Will P M Rowe
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Birmingham, B15 2TT, UK.
- Scientific Computing Department, The Hartree Centre, STFC Daresbury Laboratory, Warrington, WA4 4AD, UK.
71
Beale MA, Marks M, Sahi SK, Tantalo LC, Nori AV, French P, Lukehart SA, Marra CM, Thomson NR. Genomic epidemiology of syphilis reveals independent emergence of macrolide resistance across multiple circulating lineages. Nat Commun 2019; 10:3255. [PMID: 31332179 PMCID: PMC6646400 DOI: 10.1038/s41467-019-11216-7] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 09/21/2018] [Accepted: 07/01/2019] [Indexed: 11/09/2022]
Abstract
Syphilis is a sexually transmitted infection caused by Treponema pallidum subspecies pallidum and may lead to severe complications. Recent years have seen striking increases in syphilis in many countries. Previous analyses have suggested one lineage of syphilis, SS14, may have expanded recently, indicating emergence of a single pandemic azithromycin-resistant cluster. Here we use direct sequencing of T. pallidum combined with phylogenomic analyses to show that both SS14- and Nichols-lineages are simultaneously circulating in clinically relevant populations in multiple countries. We correlate the appearance of genotypic macrolide resistance with multiple independently evolved SS14 sub-lineages and show that genotypically resistant and sensitive sub-lineages are spreading contemporaneously. These findings inform our understanding of the current syphilis epidemic by demonstrating how macrolide resistance evolves in Treponema subspecies and provide a warning on broader issues of antimicrobial resistance.
Affiliation(s)
- Mathew A Beale
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK.
- Michael Marks
- Clinical Research Department, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK
- Hospital for Tropical Diseases, London, UK
- Sharon K Sahi
- Department of Neurology, University of Washington, Seattle, WA, 98195, USA
- Lauren C Tantalo
- Department of Neurology, University of Washington, Seattle, WA, 98195, USA
- Patrick French
- The Mortimer Market Centre CNWL, Camden Provider Services, London, UK
- Sheila A Lukehart
- Departments of Medicine and Global Health, University of Washington, Seattle, WA, 98195, USA
- Christina M Marra
- Department of Neurology, University of Washington, Seattle, WA, 98195, USA
- Nicholas R Thomson
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK.
- Department of Pathogen Molecular Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK.
72
A Novel, Widespread qacA Allele Results in Reduced Chlorhexidine Susceptibility in Staphylococcus epidermidis. Antimicrob Agents Chemother 2019; 63:AAC.02607-18. [PMID: 30988144 DOI: 10.1128/aac.02607-18] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/14/2018] [Accepted: 03/13/2019] [Indexed: 12/12/2022]
Abstract
Chlorhexidine gluconate (CHG) is a topical antiseptic widely used in health care settings. In Staphylococcus spp., the pump QacA effluxes CHG, while the closely related QacB cannot due to a single amino acid substitution. We characterized 1,050 cutaneous Staphylococcus isolates obtained from 173 pediatric oncology patients enrolled in a multicenter CHG bathing trial. CHG susceptibility testing revealed that 63 (6%) of these isolates had elevated CHG MICs (≥4 μg/ml). Screening of all 1,050 isolates for the qacA/B gene (the same qac gene with A or B allele) by restriction fragment length polymorphism (RFLP) yielded 56 isolates with a novel qacA/B RFLP pattern, qacA/B273. The CHG MIC was significantly higher for qacA/B273-positive isolates (MIC50, 4 μg/ml; MIC range, 0.5 to 4 μg/ml) than for other qac groups: qacA-positive isolates (n = 559; MIC50, 1 μg/ml; MIC range, 0.5 to 4 μg/ml), qacB-positive isolates (n = 17; MIC50, 1 μg/ml; MIC range, 0.25 to 2 μg/ml), and qacA/B-negative isolates (n = 418; MIC50, 1 μg/ml; MIC range, 0.125 to 2 μg/ml) (P = 0.001). A high proportion of the qacA/B273-positive isolates also displayed methicillin resistance (96.4%) compared to the other qac groups (24.9 to 61.7%) (P = 0.001). Whole-genome sequencing revealed that qacA/B273-positive isolates encoded a variant of QacA with 2 amino acid substitutions. This new allele, named qacA4, was carried on the novel plasmid pAQZ1. The qacA4-carrying isolates belonged to the highly resistant Staphylococcus epidermidis sequence type 2 clone. By searching available sequence data sets, we identified 39 additional qacA4-carrying S. epidermidis strains from 5 countries. Curing an isolate of qacA4 resulted in a 4-fold decrease in the CHG MIC, confirming the role of qacA4 in the elevated CHG MIC. Our results highlight the importance of further studying qacA4 and its functional role in clinical staphylococci.
73
Branchu P, Charity OJ, Bawn M, Thilliez G, Dallman TJ, Petrovska L, Kingsley RA. SGI-4 in Monophasic Salmonella Typhimurium ST34 Is a Novel ICE That Enhances Resistance to Copper. Front Microbiol 2019; 10:1118. [PMID: 31178839 PMCID: PMC6543542 DOI: 10.3389/fmicb.2019.01118] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 02/06/2019] [Accepted: 05/03/2019] [Indexed: 12/23/2022]
Abstract
A multi drug resistant Salmonella enterica 4,[5],12:i- of sequence type 34 (monophasic S. Typhimurium ST34) is a current pandemic clone associated with livestock, particularly pigs, and numerous outbreaks in the human population. A large genomic island, termed SGI-4, is present in the monophasic Typhimurium ST34 clade and absent from other S. Typhimurium strains. SGI-4 consists of 87 open reading frames including sil and pco genes previously implicated in resistance to copper (Cu) and silver, and multiple genes predicted to be involved in mobilization and transfer by conjugation. SGI-4 was excised from the chromosome, circularized, and transferred to recipient strains of S. Typhimurium at a frequency influenced by stress induced by mitomycin C, and oxygen tension. The presence of SGI-4 was associated with increased resistance to Cu, particularly but not exclusively under anaerobic conditions. The presence of silCBA genes, predicted to encode an RND family efflux pump that transports Cu from the periplasm to the external milieu, was sufficient to impart the observed enhanced resistance to Cu, above that commonly associated with S. Typhimurium isolates. The presence of these genes resulted in the absence of Cu-dependent induction of pco genes encoding multiple proteins linked to Cu resistance, also present on SGI-4, suggesting that the system effectively limits the Cu availability in the periplasm, but did not affect SodCI-dependent macrophage survival.
Affiliation(s)
- Matt Bawn
- Quadram Institute Bioscience, Norwich, United Kingdom
- Timothy J. Dallman
- Gastrointestinal Bacteria Reference Unit, National Infection Service, Public Health England, London, United Kingdom
- Robert A. Kingsley
- Quadram Institute Bioscience, Norwich, United Kingdom
- School of Biological Sciences, University of East Anglia, Norwich, United Kingdom
74
Shapshak P, Balaji S, Kangueane P, Chiappelli F, Somboonwit C, Menezes LJ, Sinnott JT. Innovative Technologies for Advancement of WHO Risk Group 4 Pathogens Research. Global Virology III: Virology in the 21st Century 2019. [PMCID: PMC7122670 DOI: 10.1007/978-3-030-29022-1_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
Affiliation(s)
- Paul Shapshak
- Department of Internal Medicine, University of South Florida, Tampa, FL USA
- Seetharaman Balaji
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka India
- Francesco Chiappelli
- Oral Biology and Medicine, CHS 63-090, UCLA School of Dentistry, Los Angeles, CA USA
- Lynette J. Menezes
- Department of Internal Medicine, University of South Florida, Tampa, FL USA
- John T. Sinnott
- Department of Internal Medicine, University of South Florida, Tampa, FL USA