1
Munk P, Yang D, Röder T, Maier L, Petersen TN, Duarte ASR, Clausen PTLC, Brinch C, Van Gompel L, Luiken R, Wagenaar JA, Schmitt H, Heederik DJJ, Mevius DJ, Smit LAM, Bossers A, Aarestrup FM. The European livestock resistome. mSystems 2024; 9:e0132823. [PMID: 38501800 PMCID: PMC11019871 DOI: 10.1128/msystems.01328-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 02/21/2024] [Indexed: 03/20/2024]
Abstract
Metagenomic sequencing has proven to be a powerful tool for monitoring antimicrobial resistance (AMR). Here, we provide a comparative analysis of the resistomes of pigs, poultry, veal calves, turkeys, and rainbow trout from a total of 538 herds across nine European countries. We calculated the effects of per-farm management practices and antimicrobial usage (AMU) on the resistome in pigs, broilers, and veal calves. We also provide an in-depth study of the associations between bacterial diversity, resistome diversity, and AMR abundance, as well as a co-occurrence analysis of bacterial taxa and antimicrobial resistance genes (ARGs) and of the universality of the latter. The resistomes of veal calves and pigs clustered together, as did those of avian origin, whereas the rainbow trout resistome was distinct. Moreover, we identified clear core resistomes for each food-producing animal species. We identified positive associations between bacterial alpha diversity and both resistome alpha diversity and abundance. Network analyses revealed very few taxa-ARG associations in pigs but a large number in the avian species. Using updated reference databases and optimized bioinformatics, we validated previously reported significant associations between AMU, biosecurity, and AMR in pig and poultry farms. AMU is an important driver of AMR; however, our integrated analyses suggest that factors contributing to increased bacterial diversity might also be associated with a higher AMR load. We also found that dispersal limitations of ARGs shape livestock resistomes, and future efforts to fight AMR should continue to emphasize biosecurity measures. IMPORTANCE Understanding the occurrence, diversity, and drivers of antimicrobial resistance (AMR) is important for focusing future control efforts. So far, almost all attempts to limit AMR in livestock have addressed antimicrobial consumption.
Here, we performed an integrated analysis of the resistomes of five important farmed animal populations across Europe and found that the resistome and AMR levels are also shaped by factors related to bacterial diversity, as well as by dispersal limitations. Thus, future studies and interventions aimed at reducing AMR should address not only antimicrobial usage but also other epidemiological and ecological factors.
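The abstract relies on two resistome summaries: alpha diversity and a per-species core resistome. The sketch below illustrates both under stated assumptions. Alpha diversity is computed as the Shannon index H' = -Σ p_i ln p_i over ARG relative abundances, and the core resistome is taken as the genes detected in at least a hypothetical 95% of samples (`min_prevalence`). Neither the index nor the threshold is confirmed as the one used in the study, and the function names are illustrative.

```python
import math
from collections.abc import Mapping


def shannon_diversity(abundances: Mapping[str, float]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over genes with non-zero abundance."""
    total = sum(abundances.values())
    if total <= 0:
        return 0.0
    h = 0.0
    for value in abundances.values():
        if value > 0:
            p = value / total
            h -= p * math.log(p)
    return h


def core_resistome(samples: list[Mapping[str, float]], min_prevalence: float = 0.95) -> set[str]:
    """Genes detected (abundance > 0) in at least `min_prevalence` of the samples.

    The default of 0.95 is a hypothetical threshold chosen for illustration.
    """
    if not samples:
        return set()
    detected_in: dict[str, int] = {}
    for sample in samples:
        for gene, value in sample.items():
            if value > 0:
                detected_in[gene] = detected_in.get(gene, 0) + 1
    n = len(samples)
    return {gene for gene, hits in detected_in.items() if hits / n >= min_prevalence}
```

For example, with three hypothetical farm samples `[{"tet(W)": 120, "erm(B)": 30}, {"tet(W)": 80, "mph(A)": 5}, {"tet(W)": 95, "erm(B)": 0}]`, only `tet(W)` is detected in every sample, so it is the only gene in the core set at the default threshold.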
Affiliation(s)
- Patrick Munk
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
- Dongsheng Yang
- Institute for Risk Assessment Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Timo Röder
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
- Leonie Maier
- School of Biological Sciences, University of Edinburgh, Max Born Crescent, Edinburgh, United Kingdom
- Christian Brinch
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
- Liese Van Gompel
- Institute for Risk Assessment Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Roosmarijn Luiken
- Institute for Risk Assessment Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Jaap A. Wagenaar
- Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Heike Schmitt
- Institute for Risk Assessment Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Dick J. J. Heederik
- Institute for Risk Assessment Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Dik J. Mevius
- Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Wageningen Bioveterinary Research, Wageningen University & Research, Lelystad, The Netherlands
- Lidwien A. M. Smit
- Institute for Risk Assessment Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- EFFORT Consortium (Haitske Graveland, Bruno Gonzalez-Zorn, Gabriel Moyano, Pascal Sanders, Claire Chauvin, Antonio Battisti, Jeroen Dewulf, Katharina Wadepohl, Dariusz Wasyl, Magdalena Skarzyńska, Magdalena Zajac, Agnieszka Pękala-Safińska, Hristo Daskalov, Katharina D. C. Stärk)
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
- Institute for Risk Assessment Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- School of Biological Sciences, University of Edinburgh, Max Born Crescent, Edinburgh, United Kingdom
- Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Wageningen Bioveterinary Research, Wageningen University & Research, Lelystad, The Netherlands
- Alex Bossers
- Institute for Risk Assessment Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Wageningen Bioveterinary Research, Wageningen University & Research, Lelystad, The Netherlands
- Frank M. Aarestrup
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
2
Martiny HM, Pyrounakis N, Petersen TN, Lukjančenko O, Aarestrup FM, Clausen PTLC, Munk P. ARGprofiler-a pipeline for large-scale analysis of antimicrobial resistance genes and their flanking regions in metagenomic datasets. Bioinformatics 2024; 40:btae086. [PMID: 38377397 PMCID: PMC10918635 DOI: 10.1093/bioinformatics/btae086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 12/11/2023] [Accepted: 02/19/2024] [Indexed: 02/22/2024]
Abstract
MOTIVATION Analyzing metagenomic data can be highly valuable for understanding the function and distribution of antimicrobial resistance genes (ARGs). However, standardized and reproducible workflows are needed to ensure the comparability of studies, as the current options involve various tools and reference databases, each designed with a specific purpose in mind. RESULTS In this work, we created the workflow ARGprofiler to process large amounts of raw sequencing reads for studying the composition, distribution, and function of ARGs. ARGprofiler tackles the challenge of choosing a reference database by providing the PanRes database of 14 078 unique ARGs, which combines several existing collections into one. Our pipeline is designed not only to produce abundance tables of genes and microbes but also to reconstruct the flanking regions of ARGs with ARGextender. ARGextender is a bioinformatic approach that combines KMA and SPAdes to recruit reads for a targeted de novo assembly. While our focus is on ARGs, the pipeline also creates Mash sketches for fast searching and comparison of sequencing runs. AVAILABILITY AND IMPLEMENTATION The ARGprofiler pipeline is a Snakemake workflow that supports the reuse of metagenomic sequencing data; it is easy to install and is maintained at https://github.com/genomicepidemiology/ARGprofiler.
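ARGprofiler creates Mash sketches for fast comparison of sequencing runs. As an illustration of the MinHash idea behind Mash, and not the code of ARGprofiler or Mash, the sketch below builds bottom-k sketches of canonical k-mers, estimates the Jaccard index j from two sketches, and converts it into the Mash distance D = -(1/k) ln(2j / (1 + j)). The hash function (BLAKE2b in place of Mash's MurmurHash3), k = 21 and the sketch size of 1,000 are assumptions chosen for the example.

```python
import hashlib
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int):
    """Yield the lexicographically smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            rc = kmer.translate(_COMPLEMENT)[::-1]
            yield min(kmer, rc)


def _hash64(kmer: str) -> int:
    # Stand-in 64-bit hash; Mash itself uses MurmurHash3.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_k_sketch(seq: str, k: int = 21, sketch_size: int = 1000) -> list[int]:
    """Keep the `sketch_size` smallest distinct k-mer hashes (a bottom-k MinHash sketch)."""
    return sorted({_hash64(km) for km in canonical_kmers(seq, k)})[:sketch_size]


def jaccard_estimate(sketch_a: list[int], sketch_b: list[int], sketch_size: int = 1000) -> float:
    """Estimate Jaccard as the shared fraction of the bottom-k hashes of the union."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(set_a | set_b)[:sketch_size]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    return shared / len(union_bottom)


def mash_distance(jaccard: float, k: int = 21) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); set to 1.0 when j = 0."""
    if jaccard <= 0:
        return 1.0
    return -math.log(2 * jaccard / (1 + jaccard)) / k
```

Because only the smallest hashes are stored, two runs can be compared from their fixed-size sketches without revisiting the reads, which is what makes this approach suitable for searching large collections.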
Affiliation(s)
- Hannah-Marie Martiny
- Research Group for Genomic Epidemiology, Technical University of Denmark, Henrik Danms Allé, Bygning 204, Kongens Lyngby 2800, Denmark
- Nikiforos Pyrounakis
- Research Group for Genomic Epidemiology, Technical University of Denmark, Henrik Danms Allé, Bygning 204, Kongens Lyngby 2800, Denmark
- Thomas N Petersen
- Research Group for Genomic Epidemiology, Technical University of Denmark, Henrik Danms Allé, Bygning 204, Kongens Lyngby 2800, Denmark
- Oksana Lukjančenko
- Research Group for Genomic Epidemiology, Technical University of Denmark, Henrik Danms Allé, Bygning 204, Kongens Lyngby 2800, Denmark
- Frank M Aarestrup
- Research Group for Genomic Epidemiology, Technical University of Denmark, Henrik Danms Allé, Bygning 204, Kongens Lyngby 2800, Denmark
- Philip T L C Clausen
- Research Group for Genomic Epidemiology, Technical University of Denmark, Henrik Danms Allé, Bygning 204, Kongens Lyngby 2800, Denmark
- Patrick Munk
- Research Group for Genomic Epidemiology, Technical University of Denmark, Henrik Danms Allé, Bygning 204, Kongens Lyngby 2800, Denmark
3
Aytan-Aktug D, Grigorjev V, Szarvas J, Clausen PTLC, Munk P, Nguyen M, Davis JJ, Aarestrup FM, Lund O. SourceFinder: a Machine-Learning-Based Tool for Identification of Chromosomal, Plasmid, and Bacteriophage Sequences from Assemblies. Microbiol Spectr 2022; 10:e0264122. [PMID: 36377945 PMCID: PMC9769690 DOI: 10.1128/spectrum.02641-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 11/01/2022] [Indexed: 11/16/2022]
Abstract
High-throughput genome sequencing technologies enable the investigation of complex genetic interactions, including the horizontal gene transfer of plasmids and bacteriophages. However, identifying these elements from assembled reads remains challenging because of genome sequence plasticity and the difficulty of assembling complete sequences. In this study, we developed a random forest classifier to identify whether sequences originated from bacterial chromosomes, plasmids, or bacteriophages. The classifier was trained on a diverse collection of 23,211 chromosomal, plasmid, and bacteriophage sequences from hundreds of bacterial species. To adapt the classifier to incomplete sequences, each complete sequence was subsampled into 5,000-nucleotide fragments, which were further subdivided into k-mers. This three-class classifier succeeded in identifying chromosomes, plasmids, and bacteriophages from the k-mer distributions of complete and partial genome sequences, including simulated metagenomic scaffolds, with a minimum area under the receiver operating characteristic curve (AUC) of 0.939. This classifier, implemented as SourceFinder, has been made available as an online web service to help the community predict the chromosomal, plasmid, and bacteriophage sources of assembled bacterial sequence data (https://cge.food.dtu.dk/services/SourceFinder/). IMPORTANCE Extra-chromosomal genes encoding antimicrobial resistance, metal resistance, and virulence provide selective advantages for bacterial survival under stress conditions and pose serious threats to human and animal health. These accessory genes can affect the composition of microbiomes by providing selective advantages to their hosts. Accurately identifying extra-chromosomal elements in genome sequence data is critical for understanding gene dissemination trajectories and taking preventive measures.
Therefore, in this study, we developed a random forest classifier for identifying the source of bacterial chromosomal, plasmid, and bacteriophage sequences.
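SourceFinder adapts to incomplete sequences by cutting complete sequences into 5,000-nucleotide fragments and describing each fragment by its k-mer distribution. A minimal sketch of that featurization step follows, assuming non-overlapping tiling and tetranucleotide (k = 4) frequencies. The actual subsampling scheme and k used by SourceFinder are not stated here, and the random forest itself is omitted; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def fragment(seq: str, size: int = 5000) -> list[str]:
    """Cut a sequence into consecutive, non-overlapping windows of `size` nt.

    A trailing remainder shorter than `size` is dropped (an assumption).
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def kmer_frequencies(fragment_seq: str, k: int = 4) -> list[float]:
    """Relative frequencies of all 4**k k-mers over the ACGT alphabet, in a fixed order."""
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = fragment_seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[v] for v in vocabulary)  # k-mers containing N etc. are ignored
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[v] / total for v in vocabulary]


def featurize(seq: str, size: int = 5000, k: int = 4) -> list[list[float]]:
    """One k-mer frequency vector per fragment; these would be the classifier inputs."""
    return [kmer_frequencies(f, k) for f in fragment(seq, size)]
```

Fixing the vocabulary order means every fragment maps to a vector of the same length (256 for k = 4), which is what a tree-based classifier expects as input.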
Affiliation(s)
- Derya Aytan-Aktug
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Vladislav Grigorjev
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Judit Szarvas
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Patrick Munk
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Marcus Nguyen
- Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, Illinois, USA
- Northwestern Argonne Institute for Science and Engineering, Evanston, Illinois, USA
- James J. Davis
- Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, Illinois, USA
- Northwestern Argonne Institute for Science and Engineering, Evanston, Illinois, USA
- Frank M. Aarestrup
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Ole Lund
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
4
Clausen PTLC. Scaling neighbor joining to one million taxa with dynamic and heuristic neighbor joining. Bioinformatics 2022; 39:6858462. [PMID: 36453849 PMCID: PMC9805563 DOI: 10.1093/bioinformatics/btac774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 11/23/2022] [Accepted: 11/30/2022] [Indexed: 12/03/2022]
Abstract
MOTIVATION The neighbor-joining (NJ) algorithm is a widely used method for iterative clustering and forms the basis for phylogenetic reconstruction in several bioinformatic pipelines. Although NJ is considered a computationally efficient algorithm, it does not scale well to very large datasets (>100 000 taxa). Optimizations of the canonical NJ algorithm have been proposed; however, they rely on approximations or extensive memory usage, which is not feasible for large datasets. RESULTS In this article, two new algorithms, dynamic neighbor joining (DNJ) and heuristic neighbor joining (HNJ), are presented; they optimize the canonical NJ method to scale to millions of taxa without increasing the memory requirements. Both DNJ and HNJ outperform the current gold-standard methods for constructing NJ trees, and DNJ is guaranteed to produce exact NJ trees. AVAILABILITY AND IMPLEMENTATION https://bitbucket.org/genomicepidemiology/ccphylo.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
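For reference, the canonical NJ algorithm that DNJ and HNJ optimize can be sketched in a few lines. This is the textbook O(n³) procedure (Q-criterion pair selection, branch-length estimation, distance update), not the DNJ or HNJ algorithms from the article and not the ccphylo implementation; the function name and the Newick output format are illustrative choices.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Canonical neighbor joining on a symmetric distance matrix; returns a Newick string."""
    if len(names) != len(dist):
        raise ValueError("names and dist must have the same size")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node to all others
        # Pick the pair (i, j) minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to every remaining node k.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    if len(nodes) == 3:
        # Join the last three nodes at a single centre (unrooted tree).
        la = (d[0][1] + d[0][2] - d[1][2]) / 2
        lb = (d[0][1] + d[1][2] - d[0][2]) / 2
        lc = (d[0][2] + d[1][2] - d[0][1]) / 2
        return f"({nodes[0]}:{la:g},{nodes[1]}:{lb:g},{nodes[2]}:{lc:g});"
    if len(nodes) == 2:
        return f"({nodes[0]}:{d[0][1]:g},{nodes[1]}:0);"
    return f"{nodes[0]};" if nodes else ";"
```

For the five-taxon matrix `[[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]` with labels `a` to `e`, the function returns `(d:2,e:1,(c:4,(a:2,b:3):3):2);`. Each iteration rescans the full Q-matrix, which is the cubic cost that makes the canonical method impractical at the scale the article targets.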
5
Janes VA, Matamoros S, Munk P, Clausen PTLC, Koekkoek SM, Koster LAM, Jakobs ME, de Wever B, Visser CE, Aarestrup FM, Lund O, de Jong MD, Bossuyt PMM, Mende DR, Schultsz C. Metagenomic DNA sequencing for semi-quantitative pathogen detection from urine: a prospective, laboratory-based, proof-of-concept study. The Lancet Microbe 2022; 3:e588-e597. [DOI: 10.1016/s2666-5247(22)00088-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2021] [Revised: 03/11/2022] [Accepted: 03/31/2022] [Indexed: 10/18/2022]
6
Aytan-Aktug D, Clausen PTLC, Szarvas J, Munk P, Otani S, Nguyen M, Davis JJ, Lund O, Aarestrup FM. PlasmidHostFinder: Prediction of Plasmid Hosts Using Random Forest. mSystems 2022; 7:e0118021. [PMID: 35382558 PMCID: PMC9040769 DOI: 10.1128/msystems.01180-21] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/27/2021] [Accepted: 03/16/2022] [Indexed: 11/20/2022]
Abstract
Plasmids play a major role in facilitating the spread of antimicrobial resistance between bacteria. Understanding the host range and dissemination trajectories of plasmids is critical for the surveillance and prevention of antimicrobial resistance. Because of the diversity and genetic plasticity of plasmids, automated pattern-detection methods could improve the identification of plasmid host ranges compared with homology-based methods. In this study, we developed a method for predicting the host range of plasmids using machine learning, specifically random forests. We trained the models with 8,519 plasmids from 359 different bacterial species per taxonomic level; the models achieved Matthews correlation coefficients of 0.662 and 0.867 at the species and order levels, respectively. Our results suggest that despite the diverse nature and genetic plasticity of plasmids, our random forest model can accurately distinguish between plasmid hosts. This tool is available online through the Center for Genomic Epidemiology (https://cge.cbs.dtu.dk/services/PlasmidHostFinder/). IMPORTANCE Antimicrobial resistance is a global health threat to humans and animals, causing high mortality and morbidity while effectively ending decades of success in fighting bacterial infections. Plasmids confer extra genetic capabilities on their host organisms through accessory genes that can encode antimicrobial resistance and virulence. In addition to vertical inheritance, plasmids can be transferred horizontally between bacterial taxa. Therefore, determining the host range of plasmids is crucial for understanding and predicting the dissemination trajectories of extrachromosomal genes and bacterial evolution, as well as for taking effective countermeasures against antimicrobial resistance.
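PlasmidHostFinder reports performance as Matthews correlation coefficients (MCC). Because host prediction at the species or order level is a multiclass problem, the sketch below uses the multiclass generalization of MCC, which reduces to the familiar binary formula (TP·TN - FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)) when there are two classes. Whether the authors computed MCC exactly this way is an assumption, and the function name is illustrative.

```python
import math
from collections import Counter


def matthews_corrcoef(y_true: list, y_pred: list) -> float:
    """Multiclass Matthews correlation coefficient; 1 = perfect, 0 = chance, -1 = inverse."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                        # number of samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)   # correctly predicted
    t_counts = Counter(y_true)                             # true occurrences per class
    p_counts = Counter(y_pred)                             # predictions per class
    classes = set(t_counts) | set(p_counts)
    cov_true_pred = c * s - sum(p_counts[k] * t_counts[k] for k in classes)
    cov_pred_pred = s * s - sum(p_counts[k] ** 2 for k in classes)
    cov_true_true = s * s - sum(t_counts[k] ** 2 for k in classes)
    denom = math.sqrt(cov_pred_pred * cov_true_true)
    return 0.0 if denom == 0 else cov_true_pred / denom
```

Unlike plain accuracy, MCC stays low when a model simply predicts the most common host, which matters when some host species are far more common than others in the training plasmids.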
Affiliation(s)
- Derya Aytan-Aktug
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Judit Szarvas
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Patrick Munk
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Saria Otani
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Marcus Nguyen
- Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, Illinois, USA
- James J. Davis
- Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, Illinois, USA
- Northwestern Argonne Institute for Science and Engineering, Evanston, Illinois, USA
- Ole Lund
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Frank M. Aarestrup
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
7
Meyer F, Fritz A, Deng ZL, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh HJ, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat Methods 2022; 19:429-440. [PMID: 35396482 PMCID: PMC9007738 DOI: 10.1038/s41592-022-01431-4] [Citation(s) in RCA: 89] [Impact Index Per Article: 44.5] [Received: 07/22/2021] [Accepted: 02/14/2022] [Indexed: 12/20/2022]
Abstract
Evaluating metagenomic software is key to optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results from 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and for genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including some that were also top performers on other metrics. The results identify challenges and guide researchers in selecting methods for analyses. This study presents the results of the second round of the Critical Assessment of Metagenome Interpretation challenges (CAMI II), a community-driven effort to comprehensively benchmark tools for metagenomics data analysis.
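One way to compare a taxonomic profiler against a gold standard is the L1 distance between true and predicted relative-abundance profiles at a given rank, together with presence/absence precision and recall. The sketch below computes these quantities for one rank. It is a simplified illustration, not the CAMI II evaluation code, and the dictionary-based profile format and function names are assumptions.

```python
def normalize(profile: dict[str, float]) -> dict[str, float]:
    """Rescale abundances so they sum to 1 (all zeros if the profile is empty)."""
    total = sum(profile.values())
    if total <= 0:
        return {taxon: 0.0 for taxon in profile}
    return {taxon: value / total for taxon, value in profile.items()}


def l1_error(truth: dict[str, float], predicted: dict[str, float]) -> float:
    """Sum of absolute relative-abundance differences: 0 if identical, 2 if no taxa are shared."""
    t, p = normalize(truth), normalize(predicted)
    return sum(abs(t.get(x, 0.0) - p.get(x, 0.0)) for x in set(t) | set(p))


def presence_precision_recall(
    truth: dict[str, float], predicted: dict[str, float], threshold: float = 0.0
) -> tuple[float, float]:
    """Precision and recall of taxon detection, counting taxa with abundance > threshold."""
    true_taxa = {x for x, v in truth.items() if v > threshold}
    pred_taxa = {x for x, v in predicted.items() if v > threshold}
    tp = len(true_taxa & pred_taxa)
    precision = tp / len(pred_taxa) if pred_taxa else 0.0
    recall = tp / len(true_taxa) if true_taxa else 0.0
    return precision, recall
```

The two views are complementary: a profiler can detect the right taxa (high precision and recall) while still misestimating their proportions (high L1 error), or vice versa.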
Affiliation(s)
- Fernando Meyer
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Adrian Fritz
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany; German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
- Zhi-Luo Deng
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany; Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
- Till Robin Lesker
- German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany; Helmholtz Centre for Infection Research, Braunschweig, Germany
- Gary Robertson
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Mohammed Alser
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Dmitry Antipov
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Jan Buchmann
- Institute for Biological Data Science, Heinrich-Heine-University, Düsseldorf, Germany
- Aydin Buluç
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA; University of California, Berkeley, Berkeley, CA, USA
- Bo Chen
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA; University of California, Berkeley, Berkeley, CA, USA
- Philip T L C Clausen
- National Food Institute, Division of Global Surveillance, Technical University of Denmark, Lyngby, Denmark
- Alexandru Cristian
- Drexel University, Philadelphia, PA, USA; Google Inc., Philadelphia, PA, USA
- Piotr Wojciech Dabrowski
- Robert Koch-Institut, Berlin, Germany; Hochschule für Technik und Wirtschaft Berlin, Berlin, Germany
- Rob Egan
- DOE Joint Genome Institute, Berkeley, CA, USA; Lawrence Berkeley National Laboratories, Berkeley, CA, USA
- Eleazar Eskin
- University of California, Los Angeles, Los Angeles, CA, USA
- Eugene Goltsman
- DOE Joint Genome Institute, Berkeley, CA, USA; Lawrence Berkeley National Laboratories, Berkeley, CA, USA
- Melissa A Gray
- Drexel University, Philadelphia, PA, USA; Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA
- Lars Hestbjerg Hansen
- University of Copenhagen, Department of Plant and Environmental Science, Frederiksberg, Denmark
- Steven Hofmeyr
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA; University of California, Berkeley, Berkeley, CA, USA
- Pingqin Huang
- School of Computer Science, Fudan University, Shanghai, China
- Luiz Irber
- University of California, Davis, Davis, CA, USA
- Huijue Jia
- BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
- Tue Sparholt Jørgensen
- Technical University of Denmark, Novo Nordisk Foundation Center for Biosustainability, Lyngby, Denmark; Aarhus University, Department of Environmental Science, Roskilde, Denmark
- Silas D Kieser
- Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland; Swiss Institute of Bioinformatics, Geneva, Switzerland
- Axel Kola
- Charité-Universitätsmedizin Berlin, Berlin, Germany
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
- Anton Korobeynikov
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia; Department of Statistical Modelling, Saint Petersburg State University, Saint Petersburg, Russia
- Jason Kwan
- University of Wisconsin-Madison, Madison, WI, USA
- Chenhao Li
- Genome Institute of Singapore, Singapore, Singapore
- Fabio Malcher-Miranda
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Vanessa R Marcelino
- Sydney Medical School, The University of Sydney, Sydney, Australia; Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Clayton, Australia
- Pierre Marijon
- Department of Computer Science, Inria, University of Lille, CNRS, Lille, France
- Dmitry Meleshko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Daniel R Mende
- Amsterdam University Medical Center, Amsterdam, the Netherlands
- Alessio Milanese
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland; Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Niranjan Nagarajan
- Genome Institute of Singapore, A*STAR, Singapore, Singapore; National University of Singapore, Singapore, Singapore
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Leonid Oliker
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA; University of California, Berkeley, Berkeley, CA, USA
- Lucas Paoli
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Vitor C Piro
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Evan R Rees
- University of Wisconsin-Madison, Madison, WI, USA
- Knut Reinert
- Institute for Bioinformatics, FU Berlin, Berlin, Germany
- Bernhard Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany; Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany
- Gail L Rosen
- Drexel University, Philadelphia, PA, USA; Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA; Center for Biological Discovery from Big Data, Philadelphia, PA, USA
- Hans-Joachim Ruscheweyh
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Varuni Sarwal
- University of California, Los Angeles, Los Angeles, CA, USA
- Nicola Segata
- Department CIBIO, University of Trento, Trento, Italy
- Enrico Seiler
- Institute for Bioinformatics, FU Berlin, Berlin, Germany
- Lizhen Shi
- Florida Polytechnic University, Lakeland, FL, USA
- Fengzhu Sun
- Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA, USA
- Shinichi Sunagawa
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Ashleigh Thomas
- DOE Joint Genome Institute, Berkeley, CA, USA; University of British Columbia, Vancouver, British Columbia, Canada
- Mirko Trajkovski
- Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland; Diabetes Center, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Julien Tremblay
- Energy, Mining and Environment, National Research Council Canada, Montreal, Quebec, Canada
- Zhengyang Wang
- School of Computer Science, Fudan University, Shanghai, China
- Ziye Wang
- School of Mathematical Sciences, Fudan University, Shanghai, China
- Zhong Wang
- Department of Energy Joint Genome Institute, Berkeley, CA, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; School of Natural Sciences, University of California at Merced, Merced, CA, USA
- Katherine Yelick
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA; University of California, Berkeley, Berkeley, CA, USA
- Ronghui You
- School of Computer Science, Fudan University, Shanghai, China
- Georg Zeller
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
- Jie Zhu
- BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
- Susanne Häußler
- Helmholtz Centre for Infection Research, Braunschweig, Germany
- Ariane Khaledi
- Helmholtz Centre for Infection Research, Braunschweig, Germany
- Fantin Mesny
- Max Planck Institute for Plant Breeding Research, Köln, Germany
- Nathiana Smit
- Helmholtz Centre for Infection Research, Braunschweig, Germany
- Till Strowig
- Helmholtz Centre for Infection Research, Braunschweig, Germany
- Andreas Bremges
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
- Alexander Sczyrba
- Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Alice Carolyn McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany; German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany; Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
8
Allesøe RL, Lemvigh CK, Phan MVT, Clausen PTLC, Florensa AF, Koopmans MPG, Lund O, Cotten M. Automated download and clean-up of family-specific databases for kmer-based virus identification. Bioinformatics 2021; 37:705-710. [PMID: 33031509 PMCID: PMC8097684 DOI: 10.1093/bioinformatics/btaa857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/06/2020] [Revised: 09/09/2020] [Accepted: 09/23/2020] [Indexed: 12/17/2022]
Abstract
SUMMARY Here, we present an automated pipeline for Download Of NCBI Entries (DONE) and continuous updating of a local sequence database based on user-specified queries. The database can be created with either protein or nucleotide sequences, containing either all entries or complete genomes only. The pipeline can automatically clean the database by removing entries that match a database of user-specified sequence contaminants. The default contamination entries include sequences from the UniVec database of plasmids, marker genes and sequencing adapters from NCBI, an E. coli genome, rRNA sequences, vectors and satellite sequences. Furthermore, duplicates are removed, and the database is automatically screened for sequences from green fluorescent protein, luciferase and antibiotic resistance genes that might be present in some GenBank viral entries and could lead to false positives in virus identification. For use of the database, we also present an approach for dealing with possible human contamination. We show the applicability of DONE by downloading a virus database comprising 37 virus families. We observed an average increase of 16 776 new entries downloaded per month for the 37 families. In addition, we demonstrate the utility of a custom database compared to a standard reference database for classifying both simulated and real sequence data. AVAILABILITY AND IMPLEMENTATION The DONE pipeline for downloading and cleaning is deposited in a publicly available repository (https://bitbucket.org/genomicepidemiology/done/src/master/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
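The clean-up steps described above (duplicate removal and screening against contaminant sequences) can be illustrated with a short Python sketch. It is not the DONE implementation: the abstract does not state how matches are scored, so the exact-k-mer overlap test, the default k = 16 and the max_shared_fraction threshold are hypothetical choices made for illustration only.

```python
from typing import Dict, Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def clean_database(entries: Dict[str, str],
                   contaminants: Iterable[str],
                   k: int = 16,
                   max_shared_fraction: float = 0.0) -> Dict[str, str]:
    """Drop exact duplicate sequences and entries sharing too many k-mers with contaminants.

    k and max_shared_fraction are hypothetical parameters, not DONE settings.
    """
    # Pool every k-mer found in the contaminant set (e.g. vectors, adapters).
    contaminant_kmers: Set[str] = set()
    for contaminant in contaminants:
        contaminant_kmers |= kmers(contaminant, k)

    seen: Set[str] = set()
    cleaned: Dict[str, str] = {}
    for name, seq in entries.items():
        key = seq.upper()
        if key in seen:
            continue  # exact duplicate of a sequence already processed
        seen.add(key)
        entry_kmers = kmers(key, k)
        if entry_kmers:
            shared = len(entry_kmers & contaminant_kmers) / len(entry_kmers)
            if shared > max_shared_fraction:
                continue  # too similar to a contaminant sequence
        cleaned[name] = seq
    return cleaned
```

With the default threshold of 0.0, a single shared k-mer is enough to discard an entry; a real pipeline would likely tolerate some overlap, which is why the threshold is exposed as a parameter.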
Affiliation(s)
- Rosa L Allesøe
- National Food Institute, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.,Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Camilla K Lemvigh
- National Food Institute, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.,Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- My V T Phan
- Department of Viroscience, Erasmus University Medical Centre, 3000 CA Rotterdam, The Netherlands
- Philip T L C Clausen
- National Food Institute, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Alfred F Florensa
- National Food Institute, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Marion P G Koopmans
- Department of Viroscience, Erasmus University Medical Centre, 3000 CA Rotterdam, The Netherlands
- Ole Lund
- National Food Institute, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Matthew Cotten
- Department of Viroscience, Erasmus University Medical Centre, 3000 CA Rotterdam, The Netherlands.,MRC/UVRI and LSHTM Uganda Research Unit, Entebbe, Uganda.,MRC-University of Glasgow Centre for Virus Research, G61 1QH Scotland, UK
9
Hallgren MB, Overballe-Petersen S, Lund O, Hasman H, Clausen PTLC. MINTyper: an outbreak-detection method for accurate and rapid SNP typing of clonal clusters with noisy long reads. Biol Methods Protoc 2021; 6:bpab008. [PMID: 33981853 PMCID: PMC8106442 DOI: 10.1093/biomethods/bpab008] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/10/2021] [Revised: 04/09/2021] [Accepted: 04/16/2021] [Indexed: 12/27/2022]
Abstract
For detection of clonal outbreaks in clinical settings, we present a complete pipeline that generates a single-nucleotide polymorphism (SNP) distance matrix from a set of sequencing reads. Importantly, the program can take as input short reads from Illumina sequencing platforms and long reads from Oxford Nanopore Technologies’ (ONT) platforms, either separately or as a mix. MINTyper performs automated reference identification, alignment, alignment trimming, optional methylation masking, and pairwise distance calculations. With this approach, we could rapidly and accurately cluster a set of sequenced isolates with a known epidemiological relationship, which served to confirm the clustering. Functions were built to allow both high-accuracy methylation-aware base-called MinION reads (hac_m Q10) and faster-generated, lower-quality reads (fast Q8) to be used, including in combination with Illumina data. With fast Q8 reads, a higher number of base pairs were excluded from the calculated distance matrix than with high-accuracy methylation-aware Q10 base-calling of ONT data. Nonetheless, when different qualities of ONT data were used with corresponding input parameters, the clustering of isolates was nearly identical.
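A minimal Python sketch of the final step, pairwise SNP distance calculation, is shown below. It assumes the reads have already been aligned to a common reference and collapsed to one consensus string per isolate, and it uses a hypothetical masking rule (a position is dropped for all isolates if any isolate has an 'N' or a gap there), which mirrors the idea that lower-quality reads lead to more excluded bases. It is not MINTyper's actual code.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Assumption: 'N' marks a low-confidence or masked base call, '-' an alignment gap.
MASKED = {"N", "-"}


def snp_distance_matrix(aligned: Dict[str, str]) -> Tuple[List[str], List[List[int]], int]:
    """Return (isolate names, symmetric SNP distance matrix, number of excluded positions)."""
    names = list(aligned)
    seqs = [aligned[name].upper() for name in names]
    if not seqs:
        return [], [], 0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same reference length")

    # Keep only positions that are confidently called in every isolate.
    informative = [i for i in range(length) if all(s[i] not in MASKED for s in seqs)]

    n = len(names)
    matrix = [[0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        distance = sum(seqs[a][i] != seqs[b][i] for i in informative)
        matrix[a][b] = matrix[b][a] = distance
    return names, matrix, length - len(informative)
```

Because masking only removes positions from the comparison, stricter masking raises the excluded-position count while each pairwise distance can only stay the same or decrease.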
Affiliation(s)
- Malte B Hallgren
- National Food Institute, Technical University of Denmark, Kemitorvet 204, 2800 Kgs. Lyngby, Denmark
- Ole Lund
- National Food Institute, Technical University of Denmark, Kemitorvet 204, 2800 Kgs. Lyngby, Denmark
- Henrik Hasman
- Department of Bacteria, Parasites and Fungi, Statens Serum Institut, 2300 Copenhagen, Denmark
- Philip T L C Clausen
- National Food Institute, Technical University of Denmark, Kemitorvet 204, 2800 Kgs. Lyngby, Denmark
10
Jorquera R, González C, Clausen PTLC, Petersen B, Holmes DS. SinEx DB 2.0 update 2020: database for eukaryotic single-exon coding sequences. Database (Oxford) 2021; 2021:6122466. [PMID: 33507271 PMCID: PMC7904048 DOI: 10.1093/database/baab002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2020] [Revised: 12/01/2020] [Accepted: 01/05/2021] [Indexed: 11/27/2022]
Abstract
Single-exon coding sequences (CDSs), also known as ‘single-exon genes’ (SEGs), are defined as nuclear, protein-coding genes that lack introns in their CDSs. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancers and neurological/developmental disorders, and many exhibit tissue-specific transcription. We developed SinEx DB, which houses DNA and protein sequence information of SEGs from 10 mammalian genomes, including human. SinEx DB includes their functional predictions (KOG, euKaryotic Orthologous Groups) and the relative distribution of these functions within species. Here, we report SinEx 2.0, a major update of SinEx DB that includes information on the occurrence, distribution and functional prediction of SEGs from 60 completely sequenced eukaryotic genomes, representing animals, fungi, protists and plants. The information is stored in a relational database built with MySQL Server 5.7, and the complete dataset of SEG sequences and their GO (Gene Ontology) functional assignments is available for download. SinEx DB 2.0 was built with a novel pipeline that helps disambiguate single-exon isoforms from SEGs. SinEx DB 2.0 is the largest available database for SEGs and provides a rich source of information for advancing our understanding of the evolution and function of SEGs and their associations with disorders, including cancers and neurological and developmental diseases. Database URL: http://v2.sinex.cl/
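The distinction between a single-exon isoform and an SEG can be sketched in Python as below. This assumes, as a hypothetical rule rather than the published SinEx pipeline, that a gene is called an SEG only when every annotated isoform has exactly one CDS exon; the input record layout (gene ID, transcript ID, number of CDS exons) is likewise illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def single_exon_genes(cds_exons: Iterable[Tuple[str, str, int]]) -> Set[str]:
    """Return gene IDs whose annotated isoforms all have a single CDS exon.

    cds_exons holds hypothetical (gene_id, transcript_id, number_of_CDS_exons) records.
    """
    exon_counts: Dict[str, List[int]] = defaultdict(list)
    for gene_id, _transcript_id, n_exons in cds_exons:
        exon_counts[gene_id].append(n_exons)
    # A single-exon isoform of an otherwise multi-exon gene does not make that gene an SEG.
    return {gene for gene, counts in exon_counts.items() if all(c == 1 for c in counts)}
```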
Affiliation(s)
- R Jorquera
- Center for Bioinformatics and Genome Biology, Fundacion Ciencia & Vida, Zañartu 1482, Ñuñoa Santiago 7780132, Chile
- Laboratorio Medicina Traslacional, Fundación Arturo López Pérez, José Manuel Infante 805, Providencia, Santiago 7500691, Chile
- C González
- Center for Bioinformatics and Genome Biology, Fundacion Ciencia & Vida, Zañartu 1482, Ñuñoa Santiago 7780132, Chile
- Centro de Genómica y Bioinformática, Universidad Mayor, Camino la pirámide 5750, Huechuraba, Santiago 8580745, Chile
- P T L C Clausen
- Department of Global Surveillance, Technical University of Denmark, Kemitorvet building 204, 2800 Kgs. Lyngby, Denmark
- B Petersen
- Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Hovedstaden, Øster Voldgade 5–7, Copenhagen 1350, Denmark
- Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), AIMST University, Batu 3 1/2, Jalan Bukit Air Nasi, 08100 Bedong, Kedah, Malaysia
- D S Holmes
- *Corresponding author: Tel: +56 2 22398969
11
Hasman H, Clausen PTLC, Kaya H, Hansen F, Knudsen JD, Wang M, Holzknecht BJ, Samulioniené J, Røder BL, Frimodt-Møller N, Lund O, Hammerum AM. LRE-Finder, a Web tool for detection of the 23S rRNA mutations and the optrA, cfr, cfr(B) and poxtA genes encoding linezolid resistance in enterococci from whole-genome sequences. J Antimicrob Chemother 2019; 74:1473-1476. [PMID: 30863844 DOI: 10.1093/jac/dkz092] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 11/23/2018] [Revised: 01/16/2019] [Accepted: 02/11/2019] [Indexed: 11/14/2022]
Abstract
OBJECTIVES In enterococci, resistance to linezolid is often mediated by mutations in the V domain of the 23S rRNA gene (G2576T or G2505A). Furthermore, four genes [optrA, cfr, cfr(B) and poxtA] encode linezolid resistance in enterococci. We aimed to develop a Web tool for detection of the two mutations and the four genes encoding linezolid resistance in enterococci from whole-genome sequence data. METHODS LRE-Finder (where LRE stands for linezolid-resistant enterococci) detected the fraction of Ts in position 2576 and the fraction of As in position 2505 of the 23S rRNA and the cfr, cfr(B), optrA and poxtA genes by aligning raw sequencing reads (fastq format) with k-mer alignment. For evaluation, fastq files from 21 LRE isolates were submitted to LRE-Finder. As negative controls, fastq files from 1473 non-LRE isolates were submitted to LRE-Finder. The MICs of linezolid were determined for the 21 LRE isolates. As LRE-negative controls, 26 VRE isolates were additionally selected for linezolid MIC determination. RESULTS LRE-Finder was validated and showed 100% concordance with phenotypic susceptibility testing. A cut-off of 10% mutations in position 2576 and/or position 2505 was set in LRE-Finder for predicting a linezolid resistance phenotype. This cut-off allows for detection of a single mutated 23S allele in both Enterococcus faecalis and Enterococcus faecium, while ignoring low-level sequencing noise. CONCLUSIONS A Web tool for detection of the 23S rRNA mutations (G2576T and G2505A) and the optrA, cfr, cfr(B) and poxtA genes from whole-genome sequences from enterococci is now available online.
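The 23S rRNA part of the decision rule described above can be sketched in Python as follows. The input format (a mapping from 23S position to the bases observed in aligned reads), the function names, and the use of '>=' at the 10% cut-off are assumptions for illustration; detection of optrA, cfr, cfr(B) and poxtA is not covered, and this is not LRE-Finder's code.

```python
from collections import Counter
from typing import Dict, Iterable

CUTOFF = 0.10  # 10% mutant fraction, as stated in the abstract
# Mutations named in the abstract: 23S position -> (reference base, resistance base)
MUTATIONS = {2576: ("G", "T"), 2505: ("G", "A")}
NUCLEOTIDES = {"A", "C", "G", "T"}


def mutant_fraction(bases: Iterable[str], alt: str) -> float:
    """Fraction of unambiguous base calls at one position that equal alt."""
    counts = Counter(b.upper() for b in bases if b.upper() in NUCLEOTIDES)
    total = sum(counts.values())
    return counts[alt] / total if total else 0.0


def predict_linezolid_resistance(pileup: Dict[int, Iterable[str]],
                                 cutoff: float = CUTOFF) -> bool:
    """True if any listed 23S mutation reaches the cut-off (hypothetical helper)."""
    return any(
        mutant_fraction(pileup.get(pos, []), alt) >= cutoff
        for pos, (_ref, alt) in MUTATIONS.items()
    )
```

For example, if a genome carried six 23S rRNA copies and one of them had G2576T, roughly one in six reads covering position 2576 would show a T (about 17%), which clears the 10% cut-off, whereas scattered sequencing errors at a few percent would not.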
Affiliation(s)
- Henrik Hasman
- Department of Bacteria, Parasites and Fungi, Statens Serum Institut, Copenhagen, Denmark
- Philip T L C Clausen
- Department of Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Hülya Kaya
- Department of Bacteria, Parasites and Fungi, Statens Serum Institut, Copenhagen, Denmark
- Frank Hansen
- Department of Bacteria, Parasites and Fungi, Statens Serum Institut, Copenhagen, Denmark
- Jenny Dahl Knudsen
- Department of Clinical Microbiology, Rigshospitalet, Copenhagen, Denmark
- Mikala Wang
- Department of Clinical Microbiology, Aarhus University Hospital, Aarhus, Denmark
- Jurgita Samulioniené
- Department of Clinical Microbiology, Aalborg University Hospital, Aalborg, Denmark
- Bent L Røder
- Department of Clinical Microbiology, Slagelse Hospital, Slagelse, Denmark
- Ole Lund
- Department of Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Anette M Hammerum
- Department of Bacteria, Parasites and Fungi, Statens Serum Institut, Copenhagen, Denmark
12
Marcelino VR, Clausen PTLC, Buchmann JP, Wille M, Iredell JR, Meyer W, Lund O, Sorrell TC, Holmes EC. CCMetagen: comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data. Genome Biol 2020; 21:103. [PMID: 32345331 PMCID: PMC7189439 DOI: 10.1186/s13059-020-02014-2] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 05/31/2019] [Accepted: 04/13/2020] [Indexed: 01/19/2023]
Abstract
There is an increasing demand for accurate and fast metagenome classifiers that can identify not only bacteria but all members of a microbial community. We used a recently developed concept in read mapping to develop a highly accurate metagenomic classification pipeline named CCMetagen. The pipeline substantially outperforms other commonly used software in identifying bacteria and fungi and can efficiently use the entire NCBI nucleotide collection as a reference to detect species with incomplete genome data from all biological kingdoms. CCMetagen is user-friendly, and the results can be easily integrated into microbial community analysis software for streamlined and automated microbiome studies.
Affiliation(s)
- Vanessa R Marcelino
- Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia.
- Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia.
- School of Life & Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia.
- Philip T L C Clausen
- National Food Institute, Technical University of Denmark, 2800, Kgs Lyngby, Denmark
- Jan P Buchmann
- School of Life & Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Michelle Wille
- WHO Collaborating Centre for Reference and Research on Influenza, The Peter Doherty Institute for Infection and Immunity, Melbourne, VIC, 3000, Australia
- Jonathan R Iredell
- Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia
- Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia
- Westmead Hospital (Research and Education Network), Westmead, NSW, 2145, Australia
- Wieland Meyer
- Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia
- Westmead Hospital (Research and Education Network), Westmead, NSW, 2145, Australia
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia
- Ole Lund
- National Food Institute, Technical University of Denmark, 2800, Kgs Lyngby, Denmark
- Tania C Sorrell
- Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia
- Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia
- Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Life & Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
13
Johnsen CH, Clausen PTLC, Aarestrup FM, Lund O. Improved Resistance Prediction in Mycobacterium tuberculosis by Better Handling of Insertions and Deletions, Premature Stop Codons, and Filtering of Non-informative Sites. Front Microbiol 2019; 10:2464. [PMID: 31736907 PMCID: PMC6834686 DOI: 10.3389/fmicb.2019.02464] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/04/2019] [Accepted: 10/15/2019] [Indexed: 11/21/2022]
Abstract
Resistance in Mycobacterium tuberculosis is a major obstacle to effective treatment of tuberculosis. Multiple studies have shown promising results for predicting drug resistance in M. tuberculosis based on whole genome sequencing (WGS) data; however, these tools are often limited to this single species. We have previously developed a common platform for resistance prediction in multiple species. This platform detects acquired resistance genes (ResFinder) and species-specific chromosomal mutations (PointFinder) associated with resistance, all based on WGS data. In this study, we present a new version of PointFinder together with an updated M. tuberculosis database. PointFinder now includes predictions based on insertions and deletions, and it explicitly reports frameshift mutations and premature stop codons. We found that premature stop codons in four resistance-associated genes (katG, ethA, pncA, and gidB) were over-represented in resistant strains, and we saw increased prediction performance when premature stop codons in these genes were included as resistance markers. Different M. tuberculosis resistance prediction tools vary in performance mostly due to the mutation library used. We found that a well-established mutation library included non-predictive lineage markers, and we eliminated these from the mutation library through forward feature selection. Compared with other similar web-based tools, PointFinder performs equally well. The advantage of PointFinder is that, together with ResFinder, it serves as a common web-based and downloadable platform for resistance detection in multiple species. It is easy to use for clinicians and already widely used in the research community.
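The two sequence-level checks mentioned above, premature stop codons and frameshifts, can be sketched in Python as below. The functions are hypothetical helpers, not part of PointFinder; they assume an in-frame coding sequence that starts at the start codon and the standard stop codons TAA, TAG and TGA.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop_codon(cds: str) -> Optional[int]:
    """Return the 1-based codon number of the first in-frame stop before the final codon, or None."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    # The final codon is the expected stop, so it is not checked.
    for i in range(n_codons - 1):
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i + 1
    return None


def is_frameshift(indel_length: int) -> bool:
    """An insertion or deletion shifts the reading frame unless its length is a multiple of 3."""
    return abs(indel_length) % 3 != 0
```

A premature stop found this way in a gene such as katG or pncA would truncate the encoded protein, which is the kind of marker the study added to the resistance predictions.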
Affiliation(s)
- Camilla Hundahl Johnsen
- Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Philip T L C Clausen
- Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Frank M Aarestrup
- Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Ole Lund
- Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
14
Clausen PTLC, Aarestrup FM, Lund O. Rapid and precise alignment of raw reads against redundant databases with KMA. BMC Bioinformatics 2018; 19:307. [PMID: 30157759 PMCID: PMC6116485 DOI: 10.1186/s12859-018-2336-6] [Citation(s) in RCA: 344] [Impact Index Per Article: 57.3] [Received: 03/07/2018] [Accepted: 08/23/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND As the cost of sequencing has declined, clinical diagnostics based on next-generation sequencing (NGS) have become a reality. Diagnostics based on sequencing will require rapid and precise mapping against redundant databases because some of the most important determinants, such as antimicrobial resistance genes and core genome multilocus sequence typing (MLST) alleles, are highly similar to one another. To facilitate this, a novel mapping method, KMA (k-mer alignment), was designed. KMA is able to map raw reads directly against redundant databases, and it scales well for large redundant databases. KMA uses k-mer seeding to speed up mapping and the Needleman-Wunsch algorithm to accurately align extensions from k-mer seeds. Multi-mapping reads are resolved using a novel sorting scheme (ConClave scheme), ensuring an accurate selection of templates. RESULTS The functionality of KMA was compared with SRST2, MGmapper, BWA-MEM, Bowtie2, Minimap2 and Salmon, using both simulated data and a dataset of Escherichia coli mapped against resistance genes and core genome MLST alleles. KMA outperforms current methods with respect to both accuracy and speed, while using a comparable amount of memory. CONCLUSION With KMA, it was possible to map raw reads directly against redundant databases with high accuracy, speed and memory efficiency.
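The seed-and-extend idea described above can be sketched in Python as follows: exact k-mer matches propose where a read lies on a template, and a Needleman-Wunsch global alignment scores the read against that window. This is a simplified illustration, not KMA's implementation; the scoring values (match 1, mismatch -2, linear gap -3) and k = 5 are illustrative assumptions, and the ConClave scheme for resolving multi-mapping reads is not modelled.

```python
from typing import Dict, List, Optional, Tuple


def kmer_seeds(read: str, template: str, k: int) -> List[Tuple[int, int]]:
    """Return (read_offset, template_offset) pairs for exact shared k-mers."""
    index: Dict[str, List[int]] = {}
    for j in range(len(template) - k + 1):
        index.setdefault(template[j:j + k], []).append(j)
    seeds: List[Tuple[int, int]] = []
    for i in range(len(read) - k + 1):
        for j in index.get(read[i:i + k], []):
            seeds.append((i, j))
    return seeds


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -2, gap: int = -3) -> int:
    """Global alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def best_seeded_alignment(read: str, template: str, k: int = 5) -> Optional[Tuple[int, int]]:
    """Return (best score, template start) over windows implied by seeds, or None if no seed."""
    # Each seed implies a diagonal; collapse seeds on the same diagonal to one start.
    starts = sorted({max(0, j - i) for i, j in kmer_seeds(read, template, k)})
    best: Optional[Tuple[int, int]] = None
    for start in starts:
        window = template[start:start + len(read)]
        s = needleman_wunsch(read, window)
        if best is None or s > best[0]:
            best = (s, start)
    return best
```

Because the quadratic Needleman-Wunsch step only runs on windows proposed by seeds, templates that share no k-mer with the read are never aligned, which is where the speed-up of seeding comes from.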
Affiliation(s)
- Philip T L C Clausen
- Department of Bioinformatics, Technical University of Denmark, 2800, Kgs Lyngby, Denmark.,Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, 2800, Kgs Lyngby, Denmark
- Frank M Aarestrup
- Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, 2800, Kgs Lyngby, Denmark
- Ole Lund
- Department of Bioinformatics, Technical University of Denmark, 2800, Kgs Lyngby, Denmark
15
Clausen PTLC, Zankari E, Aarestrup FM, Lund O. Benchmarking of methods for identification of antimicrobial resistance genes in bacterial whole genome data. J Antimicrob Chemother 2016; 71:2484-8. [DOI: 10.1093/jac/dkw184] [Citation(s) in RCA: 110] [Impact Index Per Article: 13.8] [Received: 02/26/2016] [Accepted: 04/21/2016] [Indexed: 11/12/2022]