51
Lupolova N, Lycett SJ, Gally DL. A guide to machine learning for bacterial host attribution using genome sequence data. Microb Genom 2020; 5. [PMID: 31778355 PMCID: PMC6939162 DOI: 10.1099/mgen.0.000317] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/18/2022]
Abstract
With the ever-expanding number of available sequences from bacterial genomes, and the expectation that this data type will be the primary one generated from both diagnostic and research laboratories for the foreseeable future, then there is both an opportunity and a need to evaluate how effectively computational approaches can be used within bacterial genomics to predict and understand complex phenotypes, such as pathogenic potential and host source. This article applied various quantitative methods such as diversity indexes, pangenome-wide association studies (GWAS) and dimensionality reduction techniques to better understand the data and then compared how well unsupervised and supervised machine learning (ML) methods could predict the source host of the isolates. The study uses the example of the pangenomes of 1203 Salmonella enterica serovar Typhimurium isolates in order to predict 'host of isolation' using these different methods. The article is aimed as a review of recent applications of ML in infection biology, but also, by working through this specific dataset, it allows discussion of the advantages and drawbacks of the different techniques. As with all such sub-population studies, the biological relevance will be dependent on the quality and diversity of the input data. Given this major caveat, we show that supervised ML has the potential to add real value to interpretation of bacterial genomic data, as it can provide probabilistic outcomes for important phenotypes, something that is very difficult to achieve with the other methods.
Affiliation(s)
- Nadejda Lupolova
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- Samantha J Lycett
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- David L Gally
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
52
Khaledi A, Weimann A, Schniederjans M, Asgari E, Kuo T, Oliver A, Cabot G, Kola A, Gastmeier P, Hogardt M, Jonas D, Mofrad MRK, Bremges A, McHardy AC, Häussler S. Predicting antimicrobial resistance in Pseudomonas aeruginosa with machine learning-enabled molecular diagnostics. EMBO Mol Med 2020; 12:e10264. [PMID: 32048461 PMCID: PMC7059009 DOI: 10.15252/emmm.201910264] [Citation(s) in RCA: 112] [Impact Index Per Article: 22.4] [Received: 01/02/2019] [Revised: 12/24/2019] [Accepted: 01/09/2020] [Indexed: 12/20/2022]
Abstract
Limited therapy options due to antibiotic resistance underscore the need for optimization of current diagnostics. In some bacterial species, antimicrobial resistance can be unambiguously predicted based on their genome sequence. In this study, we sequenced the genomes and transcriptomes of 414 drug-resistant clinical Pseudomonas aeruginosa isolates. By training machine learning classifiers on information about the presence or absence of genes, their sequence variation, and expression profiles, we generated predictive models and identified biomarkers of resistance to four commonly administered antimicrobial drugs. Using these data types alone or in combination resulted in high (0.8-0.9) or very high (> 0.9) sensitivity and predictive values. For all drugs except for ciprofloxacin, gene expression information improved diagnostic performance. Our results pave the way for the development of a molecular resistance profiling tool that reliably predicts antimicrobial susceptibility based on genomic and transcriptomic markers. The implementation of a molecular susceptibility test system in routine microbiology diagnostics holds promise to provide earlier and more detailed information on antibiotic resistance profiles of bacterial pathogens and thus could change how physicians treat bacterial infections.
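The abstract above reports classifier quality as sensitivity and predictive values. As a reminder of how those figures follow from a confusion matrix, here is a minimal Python sketch. The function name `diagnostic_metrics` and the toy labels are illustrative assumptions and are not taken from the study; 1 stands for "resistant" and 0 for "susceptible".

```python
def diagnostic_metrics(truth, predicted):
    """Compute sensitivity, specificity, PPV and NPV for binary labels.

    truth and predicted are equal-length sequences of 0/1 values,
    where 1 means resistant. A metric is None when its denominator is 0.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # resistant, called resistant
    fn = sum(1 for t, p in pairs if t and not p)      # resistant, called susceptible
    tn = sum(1 for t, p in pairs if not t and not p)  # susceptible, called susceptible
    fp = sum(1 for t, p in pairs if not t and p)      # susceptible, called resistant

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of resistant isolates detected
        "specificity": ratio(tn, tn + fp),  # fraction of susceptible isolates cleared
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Toy labels (hypothetical): 4 resistant and 6 susceptible isolates.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(truth, predicted)
print(metrics["sensitivity"], metrics["ppv"])  # 0.75 0.75
```

On this toy example both sensitivity and positive predictive value come out at 0.75, which would fall below the "high (0.8-0.9)" band quoted in the abstract.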
Affiliation(s)
- Ariane Khaledi
  - Department of Molecular Bacteriology, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Molecular Bacteriology Group, TWINCORE‐Centre for Experimental and Clinical Infection Research, Hannover, Germany
- Aaron Weimann
  - Molecular Bacteriology Group, TWINCORE‐Centre for Experimental and Clinical Infection Research, Hannover, Germany
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Braunschweig, Germany
- Monika Schniederjans
  - Department of Molecular Bacteriology, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Molecular Bacteriology Group, TWINCORE‐Centre for Experimental and Clinical Infection Research, Hannover, Germany
- Ehsaneddin Asgari
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA
- Tzu‐Hao Kuo
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Antonio Oliver
  - Servicio de Microbiología y Unidad de Investigación, Hospital Universitario Son Espases, Instituto de Investigación Sanitaria Illes Balears (IdISPa), Palma de Mallorca, Spain
- Gabriel Cabot
  - Servicio de Microbiología y Unidad de Investigación, Hospital Universitario Son Espases, Instituto de Investigación Sanitaria Illes Balears (IdISPa), Palma de Mallorca, Spain
- Axel Kola
  - Institute of Hygiene and Environmental Medicine, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Petra Gastmeier
  - Institute of Hygiene and Environmental Medicine, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Michael Hogardt
  - Institute of Medical Microbiology and Infection Control, University Hospital Frankfurt, Frankfurt/Main, Germany
- Daniel Jonas
  - Faculty of Medicine, Institute for Infection Prevention and Hospital Epidemiology, Medical Center‐University of Freiburg, Freiburg, Germany
- Mohammad RK Mofrad
  - Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Lab, Berkeley, CA, USA
- Andreas Bremges
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Braunschweig, Germany
- Alice C McHardy
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Braunschweig, Germany
- Susanne Häussler
  - Department of Molecular Bacteriology, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Molecular Bacteriology Group, TWINCORE‐Centre for Experimental and Clinical Infection Research, Hannover, Germany
53
Van Puyvelde S, Pickard D, Vandelannoote K, Heinz E, Barbé B, de Block T, Clare S, Coomber EL, Harcourt K, Sridhar S, Lees EA, Wheeler NE, Klemm EJ, Kuijpers L, Mbuyi Kalonji L, Phoba MF, Falay D, Ngbonda D, Lunguya O, Jacobs J, Dougan G, Deborggraeve S. An African Salmonella Typhimurium ST313 sublineage with extensive drug-resistance and signatures of host adaptation. Nat Commun 2019; 10:4280. [PMID: 31537784 PMCID: PMC6753159 DOI: 10.1038/s41467-019-11844-z] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 01/16/2019] [Accepted: 08/07/2019] [Indexed: 12/22/2022]
Abstract
Bloodstream infections by Salmonella enterica serovar Typhimurium constitute a major health burden in sub-Saharan Africa (SSA). These invasive non-typhoidal (iNTS) infections are dominated by isolates of the antibiotic resistance-associated sequence type (ST) 313. Here, we report emergence of ST313 sublineage II.1 in the Democratic Republic of the Congo. Sublineage II.1 exhibits extensive drug resistance, involving a combination of multidrug resistance, extended spectrum β-lactamase production and azithromycin resistance. ST313 lineage II.1 isolates harbour an IncHI2 plasmid we name pSTm-ST313-II.1, with one isolate also exhibiting decreased ciprofloxacin susceptibility. Whole genome sequencing reveals that ST313 II.1 isolates have accumulated genetic signatures potentially associated with altered pathogenicity and host adaptation, related to changes observed in biofilm formation and metabolic capacity. Sublineage II.1 emerged at the beginning of the 21st century and is involved in on-going outbreaks. Our data provide evidence of further evolution within the ST313 clade associated with iNTS in SSA.
Affiliation(s)
- Sandra Van Puyvelde
  - Department of Biomedical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Laboratory of Medical Microbiology, Vaccine & Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
- Derek Pickard
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, CB2 0SP, UK
- Koen Vandelannoote
  - Department of Biomedical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
- Eva Heinz
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
- Barbara Barbé
  - Department of Clinical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
- Tessa de Block
  - Department of Biomedical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
- Simon Clare
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Eve L Coomber
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Katherine Harcourt
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Sushmita Sridhar
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, CB2 0SP, UK
- Emily A Lees
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, CB2 0SP, UK
- Nicole E Wheeler
  - Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Elizabeth J Klemm
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Laura Kuijpers
  - Department of Clinical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
  - Department of Microbiology and Immunology, KU Leuven, Herestraat 49-box 1030, 3000, Leuven, Belgium
- Lisette Mbuyi Kalonji
  - Department of Microbiology, National Institute for Biomedical Research, Av. De La Démocratie no, 5345, Kinshasa, Democratic Republic of the Congo
  - Department of Microbiology, University Hospital of Kinshasa, Kinshasa, Democratic Republic of the Congo
- Marie-France Phoba
  - Department of Microbiology, National Institute for Biomedical Research, Av. De La Démocratie no, 5345, Kinshasa, Democratic Republic of the Congo
  - Department of Microbiology, University Hospital of Kinshasa, Kinshasa, Democratic Republic of the Congo
- Dadi Falay
  - Department of Pediatrics, University Hospital of Kisangani, Avenue Munyororo C/Makiso, Kisangani, BP 2012, Democratic Republic of the Congo
- Dauly Ngbonda
  - Department of Pediatrics, University Hospital of Kisangani, Avenue Munyororo C/Makiso, Kisangani, BP 2012, Democratic Republic of the Congo
- Octavie Lunguya
  - Department of Microbiology, National Institute for Biomedical Research, Av. De La Démocratie no, 5345, Kinshasa, Democratic Republic of the Congo
  - Department of Microbiology, University Hospital of Kinshasa, Kinshasa, Democratic Republic of the Congo
- Jan Jacobs
  - Department of Clinical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
  - Department of Microbiology and Immunology, KU Leuven, Herestraat 49-box 1030, 3000, Leuven, Belgium
- Gordon Dougan
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, CB2 0SP, UK
- Stijn Deborggraeve
  - Department of Biomedical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
54
Vilne B, Meistere I, Grantiņa-Ieviņa L, Ķibilds J. Machine Learning Approaches for Epidemiological Investigations of Food-Borne Disease Outbreaks. Front Microbiol 2019; 10:1722. [PMID: 31447800 PMCID: PMC6691741 DOI: 10.3389/fmicb.2019.01722] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 03/07/2019] [Accepted: 07/12/2019] [Indexed: 12/14/2022]
Abstract
Foodborne diseases (FBDs) are infections of the gastrointestinal tract caused by foodborne pathogens (FBPs) such as bacteria [Salmonella, Listeria monocytogenes and Shiga toxin-producing E. coli (STEC)] and several viruses, but also parasites and some fungi. Artificial intelligence (AI) and its sub-discipline machine learning (ML) are re-emerging and gaining an ever increasing popularity in the scientific community and industry, and could lead to actionable knowledge in diverse ranges of sectors including epidemiological investigations of FBD outbreaks and antimicrobial resistance (AMR). As genotyping using whole-genome sequencing (WGS) is becoming more accessible and affordable, it is increasingly used as a routine tool for the detection of pathogens, and has the potential to differentiate between outbreak strains that are closely related, identify virulence/resistance genes and provide improved understanding of transmission events within hours to days. In most cases, the computational pipeline of WGS data analysis can be divided into four (though, not necessarily consecutive) major steps: de novo genome assembly, genome characterization, comparative genomics, and inference of phylogeny or phylogenomics. In each step, ML could be used to increase the speed and potentially the accuracy (provided increasing amounts of high-quality input data) of identification of the source of ongoing outbreaks, leading to more efficient treatment and prevention of additional cases. In this review, we explore whether ML or any other form of AI algorithms have already been proposed for the respective tasks and compare those with mechanistic model-based approaches.
Affiliation(s)
- Baiba Vilne
  - Institute of Food Safety, Animal Health and Environment "BIOR", Riga, Latvia
  - SIA net-OMICS, Riga, Latvia
- Irēna Meistere
  - Institute of Food Safety, Animal Health and Environment "BIOR", Riga, Latvia
- Juris Ķibilds
  - Institute of Food Safety, Animal Health and Environment "BIOR", Riga, Latvia
55
Computational Health Engineering Applied to Model Infectious Diseases and Antimicrobial Resistance Spread. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9122486] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]
Abstract
Infectious diseases are the primary cause of mortality worldwide. The dangers of infectious disease are compounded with antimicrobial resistance, which remains the greatest concern for human health. Although novel approaches are under investigation, the World Health Organization predicts that by 2050, septicaemia caused by antimicrobial resistant bacteria could result in 10 million deaths per year. One of the main challenges in medical microbiology is to develop novel experimental approaches, which enable a better understanding of bacterial infections and antimicrobial resistance. After the introduction of whole genome sequencing, there was a great improvement in bacterial detection and identification, which also enabled the characterization of virulence factors and antimicrobial resistance genes. Today, the use of in silico experiments jointly with computational and machine learning offer an in depth understanding of systems biology, allowing us to use this knowledge for the prevention, prediction, and control of infectious disease. Herein, the aim of this review is to discuss the latest advances in human health engineering and their applicability in the control of infectious diseases. An in-depth knowledge of host–pathogen–protein interactions, combined with a better understanding of a host’s immune response and bacterial fitness, are key determinants for halting infectious diseases and antimicrobial resistance dissemination.
56
Wheeler NE, Blackmore T, Reynolds AD, Midwinter AC, Marshall J, French NP, Savoian MS, Gardner PP, Biggs PJ. Genomic correlates of extraintestinal infection are linked with changes in cell morphology in Campylobacter jejuni. Microb Genom 2019; 5:e000251. [PMID: 30777818 PMCID: PMC6421344 DOI: 10.1099/mgen.0.000251] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/25/2018] [Accepted: 12/16/2018] [Indexed: 12/12/2022]
Abstract
Campylobacter jejuni is the most common cause of bacterial diarrheal disease in the world. Clinical outcomes of infection can range from asymptomatic infection to life-threatening extraintestinal infections. This variability in outcomes for infected patients has raised questions as to whether genetic differences between C. jejuni isolates contribute to their likelihood of causing severe disease. In this study, we compare the genomes of ten C. jejuni isolates that were implicated in extraintestinal infections with reference gastrointestinal isolates, in order to identify unusual patterns of sequence variation associated with infection outcome. We identified a collection of genes that display a higher burden of uncommon mutations in invasive isolates compared with gastrointestinal close relatives, including some that have been previously linked to virulence and invasiveness in C. jejuni. Among the top genes identified were mreB and pgp1, which are both involved in determining cell shape. Electron microscopy confirmed morphological differences in isolates carrying unusual sequence variants of these genes, indicating a possible relationship between extraintestinal infection and changes in cell morphology.
Affiliation(s)
- Nicole E. Wheeler
  - Center for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Hinxton, UK
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
- Angela D. Reynolds
  - EpiLab, School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Anne C. Midwinter
  - EpiLab, School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Jonathan Marshall
  - EpiLab, School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Nigel P. French
  - EpiLab, School of Veterinary Science, Massey University, Palmerston North, New Zealand
  - New Zealand Food Safety Science and Research Centre, Palmerston North, New Zealand
- Matthew S. Savoian
  - Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand
- Paul P. Gardner
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
  - Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Patrick J. Biggs
  - EpiLab, School of Veterinary Science, Massey University, Palmerston North, New Zealand
  - New Zealand Genomics Ltd (NZGL – as Massey Genome Service) Massey University, Palmerston North, New Zealand
  - Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
57
Canals R, Hammarlöf DL, Kröger C, Owen SV, Fong WY, Lacharme-Lora L, Zhu X, Wenner N, Carden SE, Honeycutt J, Monack DM, Kingsley RA, Brownridge P, Chaudhuri RR, Rowe WPM, Predeus AV, Hokamp K, Gordon MA, Hinton JCD. Adding function to the genome of African Salmonella Typhimurium ST313 strain D23580. PLoS Biol 2019; 17:e3000059. [PMID: 30645593 PMCID: PMC6333337 DOI: 10.1371/journal.pbio.3000059] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Indexed: 11/25/2022]
Abstract
Salmonella Typhimurium sequence type (ST) 313 causes invasive nontyphoidal Salmonella (iNTS) disease in sub-Saharan Africa, targeting susceptible HIV+, malarial, or malnourished individuals. An in-depth genomic comparison between the ST313 isolate D23580 and the well-characterized ST19 isolate 4/74 that causes gastroenteritis across the globe revealed extensive synteny. To understand how the 856 nucleotide variations generated phenotypic differences, we devised a large-scale experimental approach that involved the global gene expression analysis of strains D23580 and 4/74 grown in 16 infection-relevant growth conditions. Comparison of transcriptional patterns identified virulence and metabolic genes that were differentially expressed between D23580 versus 4/74, many of which were validated by proteomics. We also uncovered the S. Typhimurium D23580 and 4/74 genes that showed expression differences during infection of murine macrophages. Our comparative transcriptomic data are presented in a new enhanced version of the Salmonella expression compendium, SalComD23580: http://bioinf.gen.tcd.ie/cgi-bin/salcom_v2.pl. We discovered that the ablation of melibiose utilization was caused by three independent SNP mutations in D23580 that are shared across ST313 lineage 2, suggesting that the ability to catabolize this carbon source has been negatively selected during ST313 evolution. The data revealed a novel, to our knowledge, plasmid maintenance system involving a plasmid-encoded CysS cysteinyl-tRNA synthetase, highlighting the power of large-scale comparative multicondition analyses to pinpoint key phenotypic differences between bacterial pathovariants.
Affiliation(s)
- Rocío Canals
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Disa L. Hammarlöf
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Carsten Kröger
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Siân V. Owen
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Wai Yee Fong
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Lizeth Lacharme-Lora
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Xiaojun Zhu
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Nicolas Wenner
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Sarah E. Carden
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, United States of America
- Jared Honeycutt
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, United States of America
- Denise M. Monack
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, United States of America
- Robert A. Kingsley
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, United Kingdom
- Philip Brownridge
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Roy R. Chaudhuri
  - Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield, United Kingdom
- Will P. M. Rowe
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Alexander V. Predeus
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Karsten Hokamp
  - Department of Genetics, School of Genetics and Microbiology, Smurfit Institute of Genetics, Trinity College Dublin, Ireland
- Melita A. Gordon
  - Institute of Infection and Global Health, University of Liverpool, Liverpool, United Kingdom
  - Malawi-Liverpool-Wellcome Trust Clinical Research Programme, University of Malawi College of Medicine, Malawi, Central Africa
- Jay C. D. Hinton
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
58
Moradigaravand D, Palm M, Farewell A, Mustonen V, Warringer J, Parts L. Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data. PLoS Comput Biol 2018; 14:e1006258. [PMID: 30550564 PMCID: PMC6310291 DOI: 10.1371/journal.pcbi.1006258] [Citation(s) in RCA: 105] [Impact Index Per Article: 15.0] [Received: 05/31/2018] [Revised: 12/28/2018] [Accepted: 11/18/2018] [Indexed: 12/17/2022]
Abstract
The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time consuming, genetic approaches have recently been developed for this task. The detection of antibiotic resistance is typically made by measuring a few known determinants previously identified from genome sequencing, and thus requires the prior knowledge of its biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models with an average accuracy of 0.91 on held-out data (range 0.81-0.97). While the best models most frequently employed gene content, an average accuracy score of 0.79 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction only for two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves way to integrating machine learning approaches into diagnostic tools in the clinic.
Affiliation(s)
- Danesh Moradigaravand
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
  - Center for Computational Biology, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
- Martin Palm
  - Department for Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research at the University of Gothenburg, Gothenburg, Sweden
- Anne Farewell
  - Department for Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research at the University of Gothenburg, Gothenburg, Sweden
- Ville Mustonen
  - Organismal and Evolutionary Biology Research Programme, Department of Computer Science, Institute of Biotechnology, University of Helsinki, Helsinki, Finland
  - Helsinki Institute for Information Technology HIIT, Helsinki, Finland
- Jonas Warringer
  - Department for Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research at the University of Gothenburg, Gothenburg, Sweden
- Leopold Parts
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
  - Department of Computer Science, University of Tartu, Tartu, Estonia
59
Aun E, Brauer A, Kisand V, Tenson T, Remm M. A k-mer-based method for the identification of phenotype-associated genomic biomarkers and predicting phenotypes of sequenced bacteria. PLoS Comput Biol 2018; 14:e1006434. [PMID: 30346947 PMCID: PMC6211763 DOI: 10.1371/journal.pcbi.1006434] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 04/09/2018] [Revised: 11/01/2018] [Accepted: 08/15/2018] [Indexed: 11/18/2022]
Abstract
We have developed an easy-to-use and memory-efficient method called PhenotypeSeeker that (a) identifies phenotype-specific k-mers, (b) generates a k-mer-based statistical model for predicting a given phenotype and (c) predicts the phenotype from the sequencing data of a given bacterial isolate. The method was validated on 167 Klebsiella pneumoniae isolates (virulence), 200 Pseudomonas aeruginosa isolates (ciprofloxacin resistance) and 459 Clostridium difficile isolates (azithromycin resistance). The phenotype prediction models trained from these datasets obtained the F1-measure of 0.88 on the K. pneumoniae test set, 0.88 on the P. aeruginosa test set and 0.97 on the C. difficile test set. The F1-measures were the same for assembled sequences and raw sequencing data; however, building the model from assembled genomes is significantly faster. On these datasets, the model building on a mid-range Linux server takes approximately 3 to 5 hours per phenotype if assembled genomes are used and 10 hours per phenotype if raw sequencing data are used. The phenotype prediction from assembled genomes takes less than one second per isolate. Thus, PhenotypeSeeker should be well-suited for predicting phenotypes from large sequencing datasets. PhenotypeSeeker is implemented in Python programming language, is open-source software and is available at GitHub (https://github.com/bioinfo-ut/PhenotypeSeeker/).
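To make the idea of phenotype-specific k-mers concrete, the following Python sketch extracts k-mers from a few toy sequences and keeps those found in every phenotype-positive genome and in no phenotype-negative genome. This is a deliberately strict presence/absence simplification written for illustration. It is not PhenotypeSeeker's code and does not reproduce its statistical testing or model fitting; the function names and the toy sequences are assumptions.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def phenotype_specific_kmers(genomes, labels, k):
    """Return k-mers present in all positive genomes and absent from all negatives.

    genomes: list of DNA strings; labels: list of booleans (True = has phenotype).
    """
    positive = [kmers(g, k) for g, has in zip(genomes, labels) if has]
    negative = [kmers(g, k) for g, has in zip(genomes, labels) if not has]
    shared_by_positives = set.intersection(*positive) if positive else set()
    seen_in_negatives = set().union(*negative) if negative else set()
    return shared_by_positives - seen_in_negatives


def predict(seq, markers, k):
    """Call the phenotype positive if seq contains at least one marker k-mer."""
    return bool(kmers(seq, k) & markers)


# Toy data (hypothetical): two "resistant" and two "susceptible" sequences.
genomes = ["ACGTTGCA", "TTACGTTG", "ACGAAGCA", "TTACGAAG"]
labels = [True, True, False, False]
markers = phenotype_specific_kmers(genomes, labels, k=4)
print(sorted(markers))                    # ['ACGT', 'CGTT', 'GTTG']
print(predict("GGACGTTGGG", markers, 4))  # True
print(predict("GGACGAAGGG", markers, 4))  # False
```

Real datasets rarely contain k-mers that separate the classes this cleanly, which is why tools in this space rely on association statistics and a fitted model rather than an exact set difference.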
Affiliation(s)
- Erki Aun
  - Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Age Brauer
  - Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Veljo Kisand
  - Institute of Technology, University of Tartu, Tartu, Estonia
- Tanel Tenson
  - Institute of Technology, University of Tartu, Tartu, Estonia
- Maido Remm
  - Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia