1
Orcales F, Moctezuma Tan L, Johnson-Hagler M, Suntay JM, Ali J, Recto K, Glenn P, Pennings P. Using genomic data and machine learning to predict antibiotic resistance: A tutorial paper. PLoS Comput Biol 2024; 20:e1012579. [PMID: 39775233 PMCID: PMC11684616 DOI: 10.1371/journal.pcbi.1012579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2025] Open
Abstract
Antibiotic resistance is a global public health concern. Bacteria have evolved resistance to most antibiotics, which means that for any given bacterial infection, the bacteria may be resistant to one or several antibiotics. It has been suggested that genomic sequencing and machine learning (ML) could make resistance testing more accurate and cost-effective. Given that ML is likely to become an ever more important tool in medicine, we believe that it is important for pre-health students and others in the life sciences to learn to use ML tools. This paper provides a step-by-step tutorial to train 4 different ML models (logistic regression, random forests, extreme gradient-boosted trees, and neural networks) to predict drug resistance for Escherichia coli isolates and to evaluate their performance using different metrics and cross-validation techniques. We also guide the user in how to load and prepare the data used for the ML models. The tutorial is accessible to beginners and does not require any software to be installed as it is based on Google Colab notebooks and provides a basic understanding of the different ML models. The tutorial can be used in undergraduate and graduate classes for students in Biology, Public Health, Computer Science, or related fields.
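The cross-validation step mentioned in this abstract can be illustrated with a minimal, dependency-free sketch of k-fold splitting. This is not the tutorial's own code (the tutorial works in Google Colab notebooks, whose contents are not reproduced here); the function names `k_fold_indices` and `train_test_splits`, the default of 5 folds, and the fixed seed are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=5, seed=0):
    """Yield (train, test) index lists, using each fold once as the test set."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        # Training indices are all folds except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each isolate appears in exactly one test fold, so every model is scored on data it was not trained on. In practice a library routine would normally be used instead of a hand-written splitter.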
Affiliation(s)
- Faye Orcales
  - Department of Biology, San Francisco State University, San Francisco, California, United States of America
  - University of California San Francisco, San Francisco, California, United States of America
- Lucy Moctezuma Tan
  - Department of Biology, San Francisco State University, San Francisco, California, United States of America
  - Department of Statistics, California State University East Bay, Hayward, California, United States of America
- Meris Johnson-Hagler
  - Department of Biology, San Francisco State University, San Francisco, California, United States of America
- John Matthew Suntay
  - Department of Biology, San Francisco State University, San Francisco, California, United States of America
  - University of California San Francisco, San Francisco, California, United States of America
- Jameel Ali
  - Department of Biology, San Francisco State University, San Francisco, California, United States of America
- Kristiene Recto
  - Department of Biology, San Francisco State University, San Francisco, California, United States of America
- Phelan Glenn
  - Department of Biology, San Francisco State University, San Francisco, California, United States of America
  - David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Pleuni Pennings
  - Department of Biology, San Francisco State University, San Francisco, California, United States of America
2
Do VH, Nguyen VS, Nguyen SH, Le DQ, Nguyen TT, Nguyen CH, Ho TH, Vo NS, Nguyen T, Nguyen HA, Cao MD. PanKA: Leveraging population pangenome to predict antibiotic resistance. iScience 2024; 27:110623. [PMID: 39228791 PMCID: PMC11369404 DOI: 10.1016/j.isci.2024.110623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 04/14/2024] [Accepted: 07/29/2024] [Indexed: 09/05/2024] Open
Abstract
Machine learning has the potential to be a powerful tool in the fight against antimicrobial resistance (AMR), a critical global health issue. Machine learning can identify resistance mechanisms from DNA sequence data without prior knowledge. The first step in building a machine learning model is feature extraction from sequencing data. Traditional methods like single nucleotide polymorphism (SNP) calling and k-mer counting yield numerous, often redundant features, complicating prediction and analysis. In this paper, we propose PanKA, a method using the pangenome to extract a concise set of relevant features for predicting AMR. PanKA not only enables fast model training and prediction but also improves accuracy. Applied to the Escherichia coli and Klebsiella pneumoniae bacterial species, our model is more accurate than conventional and state-of-the-art methods in predicting AMR.
Affiliation(s)
- Van Hoan Do
  - Center for Applied Mathematics and Informatics, Le Quy Don Technical University, Hanoi, Vietnam
- Van Sang Nguyen
  - Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
- Duc Quang Le
  - Faculty of IT, Hanoi University of Civil Engineering, Hanoi, Vietnam
- Tam Thi Nguyen
  - Oxford University Clinical Research Unit, Hanoi, Vietnam
- Canh Hao Nguyen
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Tho Huu Ho
  - Department of Medical Microbiology, The 103 Military Hospital, Vietnam Military Medical University, Hanoi, Vietnam
  - Department of Genomics & Cytogenetics, Institute of Biomedicine & Pharmacy, Vietnam Military Medical University, Hanoi, Vietnam
- Nam S. Vo
  - Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
3
Dillon L, Dimonaco NJ, Creevey CJ. Accessory genes define species-specific routes to antibiotic resistance. Life Sci Alliance 2024; 7:e202302420. [PMID: 38228374 PMCID: PMC10791901 DOI: 10.26508/lsa.202302420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 12/29/2023] [Accepted: 01/03/2024] [Indexed: 01/18/2024] Open
Abstract
A deeper understanding of the relationship between the antimicrobial resistance (AMR) gene carriage and phenotype is necessary to develop effective response strategies against this global burden. AMR phenotype is often a result of multi-gene interactions; therefore, we need approaches that go beyond current simple AMR gene identification tools. Machine-learning (ML) methods may meet this challenge and allow the development of rapid computational approaches for AMR phenotype classification. To examine this, we applied multiple ML techniques to 16,950 bacterial genomes across 28 genera, with corresponding MICs for 23 antibiotics with the aim of training models to accurately determine the AMR phenotype from sequenced genomes. This resulted in a >1.5-fold increase in AMR phenotype prediction accuracy over AMR gene identification alone. Furthermore, we revealed 528 unique (often species-specific) genomic routes to antibiotic resistance, including genes not previously linked to the AMR phenotype. Our study demonstrates the utility of ML in predicting AMR phenotypes across diverse clinically relevant organisms and antibiotics. This research proposes a rapid computational method to support laboratory-based identification of the AMR phenotype in pathogens.
Affiliation(s)
- Lucy Dillon
  - School of Biological Sciences, Queen's University Belfast, Belfast, UK
- Nicholas J Dimonaco
  - School of Biological Sciences, Queen's University Belfast, Belfast, UK
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
  - Farncombe Family Digestive Health Research Institute, McMaster University, Hamilton, Canada
4
Hu K, Meyer F, Deng ZL, Asgari E, Kuo TH, Münch PC, McHardy AC. Assessing computational predictions of antimicrobial resistance phenotypes from microbial genomes. Brief Bioinform 2024; 25:bbae206. [PMID: 38706320 PMCID: PMC11070729 DOI: 10.1093/bib/bbae206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
The advent of rapid whole-genome sequencing has created new opportunities for computational prediction of antimicrobial resistance (AMR) phenotypes from genomic data. Both rule-based and machine learning (ML) approaches have been explored for this task, but systematic benchmarking is still needed. Here, we evaluated four state-of-the-art ML methods (Kover, PhenotypeSeeker, Seq2Geno2Pheno and Aytan-Aktug), an ML baseline and the rule-based ResFinder by training and testing each of them across 78 species-antibiotic datasets, using a rigorous benchmarking workflow that integrates three evaluation approaches, each paired with three distinct sample splitting methods. Our analysis revealed considerable variation in the performance across techniques and datasets. Whereas ML methods generally excelled for closely related strains, ResFinder excelled for handling divergent genomes. Overall, Kover most frequently ranked top among the ML approaches, followed by PhenotypeSeeker and Seq2Geno2Pheno. AMR phenotypes for antibiotic classes such as macrolides and sulfonamides were predicted with the highest accuracies. The quality of predictions varied substantially across species-antibiotic combinations, particularly for beta-lactams; across species, resistance phenotyping of the beta-lactam compounds aztreonam, amoxicillin/clavulanic acid, cefoxitin, ceftazidime and piperacillin/tazobactam, alongside tetracyclines, demonstrated more variable performance than the other benchmarked antibiotics. By organism, Campylobacter jejuni and Enterococcus faecium phenotypes were more robustly predicted than those of Escherichia coli, Staphylococcus aureus, Salmonella enterica, Neisseria gonorrhoeae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Acinetobacter baumannii, Streptococcus pneumoniae and Mycobacterium tuberculosis. In addition, our study provides software recommendations for each species-antibiotic combination. It furthermore highlights the need for optimization for robust clinical applications, particularly for strains that diverge substantially from those used for training.
Affiliation(s)
- Kaixin Hu
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Fernando Meyer
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Zhi-Luo Deng
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Ehsaneddin Asgari
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Molecular Cell Biomechanics Laboratory, Department of Bioengineering and Mechanical Engineering, University of California, Berkeley, USA
- Tzu-Hao Kuo
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Philipp C Münch
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
  - German Center for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Alice C McHardy
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
5
Mofidifar S, Yadegar A, Karimi-Jafari MH. A reconstructed genome-scale metabolic model of Helicobacter pylori for predicting putative drug targets in clarithromycin and rifampicin resistance conditions. Helicobacter 2024; 29:e13074. [PMID: 38615332 DOI: 10.1111/hel.13074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 03/27/2024] [Accepted: 04/01/2024] [Indexed: 04/16/2024]
Abstract
BACKGROUND: Helicobacter pylori is considered a true human pathogen for which rising drug resistance constitutes a drastic concern globally. The present study aimed to reconstruct a genome-scale metabolic model (GSMM) to decipher the metabolic capability of H. pylori strains in response to clarithromycin and rifampicin along with identification of novel drug targets. MATERIALS AND METHODS: The iIT341 model of H. pylori was updated based on genome annotation data, and biochemical knowledge from literature and databases. Context-specific models were generated by integrating the transcriptomic data of clarithromycin and rifampicin resistance into the model. Flux balance analysis was employed for identifying essential genes in each strain, which were further prioritized by non-homology to human genes, virulence factor analysis, druggability, and broad-spectrum analysis. Additionally, metabolic differences between sensitive and resistant strains were investigated based on flux variability analysis and pathway enrichment analysis of transcriptomic data. RESULTS: The reconstructed GSMM was named the HpM485 model. Pathway enrichment and flux variability analyses demonstrated reduced activity in the ribosomal pathway in both clarithromycin- and rifampicin-resistant strains. Also, a significant decrease was detected in the activity of metabolic pathways of the clarithromycin-resistant strain. Moreover, 23 and 16 essential genes were exclusively detected in clarithromycin- and rifampicin-resistant strains, respectively. Based on prioritization analysis, cyclopropane fatty acid synthase and phosphoenolpyruvate synthase were identified as putative drug targets in clarithromycin- and rifampicin-resistant strains, respectively. CONCLUSIONS: We present a robust and reliable metabolic model of H. pylori. This model can predict novel drug targets to combat drug resistance and explore the metabolic capability of H. pylori in various conditions.
Affiliation(s)
- Sepideh Mofidifar
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Abbas Yadegar
  - Foodborne and Waterborne Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
6
Gao Y, Li H, Zhao C, Li S, Yin G, Wang H. Machine learning and feature extraction for rapid antimicrobial resistance prediction of Acinetobacter baumannii from whole-genome sequencing data. Front Microbiol 2024; 14:1320312. [PMID: 38274740 PMCID: PMC10808480 DOI: 10.3389/fmicb.2023.1320312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 12/22/2023] [Indexed: 01/27/2024] Open
Abstract
Background: Whole-genome sequencing (WGS) has contributed significantly to advancements in machine learning methods for predicting antimicrobial resistance (AMR). However, a comparison of different methods for AMR prediction that do not require prior knowledge of resistance remains to be conducted. Methods: We aimed to predict the minimum inhibitory concentrations (MICs) of 13 antimicrobial agents against Acinetobacter baumannii using three machine learning algorithms (random forest, support vector machine, and XGBoost) combined with k-mer features extracted from WGS data. Results: A cohort of 339 isolates was used for model construction. The average essential agreement and category agreement of the best models exceeded 90.90% (95%CI, 89.03-92.77%) and 95.29% (95%CI, 94.91-95.67%), respectively; the exceptions being levofloxacin, minocycline and imipenem. The very major error rates ranged from 0.0 to 5.71%. We applied feature selection pipelines to extract the top-ranked 11-mers to optimise training time and computing resources. This approach slightly improved the prediction performance and enabled us to obtain prediction results within 10 min. Notably, when employing these top-ranked 11-mers in an independent test dataset (120 isolates), we achieved an average accuracy of 0.96. Conclusion: Our study is the first to demonstrate that AMR prediction for A. baumannii using machine learning methods based on k-mer features has competitive performance over traditional workflows; hence, sequence-based AMR prediction and its application could be further promoted. The k-mer-based workflow developed in this study demonstrated high recall/sensitivity and specificity, making it a dependable tool for MIC prediction in clinical settings.
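The k-mer features this abstract refers to can be sketched in a few lines of plain Python. This is an illustrative assumption about how such features are commonly built, not the authors' pipeline: the function names `count_kmers` and `feature_matrix` are hypothetical, and the choice to skip windows containing non-ACGT characters is made here for simplicity. Only the k-mer length of 11 comes from the abstract.

```python
from collections import Counter


def count_kmers(sequence, k=11):
    """Count overlapping k-mers in a DNA sequence, skipping ambiguous windows."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Keep only windows made entirely of unambiguous bases.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def feature_matrix(sequences, k=11):
    """Build a genomes-by-k-mers count matrix over the shared k-mer vocabulary."""
    per_genome = [count_kmers(seq, k) for seq in sequences]
    vocabulary = sorted(set().union(*per_genome))
    # A Counter returns 0 for k-mers that a genome does not contain.
    matrix = [[counts[kmer] for kmer in vocabulary] for counts in per_genome]
    return vocabulary, matrix
```

Each row of the resulting matrix is one isolate and each column one k-mer, which is the tabular form that classifiers such as random forest or XGBoost take as input. Real genomes produce millions of distinct 11-mers, which is why the abstract describes selecting only the top-ranked ones.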
Affiliation(s)
- Yue Gao
  - Institute of Medical Technology, Peking University Health Science Center, Beijing, China
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
- Henan Li
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
- Chunjiang Zhao
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
- Shuguang Li
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
- Guankun Yin
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
- Hui Wang
  - Institute of Medical Technology, Peking University Health Science Center, Beijing, China
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
7
Wheeler NE, Price V, Cunningham-Oakes E, Tsang KK, Nunn JG, Midega JT, Anjum MF, Wade MJ, Feasey NA, Peacock SJ, Jauneikaite E, Baker KS. Innovations in genomic antimicrobial resistance surveillance. Lancet Microbe 2023; 4:e1063-e1070. [PMID: 37977163 DOI: 10.1016/s2666-5247(23)00285-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 08/16/2023] [Accepted: 08/22/2023] [Indexed: 11/19/2023]
Abstract
Whole-genome sequencing of antimicrobial-resistant pathogens is increasingly being used for antimicrobial resistance (AMR) surveillance, particularly in high-income countries. Innovations in genome sequencing and analysis technologies promise to revolutionise AMR surveillance and epidemiology; however, routine adoption of these technologies is challenging, particularly in low-income and middle-income countries. As part of a wider series of workshops and online consultations, a group of experts in AMR pathogen genomics and computational tool development conducted a situational analysis, identifying the following under-used innovations in genomic AMR surveillance: clinical metagenomics, environmental metagenomics, gene or plasmid tracking, and machine learning. The group recommended developing cost-effective use cases for each approach and mapping data outputs to clinical outcomes of interest to justify additional investment in capacity, training, and staff required to implement these technologies. Harmonisation and standardisation of methods, and the creation of equitable data sharing and governance frameworks, will facilitate successful implementation of these innovations.
Affiliation(s)
- Nicole E Wheeler
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, Edgbaston, UK
- Vivien Price
  - Department of Clinical Infection, Immunology and Microbiology, Liverpool Centre for Global Health Research, University of Liverpool, Liverpool, UK
- Edward Cunningham-Oakes
  - Department of Infection Biology and Microbiomes, Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
- Kara K Tsang
  - Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, UK
- Jamie G Nunn
  - Infectious Disease Challenge Area, Wellcome Trust, London, UK
- Muna F Anjum
  - Department of Bacteriology, Animal and Plant Health Agency, Surrey, UK
- Matthew J Wade
  - Data Analytics and Surveillance Group, UK Health Security Agency, London, UK
  - School of Engineering, Newcastle University, Newcastle-upon-Tyne, UK
- Nicholas A Feasey
  - Clinical Sciences, Liverpool School of Tropical Medicine, Liverpool, UK
  - Malawi Liverpool Wellcome Research Programme, Chichiri, Blantyre, Malawi
- Elita Jauneikaite
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK
  - NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Department of Infectious Disease, Imperial College London, Hammersmith Hospital, London, UK
- Kate S Baker
  - Centre for Clinical Infection, Microbiology and Immunology, University of Liverpool, Liverpool, UK
  - Department of Genetics, University of Cambridge, Cambridge, UK
8
Hyun JC, Monk JM, Szubin R, Hefner Y, Palsson BO. Global pathogenomic analysis identifies known and candidate genetic antimicrobial resistance determinants in twelve species. Nat Commun 2023; 14:7690. [PMID: 38001096 PMCID: PMC10673929 DOI: 10.1038/s41467-023-43549-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 11/14/2023] [Indexed: 11/26/2023] Open
Abstract
Surveillance programs for managing antimicrobial resistance (AMR) have yielded thousands of genomes suited for data-driven mechanism discovery. We present a workflow integrating pangenomics, gene annotation, and machine learning to identify AMR genes at scale. When applied to 12 species, 27,155 genomes, and 69 drugs, we 1) find AMR gene transfer mostly confined within related species, with 925 genes in multiple species but just eight in multiple phylogenetic classes, 2) demonstrate that discovery-oriented support vector machines outperform contemporary methods at recovering known AMR genes, recovering 263 genes compared to 145 by Pyseer, and 3) identify 142 AMR gene candidates. Validation of two candidates in E. coli BW25113 reveals cases of conditional resistance: ΔcycA confers ciprofloxacin resistance in minimal media with D-serine, and frdD V111D confers ampicillin resistance in the presence of ampC by modifying the overlapping promoter. We expect this approach to be adaptable to other species and phenotypes.
Affiliation(s)
- Jason C Hyun
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
- Jonathan M Monk
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Richard Szubin
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Ying Hefner
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Bernhard O Palsson
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800 Kongens Lyngby, Denmark
9
Nguyen M, Elmore Z, Ihle C, Moen FS, Slater AD, Turner BN, Parrello B, Best AA, Davis JJ. Predicting variable gene content in Escherichia coli using conserved genes. mSystems 2023; 8:e0005823. [PMID: 37314210 PMCID: PMC10469788 DOI: 10.1128/msystems.00058-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 04/25/2023] [Indexed: 06/15/2023] Open
Abstract
Having the ability to predict the protein-encoding gene content of an incomplete genome or metagenome-assembled genome is important for a variety of bioinformatic tasks. In this study, as a proof of concept, we built machine learning classifiers for predicting variable gene content in Escherichia coli genomes using only the nucleotide k-mers from a set of 100 conserved genes as features. Protein families were used to define orthologs, and a single classifier was built for predicting the presence or absence of each protein family occurring in 10%-90% of all E. coli genomes. The resulting set of 3,259 extreme gradient boosting classifiers had a per-genome average macro F1 score of 0.944 [0.943-0.945, 95% CI]. We show that the F1 scores are stable across multi-locus sequence types and that the trend can be recapitulated by sampling a smaller number of core genes or diverse input genomes. Surprisingly, the presence or absence of poorly annotated proteins, including "hypothetical proteins," was accurately predicted (F1 = 0.902 [0.898-0.906, 95% CI]). Models for proteins with horizontal gene transfer-related functions had slightly lower F1 scores but were still accurate (F1s = 0.895, 0.872, 0.824, and 0.841 for transposon, phage, plasmid, and antimicrobial resistance-related functions, respectively). Finally, using a holdout set of 419 diverse E. coli genomes that were isolated from freshwater environmental sources, we observed an average per-genome F1 score of 0.880 [0.876-0.883, 95% CI], demonstrating the extensibility of the models. Overall, this study provides a framework for predicting variable gene content using a limited amount of input sequence data. IMPORTANCE: Having the ability to predict the protein-encoding gene content of a genome is important for assessing genome quality, binning genomes from shotgun metagenomic assemblies, and assessing risk due to the presence of antimicrobial resistance and other virulence genes. In this study, we built a set of binary classifiers for predicting the presence or absence of variable genes occurring in 10%-90% of all publicly available E. coli genomes. Overall, the results show that a large portion of the E. coli variable gene content can be predicted with high accuracy, including genes with functions relating to horizontal gene transfer. This study offers a strategy for predicting gene content using limited input sequence data.
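The macro F1 score reported in this abstract averages the per-class F1 scores with equal weight, so the rarer class (for example, gene absent) counts as much as the common one. The sketch below shows that general definition in plain Python; it is not the authors' evaluation code, the function names `f1_for_class` and `macro_f1` are hypothetical, and how the paper aggregates scores per genome is not specified in the abstract and is not modelled here.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)
```

For presence/absence labels `[1, 1, 0, 0]` predicted as `[1, 0, 0, 0]`, the per-class scores are 2/3 for class 1 and 4/5 for class 0, giving a macro F1 of 11/15.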
Affiliation(s)
- Marcus Nguyen
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, USA
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- Zachary Elmore
  - Biology Department, Hope College, Holland, Michigan, USA
- Clay Ihle
  - Biology Department, Hope College, Holland, Michigan, USA
- Adam D. Slater
  - Biology Department, Hope College, Holland, Michigan, USA
- Bruce Parrello
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
  - Fellowship for Interpretation of Genomes, Burr Ridge, Illinois, USA
- Aaron A. Best
  - Biology Department, Hope College, Holland, Michigan, USA
- James J. Davis
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, USA
  - Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
10
Yu J, Lin YT, Chen WC, Tseng KH, Lin HH, Tien N, Cho CF, Huang JY, Liang SJ, Ho LC, Hsieh YW, Hsu KC, Ho MW, Hsueh PR, Cho DY. Direct prediction of carbapenem-resistant, carbapenemase-producing, and colistin-resistant Klebsiella pneumoniae isolates from routine MALDI-TOF mass spectra using machine learning and outcome evaluation. Int J Antimicrob Agents 2023; 61:106799. [PMID: 37004755 DOI: 10.1016/j.ijantimicag.2023.106799] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/01/2022] [Revised: 03/14/2023] [Accepted: 03/26/2023] [Indexed: 04/03/2023]
Abstract
The objective of this study was to develop a rapid prediction method for carbapenem-resistant Klebsiella pneumoniae (CRKP) and colistin-resistant K. pneumoniae (ColRKP) based on routine MALDI-TOF mass spectrometry (MS) results in order to formulate a suitable and rapid treatment strategy. In total, 830 CRKP and 1,462 carbapenem-susceptible K. pneumoniae (CSKP) isolates were collected; 54 ColRKP isolates and 1,592 colistin-intermediate K. pneumoniae (ColIKP) isolates were also included. Routine MALDI-TOF MS, antimicrobial susceptibility testing, NG-Test CARBA 5, and resistance gene detection were followed by machine learning (ML). Using the ML model, the accuracy and area under the curve for differentiating CRKP and CSKP were 0.8869 and 0.9551, and those for ColRKP and ColIKP were 0.8361 and 0.8447, respectively. The most important MS features of CRKP and ColRKP were m/z 4520-4529 and m/z 4170-4179, respectively. Of the CRKP isolates, MS m/z 4520-4529 was a potential biomarker for distinguishing KPC from OXA, NDM, IMP, and VIM. Of the 34 patients who received preliminary CRKP ML prediction results (by texting), 24 (70.6%) were confirmed to have CRKP infection. The mortality rate was lower in patients who received antibiotic regimen adjustment based on the preliminary ML prediction (4/14, 28.6%). In conclusion, the proposed model can provide rapid results for differentiating CRKP and CSKP, as well as ColRKP and ColIKP. The combination of ML-based CRKP with preliminary reporting of results can help physicians alter the regimen approximately 24 h earlier, resulting in improved survival of patients with timely antibiotic intervention.
11
Li S, Wu J, Ma N, Liu W, Shao M, Ying N, Zhu L. Prediction of genome-wide imipenem resistance features in Klebsiella pneumoniae using machine learning. J Med Microbiol 2023; 72. [PMID: 36753438 DOI: 10.1099/jmm.0.001657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023] Open
Abstract
Introduction. The resistance rate of Klebsiella pneumoniae (K. pneumoniae) to imipenem is increasing year by year, and the imipenem resistance mechanism of K. pneumoniae is complex. Therefore, it is urgent to develop new strategies to explore the resistance mechanism of imipenem for its effective and accurate use in clinical practice. Hypothesis/Gap Statement. Machine learning could identify resistance features and biological processes that influence microbial resistance from whole-genome sequencing (WGS) data. Aims. This work aimed to predict imipenem resistance genetic features in K. pneumoniae from whole-genome k-mer features, and to analyse their function in order to understand its resistance mechanism. Methods. This study analysed WGS data of K. pneumoniae combined with the resistance phenotype for imipenem, and established a K. pneumoniae imipenem genotype-phenotype model to predict resistance features using the chi-squared test and random forest. An external clinical dataset was used to verify the predictive power of the resistance features. Potential genes were identified by aligning the resistance features with the K. pneumoniae reference genome using blastn; the functions of the potential genes were further analysed with GO and KEGG analysis to explore resistance-related signalling pathways; and resistance sequence patterns were screened using the STREME software. Finally, the resistance features were combined and modelled with four machine-learning algorithms (logistic regression, SVM, GBDT and XGBoost) to evaluate their phenotype prediction ability. Results. A total of 16 670 imipenem resistance features were predicted from the genotype-phenotype model. Thirty potential genes were identified by annotating the resistance features and corresponded to known antibiotic-related genes (mdtM, dedA, rne, etc.). GO and KEGG pathway analyses indicated a possible association of imipenem resistance with metabolic processes and the cell membrane. The patterns CRYCAGCDN and CGRDAAAN were found among the imipenem resistance features and were widely present in reported β-lactam resistance genes (bla SHV, bla CTX-M, bla TEM, etc.), and YCYAGCMCAST, associated with metabolic functions (organic substance metabolic process, nitrogen compound metabolic process and cellular metabolic process), was identified from the top 50 resistance features. The 25 resistance genes in the training dataset included 19 genes in the external dataset, which verified the accuracy of prediction. The area under curve values of logistic regression, SVM, GBDT and XGBoost were 0.965, 0.966, 0.969 and 0.969, respectively, indicating that the imipenem resistance features have strong predictive power. Conclusion. Machine-learning methods could effectively predict imipenem resistance features in K. pneumoniae, and provide resistance sequence profiles for predicting resistance phenotype and exploring potential resistance mechanisms. This provides an important insight into potential therapeutic strategies for K. pneumoniae resistance to imipenem, and could speed up the application of machine learning in routine diagnosis.
Affiliation(s)
- Shanshan Li
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Jun Wu
  - Lin'an Center for Disease Control and Prevention, Lin'an, 311300, PR China
- Nan Ma
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Wenjia Liu
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
  - College of Electronics and Information Engineering, Hangzhou Dianzi University, Hangzhou 310018, PR China
- Mengjie Shao
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Nanjiao Ying
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
  - Institute of Biomedical Engineering and Instrument, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Lei Zhu
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
  - Institute of Biomedical Engineering and Instrument, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
|
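As a rough illustration of the k-mer screening step described in the Li et al. abstract above, the sketch below scores each k-mer with a Pearson chi-squared statistic on its presence in resistant versus susceptible genomes. It covers only the filtering idea, not the random forest or the downstream models. The function names, the toy sequences, and the choice of k = 3 are assumptions made for illustration and are not taken from the paper.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers found in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction)
    for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rank_kmers(genomes, labels, k):
    """Rank k-mers by association between presence and the resistant label.

    labels: 1 = resistant, 0 = susceptible (hypothetical encoding).
    Returns (k-mer, score) pairs, highest score first, ties sorted by k-mer.
    """
    present = [kmers(g, k) for g in genomes]
    n_res = sum(labels)
    n_sus = len(labels) - n_res
    scores = {}
    for kmer in set().union(*present):
        res_with = sum(1 for p, y in zip(present, labels) if y and kmer in p)
        sus_with = sum(1 for p, y in zip(present, labels) if not y and kmer in p)
        # Rows: resistant / susceptible; columns: k-mer present / absent.
        scores[kmer] = chi2_2x2(res_with, n_res - res_with,
                                sus_with, n_sus - sus_with)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (made up): two "resistant" and two "susceptible" sequences.
genomes = ["ACGTAC", "ACGTTT", "GGGTTT", "GGGTAA"]
labels = [1, 1, 0, 0]
ranked = rank_kmers(genomes, labels, k=3)
for kmer, score in ranked[:4]:
    print(kmer, round(score, 3))
```

Note that the statistic is high both for k-mers enriched in resistant genomes and for k-mers depleted from them, so a real pipeline would also need to record the direction of the association.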
12
|
Tharmakulasingam M, Gardner B, La Ragione R, Fernando A. Rectified Classifier Chains for Prediction of Antibiotic Resistance From Multi-Labelled Data With Missing Labels. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:625-636. [PMID: 35130168 DOI: 10.1109/tcbb.2022.3148577] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
Predicting Antimicrobial Resistance (AMR) from genomic data has important implications for human and animal healthcare, especially given its potential for more rapid diagnostics and informed treatment choices. With the recent advances in sequencing technologies, applying machine learning techniques to AMR prediction has shown promising results. Despite this, there are shortcomings in the literature concerning methodologies suitable for multi-drug AMR prediction, especially where samples with missing labels exist. To address this shortcoming, we introduce a Rectified Classifier Chain (RCC) method for predicting multi-drug resistance. This RCC method was tested using annotated features of genomic sequences and compared with similar multi-label classification methodologies. We found that applying the eXtreme Gradient Boosting (XGBoost) base model to our RCC model outperformed the second-best model, an XGBoost-based binary relevance model, by 3.3% in Hamming accuracy and 7.8% in F1-score. Additionally, we note that machine learning models applied to AMR prediction in the literature are typically unsuitable for identifying biomarkers informative of their decisions; in this study, we show that biomarkers contributing to AMR prediction can also be identified using the proposed RCC method. We expect this can facilitate genome annotation and pave the way towards identifying new biomarkers indicative of AMR.
|
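The Tharmakulasingam et al. abstract above reports Hamming accuracy on multi-label data with missing labels. A minimal sketch of that metric follows, taking Hamming accuracy as the fraction of (isolate, drug) cells predicted correctly, that is, one minus the Hamming loss. Skipping unmeasured cells (encoded here as `None`) is an assumption made for illustration; the abstract does not state how the paper treats missing labels during evaluation.

```python
def hamming_accuracy(y_true, y_pred):
    """Fraction of observed (isolate, drug) labels predicted correctly.

    y_true: rows of 0/1 labels, with None where the phenotype was not measured.
    y_pred: rows of 0/1 predictions with the same shape.
    """
    correct = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t is None:
                # Label missing for this isolate/drug pair: leave it out.
                continue
            total += 1
            correct += int(t == p)
    if total == 0:
        raise ValueError("no observed labels to score")
    return correct / total


# Toy example (made up): 3 isolates x 3 drugs, two labels missing.
y_true = [[1, 0, None],
          [0, None, 1],
          [1, 1, 0]]
y_pred = [[1, 1, 0],
          [0, 0, 1],
          [0, 1, 0]]
score = hamming_accuracy(y_true, y_pred)
print(score)
```

Seven labels are observed in this toy matrix and five of them are predicted correctly, so the score is 5/7.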
13
|
Kim JI, Maguire F, Tsang KK, Gouliouris T, Peacock SJ, McAllister TA, McArthur AG, Beiko RG. Machine Learning for Antimicrobial Resistance Prediction: Current Practice, Limitations, and Clinical Perspective. Clin Microbiol Rev 2022; 35:e0017921. [PMID: 35612324 PMCID: PMC9491192 DOI: 10.1128/cmr.00179-21] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Indexed: 12/17/2022]
Abstract
Antimicrobial resistance (AMR) is a global health crisis that poses a great threat to modern medicine. Effective prevention strategies are urgently required to slow the emergence and further dissemination of AMR. Given the availability of data sets encompassing hundreds or thousands of pathogen genomes, machine learning (ML) is increasingly being used to predict resistance to different antibiotics in pathogens based on gene content and genome composition. A key objective of this work is to advocate for the incorporation of ML into front-line settings but also highlight the further refinements that are necessary to safely and confidently incorporate these methods. The question of what to predict is not trivial given the existence of different quantitative and qualitative laboratory measures of AMR. ML models typically treat genes as independent predictors, with no consideration of structural and functional linkages; they also may not be accurate when new mutational variants of known AMR genes emerge. Finally, to have the technology trusted by end users in public health settings, ML models need to be transparent and explainable to ensure that the basis for prediction is clear. We strongly advocate that the next set of AMR-ML studies should focus on the refinement of these limitations to be able to bridge the gap to diagnostic implementation.
Affiliation(s)
- Jee In Kim
  - Faculty of Computer Science, Dalhousie University, Halifax, Canada
  - Institute for Comparative Genomics, Dalhousie University, Halifax, Canada
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, Canada
- Finlay Maguire
  - Faculty of Computer Science, Dalhousie University, Halifax, Canada
  - Institute for Comparative Genomics, Dalhousie University, Halifax, Canada
  - Department of Community Health and Epidemiology, Faculty of Medicine, Dalhousie University, Halifax, Canada
  - Shared Hospital Laboratory, Toronto, Canada
  - Sunnybrook Research Institute, Sunnybrook Health Sciences Centre, Toronto, Canada
- Kara K. Tsang
  - London School of Hygiene & Tropical Medicine, London, United Kingdom
- Theodore Gouliouris
  - Department of Medicine, University of Cambridge, Cambridge, United Kingdom
  - Clinical Microbiology and Public Health Laboratory, Public Health England, Cambridge, United Kingdom
  - Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom
- Sharon J. Peacock
  - Department of Medicine, University of Cambridge, Cambridge, United Kingdom
- Tim A. McAllister
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, Canada
- Andrew G. McArthur
  - David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Canada
  - M.G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Canada
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Canada
- Robert G. Beiko
  - Faculty of Computer Science, Dalhousie University, Halifax, Canada
  - Institute for Comparative Genomics, Dalhousie University, Halifax, Canada
|
14
|
Montelongo C, Mores CR, Putonti C, Wolfe AJ, Abouelfetouh A. Whole-Genome Sequencing of Staphylococcus aureus and Staphylococcus haemolyticus Clinical Isolates from Egypt. Microbiol Spectr 2022; 10:e0241321. [PMID: 35727037 PMCID: PMC9431571 DOI: 10.1128/spectrum.02413-21] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/28/2021] [Accepted: 05/31/2022] [Indexed: 11/20/2022]
Abstract
Infections caused by antibiotic-resistant Staphylococcus are a global concern. This is true in the Middle East, where increasingly resistant Staphylococcus aureus and Staphylococcus haemolyticus strains have been detected. While extensive surveys have revealed the prevalence of infections caused by antibiotic-resistant staphylococci in Europe, Asia, and North America, the population structure of antibiotic-resistant staphylococci recovered from patients and clinical settings in Egypt remains uncharacterized. We performed whole-genome sequencing of 56 S. aureus and 10 S. haemolyticus isolates from Alexandria Main University Hospital; 46 of the S. aureus genomes and all 10 of the S. haemolyticus genomes carry mecA, which confers methicillin resistance. Supplemented with additional publicly available genomes from the other parts of the Middle East (34 S. aureus and 6 S. haemolyticus), we present the largest genomic study to date of staphylococcal isolates from the Middle East. These genomes include 20 S. aureus multilocus sequence types (MLST), including 3 new ones. They also include 9 S. haemolyticus MLSTs, including 1 new one. Phylogenomic analyses of each species' core genome largely mirrored those of the MLSTs, irrespective of geographical origin. The hospital-acquired spa t037/ST239-SCCmec III/MLST CC8 clone represented the largest clade, comprising 22% of the S. aureus isolates. Like S. aureus genome surveys of other regions, these isolates from the Middle East have an open pangenome, a strong indicator of gene exchange of virulence factors and antibiotic resistance genes with other reservoirs. Our genome analyses will inform antibiotic stewardship and infection control plans in the Middle East. IMPORTANCE Staphylococci are understudied despite their prevalence within the Middle East. Methicillin-resistant Staphylococcus aureus (MRSA) is endemic to hospitals in Egypt, as are other antibiotic-resistant strains of S. aureus and S. haemolyticus. 
To provide insight into the strains circulating in Egypt, we performed whole-genome sequencing of 56 S. aureus and 10 S. haemolyticus isolates from Alexandria Main University Hospital. Through analysis of these genomes, as well as all available S. aureus and S. haemolyticus genomes from the Middle East (n = 40), we were able to produce a picture of the diversity in this region more complete than those afforded by traditional molecular typing strategies. For example, we identified 4 new MLSTs. Most strains harbored genes associated with multidrug resistance, toxin production, biofilm formation, and immune evasion. These data provide invaluable insight for future antibiotic stewardship and infection control within the Middle East.
Affiliation(s)
- Cesar Montelongo
  - Department of Microbiology and Immunology, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA
- Carine R. Mores
  - Department of Microbiology and Immunology, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA
- Catherine Putonti
  - Department of Microbiology and Immunology, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA
  - Bioinformatics Program, Loyola University Chicago, Chicago, Illinois, USA
  - Department of Biology, Loyola University Chicago, Chicago, Illinois, USA
- Alan J. Wolfe
  - Department of Microbiology and Immunology, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA
- Alaa Abouelfetouh
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Alexandria University, Alexandria, Egypt
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Alamein International University, Alamein, Egypt
|
15
|
Smith HG, Bean DC, Clarke RH, Loyn R, Larkins JA, Hassell C, Greenhill AR. Presence and antimicrobial resistance profiles of Escherichia coli, Enterococcus spp. and Salmonella sp. in 12 species of Australian shorebirds and terns. Zoonoses Public Health 2022; 69:615-624. [PMID: 35460193 PMCID: PMC9544147 DOI: 10.1111/zph.12950] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/23/2021] [Revised: 03/13/2022] [Accepted: 04/05/2022] [Indexed: 11/27/2022]
Abstract
Antibiotic resistance is an ongoing threat to both human and animal health. Migratory birds are a potential vector for the spread of novel pathogens and antibiotic resistance genes. To date, there has been no comprehensive study investigating the presence of antibiotic resistance (AMR) in the bacteria of Australian shorebirds or terns. In the current study, 1022 individual birds representing 12 species were sampled across three states of Australia (Victoria, South Australia, and Western Australia) and tested for the presence of phenotypically resistant strains of three bacteria with potential to be zoonotic pathogens: Escherichia coli, Enterococcus spp., and Salmonella sp. In total, 206 E. coli, 266 Enterococcus spp., and 20 Salmonella sp. isolates were recovered, with AMR detected in 42% of E. coli, 85% of Enterococcus spp., and 10% of Salmonella sp. Phenotypic resistance was commonly detected to erythromycin (79% of Enterococcus spp.), ciprofloxacin (31% of Enterococcus spp.) and streptomycin (21% of E. coli). Resident birds were more likely to carry AMR bacteria than migratory birds (p ≤ .001). Bacteria isolated from shorebirds and terns are commonly resistant to at least one antibiotic, suggesting that wild bird populations serve as a potential reservoir and vector for AMR bacteria. However, globally emerging phenotypes of multidrug‐resistant bacteria were not detected in Australian shorebirds. This study provides baseline data of the carriage of AMR bacteria in Australian shorebirds and terns.
Affiliation(s)
- Hannah G Smith
  - Institute of Innovation, Science and Sustainability, Federation University, Churchill, Australia
- David C Bean
  - Institute of Innovation, Science and Sustainability, Federation University, Churchill, Australia
- Rohan H Clarke
  - School of Biological Sciences, Monash University, Melbourne, Victoria, Australia
- Richard Loyn
  - School of Life Sciences, Centre for Freshwater Ecosystems, La Trobe University, Wodonga, Victoria, Australia
  - Institute for Land, Water and Society, Charles Sturt University, Albury, New South Wales, Australia
- Jo-Ann Larkins
  - Institute of Innovation, Science and Sustainability, Federation University, Churchill, Australia
  - School of Science, Engineering and Information Technology, Federation University, Ballarat, Victoria, Australia
- Chris Hassell
  - Global Flyway Network, Broome, Western Australia, Australia
  - Australasian Wader Studies Group, Broome, Western Australia, Australia
- Andrew R Greenhill
  - Institute of Innovation, Science and Sustainability, Federation University, Churchill, Australia
|
16
|
Wang S, Zhao C, Yin Y, Chen F, Chen H, Wang H. A Practical Approach for Predicting Antimicrobial Phenotype Resistance in Staphylococcus aureus Through Machine Learning Analysis of Genome Data. Front Microbiol 2022; 13:841289. [PMID: 35308374 PMCID: PMC8924536 DOI: 10.3389/fmicb.2022.841289] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/22/2021] [Accepted: 02/11/2022] [Indexed: 11/28/2022]
Abstract
With the reduction in sequencing price and acceleration of sequencing speed, it is particularly important to directly link the genotype and phenotype of bacteria. Here, we first predicted the minimum inhibitory concentrations of ten antimicrobial agents for Staphylococcus aureus using 466 isolates by directly extracting k-mers from whole genome sequencing data combined with three machine learning algorithms: random forest, support vector machine, and XGBoost. Allowing a tolerance of one two-fold dilution, the essential agreement and the category agreement could reach >85% and >90% for most antimicrobial agents. For clindamycin, cefoxitin and trimethoprim-sulfamethoxazole, the essential agreement and the category agreement could reach >91% and >93%, providing important information for clinical treatment. The successful prediction of cefoxitin resistance showed that the model could identify methicillin-resistant S. aureus. The results suggest that small datasets available in large hospitals could bypass the existing basic research and known antimicrobial resistance genes and accurately predict the bacterial phenotype.
Affiliation(s)
- Shuyi Wang
  - Institute of Medical Technology, Peking University Health Science Center, Beijing, China
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
- Chunjiang Zhao
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
- Yuyao Yin
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
- Fengning Chen
  - Institute of Medical Technology, Peking University Health Science Center, Beijing, China
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
- Hongbin Chen
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
- Hui Wang
  - Institute of Medical Technology, Peking University Health Science Center, Beijing, China
  - Department of Clinical Laboratory, Peking University People's Hospital, Beijing, China
|
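The Wang et al. abstract above reports essential agreement and category agreement for predicted MICs. The sketch below shows one common way to compute both: essential agreement counts predictions within one two-fold dilution of the reference MIC, and category agreement counts predictions that fall in the same susceptible/intermediate/resistant category. The function names, the toy MIC values, and the breakpoints `s_max` and `r_min` are hypothetical and chosen only for illustration; they are not clinical breakpoints and are not taken from the paper.

```python
import math


def essential_agreement(true_mics, pred_mics, tolerance=1):
    """Share of predicted MICs within `tolerance` two-fold dilutions
    of the reference MIC (difference measured on a log2 scale)."""
    if not true_mics:
        raise ValueError("no MIC values to compare")
    hits = sum(
        1 for t, p in zip(true_mics, pred_mics)
        if abs(math.log2(p) - math.log2(t)) <= tolerance + 1e-9
    )
    return hits / len(true_mics)


def categorize(mic, s_max, r_min):
    """Map an MIC to S / I / R using hypothetical breakpoints."""
    if mic <= s_max:
        return "S"
    if mic >= r_min:
        return "R"
    return "I"


def category_agreement(true_mics, pred_mics, s_max, r_min):
    """Share of isolates whose predicted and reference MICs
    fall in the same S / I / R category."""
    if not true_mics:
        raise ValueError("no MIC values to compare")
    hits = sum(
        1 for t, p in zip(true_mics, pred_mics)
        if categorize(t, s_max, r_min) == categorize(p, s_max, r_min)
    )
    return hits / len(true_mics)


# Toy MICs in mg/L (made up), with made-up breakpoints S <= 2 and R >= 8.
true_mics = [0.5, 1, 4, 8, 16]
pred_mics = [0.5, 2, 1, 8, 64]
ea = essential_agreement(true_mics, pred_mics)
ca = category_agreement(true_mics, pred_mics, s_max=2, r_min=8)
print(ea, ca)
```

In the toy data the third and fifth predictions are two dilutions off, so three of five are in essential agreement; only the third changes category, so four of five are in category agreement. The two metrics can therefore disagree on which errors matter.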
17
|
VanOeffelen M, Nguyen M, Aytan-Aktug D, Brettin T, Dietrich EM, Kenyon RW, Machi D, Mao C, Olson R, Pusch GD, Shukla M, Stevens R, Vonstein V, Warren AS, Wattam AR, Yoo H, Davis JJ. A genomic data resource for predicting antimicrobial resistance from laboratory-derived antimicrobial susceptibility phenotypes. Brief Bioinform 2021; 22:bbab313. [PMID: 34379107 PMCID: PMC8575023 DOI: 10.1093/bib/bbab313] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/27/2021] [Revised: 06/18/2021] [Accepted: 07/20/2021] [Indexed: 11/14/2022]
Abstract
Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data. This collection currently contains AST data for over 67 000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting areas where sampling is comparatively deep or shallow, and showing areas where attention is needed from the research community to improve sampling and tracking efforts. In addition to using the data to track the evolution and spread of AMR, it also serves as a useful starting point for building machine learning models for predicting AMR phenotypes. We demonstrate this by describing two machine learning models that are built from the entire dataset to show where the predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Center (BV-BRC) FTP site ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt.
Affiliation(s)
- Marcus Nguyen
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Derya Aytan-Aktug
  - National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Thomas Brettin
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Emily M Dietrich
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Ronald W Kenyon
  - Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Dustin Machi
  - Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Chunhong Mao
  - Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Robert Olson
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Gordon D Pusch
  - Fellowship for Interpretation of Genomes, Burr Ridge, IL, USA
- Maulik Shukla
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Rick Stevens
  - Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
- Andrew S Warren
  - Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Alice R Wattam
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
  - Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Hyunseung Yoo
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- James J Davis
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
  - Northwestern Argonne Institute for Science and Engineering, Evanston, IL, USA
|
18
|
Genomic Features Associated with the Degree of Phenotypic Resistance to Carbapenems in Carbapenem-Resistant Klebsiella pneumoniae. mSystems 2021; 6:e0019421. [PMID: 34519526 PMCID: PMC8547452 DOI: 10.1128/msystems.00194-21] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 11/24/2022]
Abstract
Carbapenem-resistant Klebsiella pneumoniae strains cause severe infections that are difficult to treat. The production of carbapenemases such as the K. pneumoniae carbapenemase (KPC) is a common mechanism by which these strains resist killing by the carbapenems. However, the degree of phenotypic carbapenem resistance (MIC) may differ markedly between isolates with similar carbapenemase genes, suggesting that our understanding of the underlying mechanisms of carbapenem resistance remains incomplete. To address this problem, we determined the whole-genome sequences of 166 K. pneumoniae clinical isolates resistant to meropenem, imipenem, or ertapenem. Multiple linear regression analysis of this collection of largely blaKPC-3-containing sequence type 258 (ST258) isolates indicated that blaKPC copy number and some outer membrane porin gene mutations were associated with higher MICs to carbapenems. A trend toward higher MICs was also observed with those blaKPC genes carried by the d isoform of Tn4401. In contrast, ompK37 mutations were associated with lower carbapenem MICs, and extended spectrum β-lactamase genes were not associated with higher or lower MICs in carbapenem-resistant K. pneumoniae. A machine learning approach based on the whole-genome sequences of these isolates did not result in a substantial improvement in prediction of isolates with high or low MICs. These results build upon previous findings suggesting that multiple factors influence the overall carbapenem resistance levels in carbapenem-resistant K. pneumoniae isolates. IMPORTANCE Klebsiella pneumoniae can cause severe infections in the blood, urinary tract, and lungs. Resistance to carbapenems in K. pneumoniae is an urgent public health threat, since it can make these isolates difficult to treat. While individual contributors to carbapenem resistance in K. pneumoniae have been studied, few reports explore their combined effects in clinical isolates. We sequenced 166 clinical carbapenem-resistant K. pneumoniae isolates to evaluate the contribution of known genes to carbapenem MICs and to try to identify novel genes associated with higher carbapenem MICs. The blaKPC copy number and some outer membrane porin gene mutations were associated with higher carbapenem MICs. In contrast, mutations in one specific porin, ompK37, were associated with lower carbapenem MICs. Machine learning did not result in a substantial improvement in the prediction of carbapenem resistance nor did it identify novel genes associated with carbapenem resistance. These findings enhance our understanding of the many contributors to carbapenem resistance in K. pneumoniae.
|
19
|
Sanabria AM, Janice J, Hjerde E, Simonsen GS, Hanssen AM. Shotgun-metagenomics based prediction of antibiotic resistance and virulence determinants in Staphylococcus aureus from periprosthetic tissue on blood culture bottles. Sci Rep 2021; 11:20848. [PMID: 34675288 PMCID: PMC8531021 DOI: 10.1038/s41598-021-00383-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/05/2021] [Accepted: 10/08/2021] [Indexed: 11/20/2022]
Abstract
Shotgun-metagenomics may give valuable clinical information beyond the detection of potential pathogen(s). Identification of antimicrobial resistance (AMR), virulence genes and typing directly from clinical samples has been limited due to challenges arising from incomplete genome coverage. We assessed the performance of shotgun-metagenomics on positive blood culture bottles (n = 19) with periprosthetic tissue for typing and prediction of AMR and virulence profiles in Staphylococcus aureus. We used different approaches to determine if sequence data from reads provides more information than from assembled contigs. Only 0.18% of total reads was derived from human DNA. Shotgun-metagenomics results and conventional method results were consistent in detecting S. aureus in all samples. AMR and known periprosthetic joint infection virulence genes were predicted from S. aureus. Mean coverage depth, when predicting AMR genes was 209 ×. Resistance phenotypes could be explained by genes predicted in the sample in most of the cases. The choice of bioinformatic data analysis approach clearly influenced the results, i.e. read-based analysis was more accurate for pathogen identification, while contigs seemed better for AMR profiling. Our study demonstrates high genome coverage and potential for typing and prediction of AMR and virulence profiles in S. aureus from shotgun-metagenomics data.
Affiliation(s)
- Adriana Maria Sanabria
  - Research Group for Host-Microbe Interaction, Department of Medical Biology, Faculty of Health Sciences, UiT - The Arctic University of Norway, Tromsø, Norway
- Jessin Janice
  - Research Group for Host-Microbe Interaction, Department of Medical Biology, Faculty of Health Sciences, UiT - The Arctic University of Norway, Tromsø, Norway
  - Norwegian Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
- Erik Hjerde
  - Centre for Bioinformatics, Department of Chemistry, UiT - The Arctic University of Norway, Tromsø, Norway
- Gunnar Skov Simonsen
  - Research Group for Host-Microbe Interaction, Department of Medical Biology, Faculty of Health Sciences, UiT - The Arctic University of Norway, Tromsø, Norway
  - Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
- Anne-Merethe Hanssen
  - Research Group for Host-Microbe Interaction, Department of Medical Biology, Faculty of Health Sciences, UiT - The Arctic University of Norway, Tromsø, Norway
|
20
|
Abstract
Antimicrobial resistance (AMR) is an important global health threat that impacts millions of people worldwide each year. Developing methods that can detect and predict AMR phenotypes can help to mitigate the spread of AMR by informing clinical decision making and appropriate mitigation strategies. Many bioinformatic methods have been developed for predicting AMR phenotypes from whole-genome sequences and AMR genes, but recent studies have indicated that predictions can be made from incomplete genome sequence data. In order to more systematically understand this, we built random forest-based machine learning classifiers for predicting susceptible and resistant phenotypes for Klebsiella pneumoniae (1,640 strains), Mycobacterium tuberculosis (2,497 strains), and Salmonella enterica (1,981 strains). We started by building models from alignments that were based on a reference chromosome for each species. We then subsampled each chromosomal alignment and built models for the resulting subalignments, finding that very small regions, representing approximately 0.1 to 0.2% of the chromosome, are predictive. In K. pneumoniae, M. tuberculosis, and S. enterica, the subalignments are able to predict multiple AMR phenotypes with at least 70% accuracy, even though most do not encode an AMR-related function. We used these models to identify regions of the chromosome with high and low predictive signals. Finally, subalignments that retain high accuracy across larger phylogenetic distances were examined in greater detail, revealing genes and intergenic regions with potential links to AMR, virulence, transport, and survival under stress conditions. IMPORTANCE Antimicrobial resistance causes thousands of deaths annually worldwide. Understanding the regions of the genome that are involved in antimicrobial resistance is important for developing mitigation strategies and preventing transmission. 
Machine learning models are capable of predicting antimicrobial resistance phenotypes from bacterial genome sequence data by identifying resistance genes, mutations, and other correlated features. They are also capable of implicating regions of the genome that have not been previously characterized as being involved in resistance. In this study, we generated global chromosomal alignments for Klebsiella pneumoniae, Mycobacterium tuberculosis, and Salmonella enterica and systematically searched them for small conserved regions of the genome that enable the prediction of antimicrobial resistance phenotypes. In addition to known antimicrobial resistance genes, this analysis identified genes involved in virulence and transport functions, as well as many genes with no previous implication in antimicrobial resistance.
|