Gao Y, Li H, Zhao C, Li S, Yin G, Wang H. Machine learning and feature extraction for rapid antimicrobial resistance prediction of Acinetobacter baumannii from whole-genome sequencing data. Front Microbiol 2024; 14:1320312. [PMID: 38274740 PMCID: PMC10808480 DOI: 10.3389/fmicb.2023.1320312]
[Received: 10/12/2023] [Accepted: 12/22/2023] [Indexed: 01/27/2024]
Abstract
Background
Whole-genome sequencing (WGS) has contributed significantly to advancements in machine learning methods for predicting antimicrobial resistance (AMR). However, a comparison of different AMR prediction methods that do not require prior knowledge of resistance has yet to be conducted.
Methods
We aimed to predict the minimum inhibitory concentrations (MICs) of 13 antimicrobial agents against Acinetobacter baumannii using three machine learning algorithms (random forest, support vector machine, and XGBoost) combined with k-mer features extracted from WGS data.
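As an illustration of the k-mer feature extraction mentioned above, the following is a minimal sketch, not the authors' pipeline. It assumes each isolate's assembly is available as a plain nucleotide string; the function names `count_kmers` and `build_feature_matrix` are hypothetical, and skipping windows that contain ambiguous bases is an assumption rather than something stated in the abstract.

```python
from collections import Counter


def count_kmers(sequence: str, k: int = 11) -> Counter:
    """Count overlapping k-mers in one sequence.

    Windows containing anything other than A, C, G, or T are skipped
    (an assumption made for this sketch).
    """
    sequence = sequence.upper()
    valid_bases = set("ACGT")
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= valid_bases:
            counts[kmer] += 1
    return counts


def build_feature_matrix(genomes: dict, k: int = 11):
    """Turn {isolate_name: sequence} into (vocabulary, matrix).

    Each row of the matrix holds one isolate's k-mer counts, with columns
    ordered by the sorted vocabulary of all k-mers seen in any isolate.
    """
    per_isolate = {name: count_kmers(seq, k) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*per_isolate.values()))
    matrix = [
        [per_isolate[name][kmer] for kmer in vocabulary]
        for name in genomes
    ]
    return vocabulary, matrix
```

A matrix of this shape, paired with measured MICs as labels, is the kind of input that a random forest, support vector machine, or XGBoost model could then be trained on.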
Results
A cohort of 339 isolates was used for model construction. The average essential agreement and category agreement of the best models exceeded 90.90% (95% CI, 89.03-92.77%) and 95.29% (95% CI, 94.91-95.67%), respectively, with the exceptions of levofloxacin, minocycline, and imipenem. The very major error rates ranged from 0.0% to 5.71%. We applied feature selection pipelines to extract the top-ranked 11-mers in order to reduce training time and computing resources. This approach slightly improved prediction performance and enabled us to obtain prediction results within 10 min. Notably, when these top-ranked 11-mers were applied to an independent test dataset (120 isolates), we achieved an average accuracy of 0.96.
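The agreement metrics reported above can be sketched as follows. This is a hedged illustration that assumes the conventional definitions used in susceptibility testing evaluations, which the abstract itself does not spell out: essential agreement counts predictions within one two-fold dilution of the reference MIC, category agreement counts matching susceptible/intermediate/resistant calls, and the very major error rate is the share of truly resistant isolates predicted as susceptible. The function names and the breakpoint parameters are hypothetical.

```python
import math


def essential_agreement(true_mics, predicted_mics):
    """Fraction of predictions within +/- one two-fold dilution of the reference."""
    hits = sum(
        abs(math.log2(pred) - math.log2(true)) <= 1 + 1e-9
        for true, pred in zip(true_mics, predicted_mics)
    )
    return hits / len(true_mics)


def categorise(mic, s_breakpoint, r_breakpoint):
    """Map an MIC to 'S', 'I', or 'R' using illustrative breakpoints."""
    if mic <= s_breakpoint:
        return "S"
    if mic >= r_breakpoint:
        return "R"
    return "I"


def category_agreement(true_mics, predicted_mics, s_breakpoint, r_breakpoint):
    """Fraction of isolates whose predicted category matches the reference."""
    matches = sum(
        categorise(t, s_breakpoint, r_breakpoint)
        == categorise(p, s_breakpoint, r_breakpoint)
        for t, p in zip(true_mics, predicted_mics)
    )
    return matches / len(true_mics)


def very_major_error_rate(true_mics, predicted_mics, s_breakpoint, r_breakpoint):
    """Share of truly resistant isolates that were predicted susceptible."""
    resistant = [
        (t, p) for t, p in zip(true_mics, predicted_mics)
        if categorise(t, s_breakpoint, r_breakpoint) == "R"
    ]
    if not resistant:
        return math.nan  # undefined when no resistant isolates are present
    errors = sum(
        categorise(p, s_breakpoint, r_breakpoint) == "S" for _, p in resistant
    )
    return errors / len(resistant)
```

In practice the breakpoints differ per antimicrobial agent and would come from the clinical standard in use, so each of the 13 agents would be scored separately.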
Conclusion
Our study is the first to demonstrate that AMR prediction for A. baumannii using machine learning methods based on k-mer features performs competitively with traditional workflows; hence, sequence-based AMR prediction and its application could be further promoted. The k-mer-based workflow developed in this study demonstrated high recall/sensitivity and specificity, making it a dependable tool for MIC prediction in clinical settings.