1
Marini S, Barquero A, Wadhwani AA, Bian J, Ruiz J, Boucher C, Prosperi M. OCTOPUS: Disk-based, Multiplatform, Mobile-friendly Metagenomics Classifier. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2025; 2024:798-807. [PMID: 40417475 PMCID: PMC12099329] [Indexed: 05/27/2025]
Abstract
Portable genomic sequencers such as Oxford Nanopore's MinION enable real-time applications in clinical and environmental health. However, downstream analytics become a bottleneck when bioinformatics pipelines are unavailable, e.g., when cloud processing is unreachable because there is no Internet connection, or when only low-end computing devices can be carried on site. Here we present platform-friendly software for portable metagenomic analysis of Nanopore data, the Oligomer-based Classifier of Taxonomic Operational and Pan-genome Units via Singletons (OCTOPUS). OCTOPUS is written in Java and reimplements several features of the popular Kraken2 and KrakenUniq software, with original components that improve metagenomics classification on incomplete/sampled reference databases, making it ideal for running on smartphones or tablets. OCTOPUS obtains sensitivity and precision comparable to Kraken2 while dramatically decreasing the false positive rate (4- to 16-fold), and yields high correlation on real-world data. OCTOPUS is available along with customized databases at https://github.com/DataIntellSystLab/OCTOPUS and https://github.com/Ruiz-HCI-Lab/OctopusMobile.
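OCTOPUS classifies reads by oligomer (k-mer) matching, in the spirit of Kraken2 and KrakenUniq. The minimal Python sketch below illustrates only the general idea of k-mer-based read classification; it is not OCTOPUS's algorithm. The function names (`kmers`, `build_index`, `classify`) and the voting rule are hypothetical: the sketch simply ignores k-mers shared by more than one taxon, whereas Kraken2 maps each minimizer to the lowest common ancestor of the genomes containing it. The in-memory dictionary is also a simplification, since OCTOPUS is described as disk-based.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer (oligomer) of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: dict, k: int) -> dict:
    """Map each k-mer to the set of taxa whose reference sequence contains it."""
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(taxon)
    return index


def classify(read: str, index: dict, k: int, min_hits: int = 1):
    """Return the taxon with the most taxon-specific k-mer hits, or None.

    Hypothetical rule: k-mers present in more than one taxon are ignored.
    """
    votes = Counter()
    for km in kmers(read, k):
        taxa = index.get(km)
        if taxa is not None and len(taxa) == 1:
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    taxon, hits = votes.most_common(1)[0]
    return taxon if hits >= min_hits else None


if __name__ == "__main__":
    refs = {"taxonA": "ACGTACGTGG", "taxonB": "TTGCATTGCA"}
    idx = build_index(refs, k=4)
    print(classify("ACGTACG", idx, k=4))  # taxonA
```

With these two toy references, every 4-mer of the read `ACGTACG` occurs only in `taxonA`, so the read is assigned there; a read with no indexed k-mers is left unclassified (`None`).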
Affiliation(s)
- Simone Marini
- Department of Epidemiology, University of Florida, Gainesville, USA
- Emerging Pathogens Institute, University of Florida, Gainesville, USA
- Alexander Barquero
- Department of Computer and Information Science and Engineering, University of Florida, USA
- Anisha Ashok Wadhwani
- Department of Computer and Information Science and Engineering, University of Florida, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, University of Florida, USA
- Jaime Ruiz
- Department of Computer and Information Science and Engineering, University of Florida, USA
- Christina Boucher
- Department of Computer and Information Science and Engineering, University of Florida, USA
- Mattia Prosperi
- Department of Epidemiology, University of Florida, Gainesville, USA
2
Khalaf WS, Morgan RN, Elkhatib WF. Clinical microbiology and artificial intelligence: Different applications, challenges, and future prospects. J Microbiol Methods 2025; 232-234:107125. [PMID: 40188989 DOI: 10.1016/j.mimet.2025.107125] [Received: 11/07/2024] [Revised: 03/10/2025] [Accepted: 04/03/2025] [Indexed: 04/10/2025]
Abstract
Conventional clinical microbiological techniques are enhanced by the introduction of artificial intelligence (AI). Comprehensive data processing and analysis have enabled the development of curated datasets that have been effectively used to train different AI algorithms. Recently, a number of machine learning (ML) and deep learning (DL) algorithms have been developed and evaluated using diverse microbiological datasets. These datasets include spectral analysis (Raman and MALDI-TOF spectroscopy), microscopic images (Gram and acid-fast stains), and genomic and protein sequences (whole genome sequencing (WGS) and protein data banks (PDBs)). The primary objective of these algorithms is to minimize the time, effort, and expense associated with conventional analytical methods. Furthermore, AI algorithms are combined with quantitative structure-activity relationship (QSAR) models to predict novel antimicrobial agents that address the continuing surge of antimicrobial resistance. During the COVID-19 pandemic, AI algorithms played a crucial role in vaccine development and the discovery of new antiviral agents, and identified potential drug candidates via drug repurposing. However, despite their significant benefits, the implementation of AI faces various challenges, including ethical considerations, the potential for bias, and errors related to training data. This review provides an overview of the most recent applications of AI in clinical microbiology, with the intention of educating a wider audience of clinical practitioners about current uses of machine learning algorithms and encouraging their implementation. Furthermore, it discusses the challenges of incorporating AI into clinical microbiology laboratories and examines future opportunities for AI within infectious disease epidemiology.
Affiliation(s)
- Wafaa S Khalaf
- Department of Microbiology and Immunology, Faculty of Pharmacy (Girls), Al-Azhar University, Nasr City, Cairo 11751, Egypt.
- Radwa N Morgan
- National Centre for Radiation Research and Technology (NCRRT), Drug Radiation Research Department, Egyptian Atomic Energy Authority (EAEA), Cairo 11787, Egypt.
- Walid F Elkhatib
- Department of Microbiology & Immunology, Faculty of Pharmacy, Galala University, New Galala City, Suez, Egypt; Microbiology and Immunology Department, Faculty of Pharmacy, Ain Shams University, African Union Organization St., Abbassia, Cairo 11566, Egypt.
3
Rancati S, Nicora G, Prosperi M, Bellazzi R, Salemi M, Marini S. Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders. bioRxiv: The Preprint Server for Biology 2024:2023.10.24.563721. [PMID: 37961168 PMCID: PMC10634784 DOI: 10.1101/2023.10.24.563721] [Indexed: 11/15/2023]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic is characterized by the sequential emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, lineages, and sublineages that outcompete previously circulating ones because of, among other factors, increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute more than 10% of all the viral sequences added to the GISAID database in a given week. DeepAutoCoV is trained and validated on global and country-specific data sets assembled from over 16 million Spike protein sequences sampled over a period of about 4 years. DeepAutoCoV successfully flags FDLs at very low frequencies (0.01%-3%), with median lead times of 4-17 weeks, and predicts FDLs between ~5 and ~25 times better than a baseline approach. For example, the B.1.617.2 vaccine reference strain was flagged as an FDL when its frequency was only 0.01%, more than a year before it was considered for an updated COVID-19 vaccine. Furthermore, DeepAutoCoV outputs interpretable results by pinpointing specific mutations potentially linked to increased fitness, and may provide significant insights for the optimization of pre-emptive public health intervention strategies.
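The FDL definition above is a simple weekly-share rule. As a hedged illustration (not DeepAutoCoV code), the sketch below labels, for each week, the lineages whose share of newly added sequences exceeds 10%. The `(week, lineage)` input format and the function name `dominant_lineages` are assumptions made for this example.

```python
from collections import Counter, defaultdict

# FDL rule as stated in the abstract: a lineage is dominant in a week if it
# makes up more than 10% of all sequences added that week.
FDL_THRESHOLD = 0.10


def dominant_lineages(records, threshold=FDL_THRESHOLD):
    """Return {week: set of lineages whose weekly share exceeds threshold}.

    records: iterable of (week, lineage) pairs, one pair per submitted sequence.
    """
    per_week = defaultdict(Counter)
    for week, lineage in records:
        per_week[week][lineage] += 1
    result = {}
    for week, counts in per_week.items():
        total = sum(counts.values())
        result[week] = {lin for lin, n in counts.items() if n / total > threshold}
    return result


if __name__ == "__main__":
    records = ([(1, "B.1")] * 19 + [(1, "B.1.617.2")]
               + [(2, "B.1")] * 8 + [(2, "B.1.617.2")] * 2)
    print(dominant_lineages(records))
```

In this toy input, a lineage with 1 of 20 sequences in week 1 (5%) is not dominant, while the same lineage with 2 of 10 sequences in week 2 (20%) is. A lineage at exactly 10% does not qualify, because the definition requires "more than" 10%.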
Affiliation(s)
- Simone Rancati
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Giovanna Nicora
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Mattia Prosperi
- Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA
- Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA
- Riccardo Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Marco Salemi
- Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL, USA
- Simone Marini
- Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA
- Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA
4
Rancati S, Nicora G, Prosperi M, Bellazzi R, Salemi M, Marini S. Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders. Brief Bioinform 2024; 25:bbae535. [PMID: 39446192 PMCID: PMC11500442 DOI: 10.1093/bib/bbae535] [Received: 07/25/2024] [Revised: 09/10/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024]
Abstract
The COVID-19 pandemic is marked by the successive emergence of new SARS-CoV-2 variants, lineages, and sublineages that outcompete earlier strains, largely due to factors like increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system, to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute >10% of all the viral sequences added to GISAID, a public database supporting viral genetic sequence sharing, in a given week. DeepAutoCoV is trained and validated on global and country-specific data sets assembled from over 16 million Spike protein sequences sampled over a period of ~4 years. DeepAutoCoV successfully flags FDLs at very low frequencies (0.01%-3%), with median lead times of 4-17 weeks, and predicts FDLs between ~5 and ~25 times better than a baseline approach. For example, the B.1.617.2 vaccine reference strain was flagged as an FDL when its frequency was only 0.01%, more than a year before it was considered for an updated COVID-19 vaccine. Furthermore, DeepAutoCoV outputs interpretable results by pinpointing specific mutations potentially linked to increased fitness and may provide significant insights for the optimization of public health 'pre-emptive' intervention strategies.
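Autoencoder-based anomaly detection typically trains on "normal" data and flags inputs whose reconstruction error is unusually high. The sketch below shows only that thresholding step and the lead-time arithmetic, assuming reconstruction errors have already been computed by some trained autoencoder. The mean + k * standard-deviation threshold, the default k = 3, and all function names are hypothetical choices for illustration, not DeepAutoCoV's published settings.

```python
import statistics


def anomaly_threshold(train_errors, k=3.0):
    """Hypothetical rule: mean + k * population std of training reconstruction errors."""
    mu = statistics.fmean(train_errors)
    sd = statistics.pstdev(train_errors)
    return mu + k * sd


def first_flag_week(weekly_errors, threshold):
    """Earliest week in which any sequence of a lineage exceeds threshold, else None.

    weekly_errors: iterable of (week, reconstruction_error) pairs for one lineage.
    """
    flagged = [week for week, err in weekly_errors if err > threshold]
    return min(flagged) if flagged else None


def lead_time(flag_week, fdl_week):
    """Weeks between the first flag and the first week the lineage became an FDL."""
    if flag_week is None or fdl_week is None:
        return None
    return fdl_week - flag_week


if __name__ == "__main__":
    thr = anomaly_threshold([0.0, 2.0])  # mean 1.0, std 1.0 -> threshold 4.0
    week = first_flag_week([(5, 0.9), (3, 4.5), (7, 6.0)], thr)
    print(thr, week, lead_time(week, 10))  # 4.0 3 7
```

In the usage example, a lineage first flagged in week 3 that becomes an FDL in week 10 has a lead time of 7 weeks; the median of such values across lineages is the kind of quantity the abstract reports as "median lead times".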
Affiliation(s)
- Simone Rancati
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Adolfo Ferrata 5, Pavia, 27100, Italy
- Giovanna Nicora
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Adolfo Ferrata 5, Pavia, 27100, Italy
- Mattia Prosperi
- Department of Epidemiology, College of Public Health and Health Professions, University of Florida, 2004 Mowry Road, Gainesville, FL 32610, United States
- Emerging Pathogens Institute, University of Florida, 2055 Mowry Road, Gainesville, FL 32610, United States
- Riccardo Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Adolfo Ferrata 5, Pavia, 27100, Italy
- Marco Salemi
- Emerging Pathogens Institute, University of Florida, 2055 Mowry Road, Gainesville, FL 32610, United States
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, 1600 SW Archer Road, Gainesville, FL 32610, United States
- Simone Marini
- Department of Epidemiology, College of Public Health and Health Professions, University of Florida, 2004 Mowry Road, Gainesville, FL 32610, United States
- Emerging Pathogens Institute, University of Florida, 2055 Mowry Road, Gainesville, FL 32610, United States
5
Ko S, Kim J, Lim J, Lee SM, Park JY, Woo J, Scott-Nevros ZK, Kim JR, Yoon H, Kim D. Blanket antimicrobial resistance gene database with structural information, BOARDS, provides insights on historical landscape of resistance prevalence and effects of mutations in enzyme structure. mSystems 2024; 9:e0094323. [PMID: 38085058 PMCID: PMC10871167 DOI: 10.1128/msystems.00943-23] [Received: 09/05/2023] [Accepted: 11/02/2023] [Indexed: 01/24/2024]
Abstract
Antimicrobial resistance (AMR) in pathogenic bacteria poses a significant threat to public health, yet tools for an in-depth understanding of AMR genes based on genetic or structural information still need further development. In this study, we present an interactive web database named Blanket Overarching Antimicrobial-Resistance gene Database with Structural information (BOARDS, sbml.unist.ac.kr), which comprehensively includes information on 3,943 reported AMR genes (1,997 extended-spectrum beta-lactamase (ESBL) genes and 1,946 other genes) as well as a total of 27,395 predicted protein structures. These structures, which include both wild-type AMR genes and their mutants, were derived from 80,094 publicly available whole-genome sequences. In addition, we developed the rapid analysis and detection tool of antimicrobial-resistance (RADAR), a one-stop analysis pipeline to detect AMR genes across whole-genome sequences (WGSs). By integrating BOARDS and RADAR, we reconstructed the AMR prevalence landscape for eight multi-drug resistant pathogens, leading to unexpected findings such as the presence of MCR genes before they were officially reported. Enzymatic structure prediction-based analysis revealed that mutations found in some ESBL genes were closely related to the binding affinities with their antibiotic substrates. Overall, BOARDS can play a significant role in performing in-depth analysis on AMR.
IMPORTANCE
While increasing antimicrobial resistance (AMR) in pathogens has been a burden on public health, effective tools for a deep understanding of AMR based on genetic or structural information remain limited. In this study, we constructed the blanket overarching antimicrobial-resistance gene database with structural information (BOARDS), a web-based database that comprehensively collects AMR gene data with predicted protein structural information. Additionally, we report the development of the RADAR pipeline, which can also analyze whole-genome sequences. BOARDS, which includes sequence and structural information, has revealed the historical landscape and prevalence of AMR genes and can provide insight into the effects of single-nucleotide polymorphisms on antibiotic-degrading enzymes within protein structures.
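BOARDS pairs wild-type AMR proteins with mutant variants. A common first step when comparing the two is to list point substitutions in the conventional wild-type residue / position / mutant residue notation (e.g. `A3V`). The helper below is a hypothetical sketch, not part of BOARDS or RADAR, and it assumes the two protein sequences are already aligned with no insertions or deletions.

```python
def substitutions(wild_type: str, mutant: str, start: int = 1) -> list:
    """List point substitutions as <wild-type residue><position><mutant residue>.

    Assumes both protein sequences are already aligned to equal length with no
    insertions or deletions; positions are numbered from `start`.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be aligned to equal length")
    return [
        f"{wt}{pos}{mt}"
        for pos, (wt, mt) in enumerate(zip(wild_type, mutant), start=start)
        if wt != mt
    ]


if __name__ == "__main__":
    print(substitutions("MKALT", "MKVLS"))  # ['A3V', 'T5S']
```

The `start` parameter lets positions follow an external numbering scheme rather than plain 1-based indexing; handling real alignments with gaps would require an alignment step first, which this sketch deliberately omits.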
Affiliation(s)
- Seyoung Ko
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Jaehyung Kim
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Jaewon Lim
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Sang-Mok Lee
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Joon Young Park
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Jihoon Woo
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Zoe K. Scott-Nevros
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- Jong R. Kim
- School of Engineering and Digital Sciences, Nazarbayev University, Astana, Kazakhstan
- Hyunjin Yoon
- Department of Molecular Science and Technology, Ajou University, Suwon, South Korea
- Donghyuk Kim
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea
- School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea