1
Marini S, Barquero A, Wadhwani AA, Bian J, Ruiz J, Boucher C, Prosperi M. OCTOPUS: Disk-based, Multiplatform, Mobile-friendly Metagenomics Classifier. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2025; 2024:798-807. [PMID: 40417475 PMCID: PMC12099329]
Abstract
Portable genomic sequencers such as Oxford Nanopore's MinION enable real-time applications in clinical and environmental health. However, there is a bottleneck in the downstream analytics when bioinformatics pipelines are unavailable, e.g., when cloud processing is unreachable due to absence of Internet connection, or only low-end computing devices can be carried on site. Here we present a platform-friendly software for portable metagenomic analysis of Nanopore data, the Oligomer-based Classifier of Taxonomic Operational and Pan-genome Units via Singletons (OCTOPUS). OCTOPUS is written in Java, reimplements several features of the popular Kraken2 and KrakenUniq software, with original components for improving metagenomics classification on incomplete/sampled reference databases, making it ideal for running on smartphones or tablets. OCTOPUS obtains sensitivity and precision comparable to Kraken2, while dramatically decreasing (4- to 16-fold) the false positive rate, and yielding high correlation on real-world data. OCTOPUS is available along with customized databases at https://github.com/DataIntellSystLab/OCTOPUS and https://github.com/Ruiz-HCI-Lab/OctopusMobile.
Affiliation(s)
- Simone Marini
  - Department of Epidemiology, University of Florida, Gainesville, USA
  - Emerging Pathogens Institute, University of Florida, Gainesville, USA
- Alexander Barquero
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Anisha Ashok Wadhwani
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, USA
- Jaime Ruiz
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Christina Boucher
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Mattia Prosperi
  - Department of Epidemiology, University of Florida, Gainesville, USA
2
Alsharksi AN, Sirekbasan S, Gürkök-Tan T, Mustapha A. From Tradition to Innovation: Diverse Molecular Techniques in the Fight Against Infectious Diseases. Diagnostics (Basel) 2024; 14:2876. [PMID: 39767237 PMCID: PMC11674978 DOI: 10.3390/diagnostics14242876] [Received: 10/22/2024] [Revised: 11/15/2024] [Accepted: 12/17/2024]
Abstract
Infectious diseases impose a significant burden on global health systems due to high morbidity and mortality rates. According to the World Health Organization, millions die from infectious diseases annually, often due to delays in accurate diagnosis. Traditional diagnostic methods in clinical microbiology, primarily culture-based techniques, are time-consuming and may fail with hard-to-culture pathogens. Molecular biology advancements, notably the polymerase chain reaction (PCR), have revolutionized infectious disease diagnostics by allowing rapid and sensitive detection of pathogens' genetic material. PCR has become the gold standard for many infections, particularly highlighted during the COVID-19 pandemic. Following PCR, next-generation sequencing (NGS) has emerged, enabling comprehensive genomic analysis of pathogens, thus facilitating the detection of new strains and antibiotic resistance tracking. Innovative approaches like CRISPR technology are also enhancing diagnostic precision by identifying specific DNA/RNA sequences. However, the implementation of these methods faces challenges, particularly in low- and middle-income countries due to infrastructural and financial constraints. This review will explore the role of molecular diagnostic methods in infectious disease diagnosis, comparing their advantages and limitations, with a focus on PCR and NGS technologies and their future potential.
Affiliation(s)
- Ahmed Nouri Alsharksi
  - Department of Microbiology, Faculty of Medicine, Misurata University, Misrata 93FH+66F, Libya
- Serhat Sirekbasan
  - Department of Medical Laboratory Techniques, Şabanözü Vocational School, Çankırı Karatekin University, Çankırı 18650, Turkey
- Tuğba Gürkök-Tan
  - Department of Field Crops, Food and Agriculture Vocational School, Çankırı Karatekin University, Çankırı 18100, Turkey
- Adam Mustapha
  - Department of Microbiology, Faculty of Life Sciences, University of Maiduguri, Maiduguri 600104, Nigeria
3
Marini S, Barquero A, Wadhwani AA, Bian J, Ruiz J, Boucher C, Prosperi M. OCTOPUS: Disk-based, Multiplatform, Mobile-friendly Metagenomics Classifier. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.15.585215. [PMID: 38559026 PMCID: PMC10979967 DOI: 10.1101/2024.03.15.585215]
Abstract
Portable genomic sequencers such as Oxford Nanopore's MinION enable real-time applications in clinical and environmental health. However, there is a bottleneck in the downstream analytics when bioinformatics pipelines are unavailable, e.g., when cloud processing is unreachable due to absence of Internet connection, or only low-end computing devices can be carried on site. Here we present a platform-friendly software for portable metagenomic analysis of Nanopore data, the Oligomer-based Classifier of Taxonomic Operational and Pan-genome Units via Singletons (OCTOPUS). OCTOPUS is written in Java, reimplements several features of the popular Kraken2 and KrakenUniq software, with original components for improving metagenomics classification on incomplete/sampled reference databases, making it ideal for running on smartphones or tablets. OCTOPUS obtains sensitivity and precision comparable to Kraken2, while dramatically decreasing (4- to 16-fold) the false positive rate, and yielding high correlation on real-world data. OCTOPUS is available along with customized databases at https://github.com/DataIntellSystLab/OCTOPUS and https://github.com/Ruiz-HCI-Lab/OctopusMobile.
Affiliation(s)
- Simone Marini
  - Department of Epidemiology, University of Florida, Gainesville, USA
  - Emerging Pathogens Institute, University of Florida, Gainesville, USA
- Alexander Barquero
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Anisha Ashok Wadhwani
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, USA
- Jaime Ruiz
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Christina Boucher
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Mattia Prosperi
  - Department of Epidemiology, University of Florida, Gainesville, USA
4
Kuzdraliński A, Miśkiewicz M, Szczerba H, Mazurczyk W, Nivala J, Księżopolski B. Unlocking the potential of DNA-based tagging: current market solutions and expanding horizons. Nat Commun 2023; 14:6052. [PMID: 37770439 PMCID: PMC10539344 DOI: 10.1038/s41467-023-41728-2] [Received: 01/09/2023] [Accepted: 09/18/2023]
Affiliation(s)
- Adam Kuzdraliński
  - Department of Cybersecurity and Cybereducation, Faculty of Information Technology, Polish-Japanese Academy of Information Technology, Warsaw, Mazowieckie, 02-008, Poland
- Marek Miśkiewicz
  - Institute of Computer Science, University of Maria Curie-Skłodowska, Akademicka 9, 20-033, Lublin, Poland
- Hubert Szczerba
  - Department of Biotechnology, Microbiology and Human Nutrition, Faculty of Food Science and Biotechnology, University of Life Sciences in Lublin, 8 Skromna St., 20-704, Lublin, Poland
  - Joint BioEnergy Institute, Emeryville, CA, 94608, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Wojciech Mazurczyk
  - Institute of Computer Science, Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Nowowiejska 15/19, 00-665, Warsaw, Poland
  - Parallelism and VLSI Group, Faculty of Mathematics and Computer Science, FernUniversität in Hagen, Universitätsstr. 1, 58097, Hagen, Germany
- Jeff Nivala
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
  - Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA
- Bogdan Księżopolski
  - Department of Cybersecurity and Cybereducation, Faculty of Information Technology, Polish-Japanese Academy of Information Technology, Warsaw, Mazowieckie, 02-008, Poland
5
Yu R, Abdullah SMU, Sun Y. HMMPolish: a coding region polishing tool for TGS-sequenced RNA viruses. Brief Bioinform 2023; 24:bbad264. [PMID: 37478372 PMCID: PMC10516367 DOI: 10.1093/bib/bbad264] [Received: 03/25/2023] [Revised: 06/05/2023] [Accepted: 06/29/2023]
Abstract
Access to accurate viral genomes is important to downstream data analysis. Third-generation sequencing (TGS) has recently become a popular platform for virus sequencing because of its long read length. However, its per-base error rate, which is higher than next-generation sequencing, can lead to genomes with errors. Polishing tools are thus needed to correct errors either before or after sequence assembly. Despite promising results of available polishing tools, there is still room to improve the error correction performance to perform more accurate genome assembly. The errors, particularly those in coding regions, can hamper analysis such as lineage identification and variant monitoring. In this work, we developed a novel pipeline, HMMPolish, for correcting (polishing) errors in protein-coding regions of known RNA viruses. This tool can be applied to either raw TGS reads or the assembled sequences of the target virus. By utilizing profile Hidden Markov Models of protein families/domains in known viruses, HMMPolish can correct errors that are ignored by available polishers. We extensively validated HMMPolish on 34 datasets that covered four clinically important viruses, including HIV-1, influenza-A, norovirus, and severe acute respiratory syndrome coronavirus 2. These datasets contain reads with different properties, such as sequencing depth and platforms (PacBio or Nanopore). The benchmark results against popular/representative polishers show that HMMPolish competes favorably on error correction in coding regions of known RNA viruses.
Affiliation(s)
- Runzhou Yu
  - Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
- Yanni Sun
  - Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
6
Mikalsen AJ, Zola J. Coriolis: enabling metagenomic classification on lightweight mobile devices. Bioinformatics 2023; 39:i66-i75. [PMID: 37387129 DOI: 10.1093/bioinformatics/btad243]
Abstract
MOTIVATION The introduction of portable DNA sequencers such as the Oxford Nanopore Technologies MinION has enabled real-time and in the field DNA sequencing. However, in the field sequencing is actionable only when coupled with in the field DNA classification. This poses new challenges for metagenomic software since mobile deployments are typically in remote locations with limited network connectivity and without access to capable computing devices. RESULTS We propose new strategies to enable in the field metagenomic classification on mobile devices. We first introduce a programming model for expressing metagenomic classifiers that decomposes the classification process into well-defined and manageable abstractions. The model simplifies resource management in mobile setups and enables rapid prototyping of classification algorithms. Next, we introduce the compact string B-tree, a practical data structure for indexing text in external storage, and we demonstrate its viability as a strategy to deploy massive DNA databases on memory-constrained devices. Finally, we combine both solutions into Coriolis, a metagenomic classifier designed specifically to operate on lightweight mobile devices. Through experiments with actual MinION metagenomic reads and a portable supercomputer-on-a-chip, we show that compared with the state-of-the-art solutions Coriolis offers higher throughput and lower resource consumption without sacrificing quality of classification. AVAILABILITY AND IMPLEMENTATION Source code and test data are available from http://score-group.org/?id=smarten.
Affiliation(s)
- Andrew J Mikalsen
  - Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, United States
- Jaroslaw Zola
  - Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, United States
7
Marini S, Boucher C, Noyes N, Prosperi M. The K-mer antibiotic resistance gene variant analyzer (KARGVA). Front Microbiol 2023; 14:1060891. [PMID: 36960290 PMCID: PMC10027697 DOI: 10.3389/fmicb.2023.1060891] [Received: 10/03/2022] [Accepted: 02/08/2023]
Abstract
Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, there are only a few tools that are able to identify ARGVs from metagenomics and high throughput sequencing data, with a number of limitations (e.g., pre-assembly, a posteriori verification of mutations, or specification of species). In this work we present the k-mer, i.e., strings of fixed length k, ARGV analyzer - KARGVA - an open-source, multi-platform tool that provides: (i) an ad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based, k-mer search setup to process data efficiently, linking k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in presence of high sequencing errors or mutations (99.2 and 86.6% accuracy within 1 and 5% base change rates, respectively), and genome rearrangements (98.2% accuracy), with robust performance on ad hoc false positive sets. On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated to ARGs but ecological characteristics do not explain well ARGV variance. KARGVA is publicly available at https://github.com/DataIntellSystLab/KARGVA under MIT license.
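To make the idea of a hash-based k-mer lookup concrete, here is a minimal Python sketch of the general technique: index every k-mer of each reference sequence in a hash map, then assign a read to the reference that shares the most k-mers with it. This is not KARGVA's implementation (the tool is a separate code base, and its point-mutation mapping and statistical filter are omitted here). The function names, the toy sequences, and the `min_fraction` threshold are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every substring of length k from seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def classify(read, index, k, min_fraction=0.5):
    """Return the reference with the most k-mer hits, or None.

    min_fraction is a hypothetical cutoff: the share of the read's
    k-mers that must match the best reference for a call to be made.
    """
    hits = Counter()
    total = 0
    for kmer in kmers(read, k):
        total += 1
        # .get avoids inserting empty entries into the defaultdict
        for name in index.get(kmer, ()):
            hits[name] += 1
    if total == 0 or not hits:
        return None
    best, count = hits.most_common(1)[0]
    return best if count / total >= min_fraction else None


# Toy reference set (made-up sequences, not real resistance genes)
references = {"geneA": "ACGTACGTGA", "geneB": "TTTTGGGGCC"}
index = build_index(references, k=4)
```

With this toy index, `classify("ACGTACG", index, 4)` assigns the read to `geneA`, while a read sharing no k-mers with any reference returns `None`. A fixed fraction threshold is the simplest possible filter; a real classifier would need something more principled to control false positives.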
Affiliation(s)
- Simone Marini
  - Department of Epidemiology, University of Florida, Gainesville, FL, United States
  - Department of Pathology, University of Florida, Gainesville, FL, United States
- Christina Boucher
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, United States
- Noelle Noyes
  - Department of Veterinary Population Medicine, University of Minnesota, St. Paul, MN, United States
- Mattia Prosperi
  - Department of Epidemiology, University of Florida, Gainesville, FL, United States
  - *Correspondence: Mattia Prosperi
8
Barquero A, Marini S, Boucher C, Ruiz J, Prosperi M. KARGAMobile: Android app for portable, real-time, easily interpretable analysis of antibiotic resistance genes via nanopore sequencing. Front Bioeng Biotechnol 2022; 10:1016408. [PMID: 36324897 PMCID: PMC9618647 DOI: 10.3389/fbioe.2022.1016408] [Received: 08/10/2022] [Accepted: 09/27/2022]
Abstract
Nanopore technology enables portable, real-time sequencing of microbial populations from clinical and ecological samples. An emerging healthcare application for Nanopore includes point-of-care, timely identification of antibiotic resistance genes (ARGs) to help developing targeted treatments of bacterial infections, and monitoring resistant outbreaks in the environment. While several computational tools exist for classifying ARGs from sequencing data, to date (2022) none have been developed for mobile devices. We present here KARGAMobile, a mobile app for portable, real-time, easily interpretable analysis of ARGs from Nanopore sequencing. KARGAMobile is the porting of an existing ARG identification tool named KARGA; it retains the same algorithmic structure, but it is optimized for mobile devices. Specifically, KARGAMobile employs a compressed ARG reference database and different internal data structures to save RAM usage. The KARGAMobile app features a friendly graphical user interface that guides through file browsing, loading, parameter setup, and process execution. More importantly, the output files are post-processed to create visual, printable and shareable reports, aiding users to interpret the ARG findings. The difference in classification performance between KARGAMobile and KARGA is minimal (96.2% vs. 96.9% f-measure on semi-synthetic datasets of 1 million reads with known resistance ground truth). Using real Nanopore experiments, KARGAMobile processes on average 1 GB data every 23-48 min (targeted sequencing - metagenomics), with peak RAM usage below 500MB, independently from input file sizes, and an average temperature of 49°C after 1 h of continuous data processing. KARGAMobile is written in Java and is available at https://github.com/Ruiz-HCI-Lab/KargaMobile under the MIT license.
Affiliation(s)
- Alexander Barquero
  - Department of Computer Science and Information and Engineering, University of Florida, Gainesville, FL, United States
- Simone Marini
  - Department of Epidemiology, University of Florida, Gainesville, FL, United States
  - Department of Pathology, University of Florida, Gainesville, FL, United States
- Christina Boucher
  - Department of Computer Science and Information and Engineering, University of Florida, Gainesville, FL, United States
- Jaime Ruiz
  - Department of Computer Science and Information and Engineering, University of Florida, Gainesville, FL, United States
- Mattia Prosperi
  - Department of Epidemiology, University of Florida, Gainesville, FL, United States
9
Rodríguez-Pérez H, Ciuffreda L, Flores C. NanoRTax, a real-time pipeline for taxonomic and diversity analysis of nanopore 16S rRNA amplicon sequencing data. Comput Struct Biotechnol J 2022; 20:5350-5354. [PMID: 36212537 PMCID: PMC9522874 DOI: 10.1016/j.csbj.2022.09.024] [Received: 04/21/2022] [Revised: 09/16/2022] [Accepted: 09/16/2022]
Abstract
Background The study of microbial communities and their applications have been leveraged by advances in sequencing techniques and bioinformatics tools. The Oxford Nanopore Technologies long-read sequencing by nanopores provides a portable and cost-efficient platform for sequencing assays. While this opens the possibility of sequencing applications outside specialized environments and real-time analysis of data, complementing the existing efficient library preparation protocols with streamlined bioinformatic workflows is required. Results Here we present NanoRTax, a Nextflow pipeline for nanopore 16S rRNA gene amplicon data that features state-of-the-art taxonomic classification tools and real-time capability. The pipeline is paired with a web-based visual interface to enable user-friendly inspections of the experiment in progress. NanoRTax workflow and a simulated real-time analysis were used to validate the prediction of adult Intensive Care Unit patient mortality based on full-length 16S rRNA sequencing data from respiratory microbiome samples. Conclusions This constitutes a proof-of-concept simulation study of how real-time bioinformatic workflows could be used to shorten the turnaround times in critical care settings and provides an instrument for future research on early-response strategies for sepsis.
Affiliation(s)
- Héctor Rodríguez-Pérez
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife 38010, Spain
- Laura Ciuffreda
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife 38010, Spain
- Carlos Flores
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife 38010, Spain
  - CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid 28029, Spain
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Granadilla, Santa Cruz de Tenerife, Spain
  - Facultad de Ciencias de la Salud, Universidad Fernando de Pessoa Canarias, 35450 Las Palmas de Gran Canaria, Spain
10
Comparing the significance of the utilization of next generation and third generation sequencing technologies in microbial metagenomics. Microbiol Res 2022; 264:127154. [DOI: 10.1016/j.micres.2022.127154] [Received: 02/03/2022] [Revised: 07/05/2022] [Accepted: 07/29/2022]
11
Marini S, Oliva M, Slizovskiy IB, Das RA, Noyes NR, Kahveci T, Boucher C, Prosperi M. AMR-meta: a k-mer and metafeature approach to classify antimicrobial resistance from high-throughput short-read metagenomics data. Gigascience 2022; 11:giac029. [PMID: 35583675 PMCID: PMC9116207 DOI: 10.1093/gigascience/giac029] [Received: 09/01/2021] [Revised: 01/27/2022]
Abstract
BACKGROUND Antimicrobial resistance (AMR) is a global health concern. High-throughput metagenomic sequencing of microbial samples enables profiling of AMR genes through comparison with curated AMR databases. However, the performance of current methods is often hampered by database incompleteness and the presence of homology/homoplasy with other non-AMR genes in sequenced samples. RESULTS We present AMR-meta, a database-free and alignment-free approach, based on k-mers, which combines algebraic matrix factorization into metafeatures with regularized regression. Metafeatures capture multi-level gene diversity across the main antibiotic classes. AMR-meta takes in reads from metagenomic shotgun sequencing and outputs predictions about whether those reads contribute to resistance against specific classes of antibiotics. In addition, AMR-meta uses an augmented training strategy that joins an AMR gene database with non-AMR genes (used as negative examples). We compare AMR-meta with AMRPlusPlus, DeepARG, and Meta-MARC, further testing their ensemble via a voting system. In cross-validation, AMR-meta has a median f-score of 0.7 (interquartile range, 0.2-0.9). On semi-synthetic metagenomic data (external test), on average AMR-meta yields a 1.3-fold hit rate increase over existing methods. In terms of run-time, AMR-meta is 3 times faster than DeepARG, 30 times faster than Meta-MARC, and as fast as AMRPlusPlus. Finally, we note that differences in AMR ontologies and observed variance of all tools in classification outputs call for further development on standardization of benchmarking data and protocols. CONCLUSIONS AMR-meta is a fast, accurate classifier that exploits non-AMR negative sets to improve sensitivity and specificity. The differences in AMR ontologies and the high variance of all tools in classification outputs call for the deployment of standard benchmarking data and protocols, to fairly compare AMR prediction tools.
Affiliation(s)
- Simone Marini
  - Department of Computer and Information Science and Engineering, University of Florida, 2004 Mowry Road Gainesville, FL 32610, USA
- Marco Oliva
  - Department of Computer and Information Science and Engineering, University of Florida, 432 Newell Dr, Gainesville, FL 32611, USA
- Ilya B Slizovskiy
  - Department of Veterinary Population Medicine, University of Minnesota, 1365 Gortner Avenue 225, St. Paul, MN 55108, USA
- Rishabh A Das
  - Department of Computer and Information Science and Engineering, University of Florida, 2004 Mowry Road Gainesville, FL 32610, USA
- Noelle Robertson Noyes
  - Department of Veterinary Population Medicine, University of Minnesota, 1365 Gortner Avenue 225, St. Paul, MN 55108, USA
- Tamer Kahveci
  - Department of Computer and Information Science and Engineering, University of Florida, 432 Newell Dr, Gainesville, FL 32611, USA
- Christina Boucher
  - Department of Computer and Information Science and Engineering, University of Florida, 432 Newell Dr, Gainesville, FL 32611, USA
- Mattia Prosperi
  - Department of Computer and Information Science and Engineering, University of Florida, 2004 Mowry Road Gainesville, FL 32610, USA
12
Serretti A. Precision medicine in mood disorders. PCN REPORTS : PSYCHIATRY AND CLINICAL NEUROSCIENCES 2022; 1:e1. [PMID: 38868801 PMCID: PMC11114272 DOI: 10.1002/pcn5.1] [Received: 10/06/2021] [Revised: 11/09/2021] [Accepted: 12/05/2021]
Abstract
The choice of the most appropriate psychoactive medication for each of our patients is always a challenge. We can use more than 100 psychoactive drugs in the treatment of mood disorders, which can be prescribed either alone or in combination. Response and tolerability problems are common, and much trial and error is often needed before achieving a satisfactory outcome. Precision medicine is therefore needed for tailoring treatment to optimize outcome. Pharmacological, clinical, and demographic factors are important and informative, but biological factors may further inform and refine prediction. Twenty years after the first reports of gene variants modulating antidepressant response, we are now confronted with the prospect of routine clinical pharmacogenetic applications in the treatment of depression. The scientific community is divided into two camps: those who are enthusiastic and those who are skeptical. Although it appears clear that the benefit of existing tools is still not completely defined, at least in the case of central nervous system gene variants, this is not the case for metabolic gene variants, which is generally accepted. Cumulative scores encompassing many variants across the entire genome will soon predict psychiatric disorder liability and outcome. At present, precision medicine in mood disorders may be implemented using clinical and pharmacokinetic factors. In the near future, a genome-wide composite genetic score in conjunction with clinical factors within each patient is the most promising approach for developing a more effective way to target treatment for patients suffering from mood disorders.
Affiliation(s)
- Alessandro Serretti
  - Department of Biomedical and NeuroMotor Sciences, University of Bologna, Bologna, Italy
13
Pucker B, Irisarri I, de Vries J, Xu B. Plant genome sequence assembly in the era of long reads: Progress, challenges and future directions. QUANTITATIVE PLANT BIOLOGY 2022; 3:e5. [PMID: 37077982 PMCID: PMC10095996 DOI: 10.1017/qpb.2021.18] [Received: 06/29/2021] [Revised: 11/24/2021] [Accepted: 12/21/2021]
Abstract
Third-generation long-read sequencing is transforming plant genomics. Oxford Nanopore Technologies and Pacific Biosciences are offering competing long-read sequencing technologies and enable plant scientists to investigate even large and complex plant genomes. Sequencing projects can be conducted by single research groups and sequences of smaller plant genomes can be completed within days. This also resulted in an increased investigation of genomes from multiple species in large scale to address fundamental questions associated with the origin and evolution of land plants. Increased accessibility of sequencing devices and user-friendly software allows more researchers to get involved in genomics. Current challenges are accurately resolving diploid or polyploid genome sequences and better accounting for the intra-specific diversity by switching from the use of single reference genome sequences to a pangenome graph.
Collapse
Affiliation(s)
- Boas Pucker
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Author for correspondence: Boas Pucker E-mail:
| | - Iker Irisarri
- Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
| | - Jan de Vries
- Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
- Department of Applied Bioinformatics, Göttingen Center for Molecular Biosciences (GZMB), University of Goettingen, Göttingen, Germany
| | - Bo Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
| |
|
14
|
Prosperi M, Marini S. KARGA: Multi-platform Toolkit for k-mer-based Antibiotic Resistance Gene Analysis of High-throughput Sequencing Data. ... IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS. IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS 2021; 2021:10.1109/bhi50953.2021.9508479. [PMID: 34447942 PMCID: PMC8383893 DOI: 10.1109/bhi50953.2021.9508479] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/13/2023]
Abstract
High-throughput sequencing is widely used for strain detection and characterization of antibiotic resistance in microbial metagenomic samples. Current analytical tools use curated antibiotic resistance gene (ARG) databases to classify individual sequencing reads or assembled contigs. However, identifying ARGs from raw read data can be time consuming (especially if assembly or alignment is required) and challenging, due to genome rearrangements and mutations. Here, we present the k-mer-based antibiotic gene resistance analyzer (KARGA), a multi-platform Java toolkit for identifying ARGs from metagenomic short read data. KARGA does not perform alignment; it uses an efficient double-lookup strategy and statistical filtering on false positives, and provides individual read classification as well as covering of the database resistome. On simulated data, KARGA's antibiotic resistance class recall is 99.89% for error/mutation rates within 10%, and 83.37% for error/mutation rates between 10% and 25%, while it is 99.92% on ARGs with rearrangements. On empirical data, KARGA provides a higher hit score (≥1.5-fold) than AMRPlusPlus, DeepARG, and MetaMARC. KARGA also has faster runtimes than all other tools (2x faster than AMRPlusPlus, 7x than DeepARG, and over 100x than MetaMARC). KARGA is available under the MIT license at https://github.com/DataIntellSystLab/KARGA.
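As a rough illustration of alignment-free, k-mer-based read classification in general, the sketch below indexes the k-mers of a few labelled reference sequences and assigns a read to the label that collects the most k-mer hits. It is a generic lookup-and-vote scheme under stated assumptions, not KARGA's double-lookup strategy or its statistical filtering, whose details the abstract does not give. The function names `build_kmer_index` and `classify_read`, the `min_hits` threshold, and the toy sequences are all hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer found in the reference sequences to the labels containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for label, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(label)
    return index


def classify_read(read: str, index: dict[str, set[str]], k: int, min_hits: int = 2):
    """Return the label with the most k-mer hits, or None if support is too weak.

    min_hits is an illustrative cutoff (an assumption), standing in for any
    real false-positive filter.
    """
    votes: Counter = Counter()
    for i in range(len(read) - k + 1):
        # .get avoids inserting empty entries for unseen k-mers.
        for label in index.get(read[i:i + k], ()):
            votes[label] += 1
    if not votes:
        return None
    label, hits = votes.most_common(1)[0]
    return label if hits >= min_hits else None


# Toy reference set (made-up sequences, for illustration only).
references = {"geneA": "ACGTACGTGG", "geneB": "TTTTCCCCAA"}
index = build_kmer_index(references, k=4)
print(classify_read("CGTACGTG", index, k=4))
print(classify_read("GGGGGGGG", index, k=4))
```

Because only exact k-mer lookups are involved, no alignment step is needed; tolerance to sequencing errors then depends on how many k-mers of a read remain intact.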
Affiliation(s)
- Mattia Prosperi
- Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA
| | - Simone Marini
- Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA
| |
|
15
|
Boucher C, Gagie T, Tomohiro I, Köppl D, Langmead B, Manzini G, Navarro G, Pacheco A, Rossi M. PHONI: Streamed Matching Statistics with Multi-Genome References. PROCEEDINGS. DATA COMPRESSION CONFERENCE 2021; 2021:193-202. [PMID: 34778549 PMCID: PMC8583545 DOI: 10.1109/dcc50243.2021.00027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/26/2023]
Abstract
Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database. Our code is available at https://github.com/koeppl/phoni.
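For readers unfamiliar with the term, the matching statistics of a pattern with respect to a text give, for each pattern position, the length of the longest substring starting there that also occurs somewhere in the text. The sketch below computes them by brute force directly from that definition. It is only meant to make the quantity concrete: it is not the streaming, compressed-index method the abstract describes, and the function name `matching_statistics` and the toy strings are assumptions.

```python
def matching_statistics(pattern: str, text: str) -> list[int]:
    """Brute-force matching statistics.

    For each position i, return the length of the longest prefix of
    pattern[i:] that occurs as a substring of text. Far too slow for
    genomic inputs; real tools rely on compressed indexes instead.
    """
    lengths = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match one character at a time while it still occurs in text.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        lengths.append(length)
    return lengths


print(matching_statistics("GATTACA", "TTAGAT"))
# → [3, 2, 3, 2, 1, 0, 1]
```

A value of 0, as at the `C` in this toy pattern, marks a position where the pattern shares nothing with the text, which is the kind of signal the abstract alludes to when a pattern "becomes incompressible relative to the database".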
|
16
|
Huo W, Ling W, Wang Z, Li Y, Zhou M, Ren M, Li X, Li J, Xia Z, Liu X, Huang X. Miniaturized DNA Sequencers for Personal Use: Unreachable Dreams or Achievable Goals. FRONTIERS IN NANOTECHNOLOGY 2021. [DOI: 10.3389/fnano.2021.628861] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/21/2023]
Abstract
The appearance of next-generation sequencing technology that features short read length with high measurement throughput and low cost has revolutionized the fields of life science, medicine, and even computer science. The subsequent development of third-generation sequencing technologies, represented by nanopore and zero-mode waveguide techniques, offers even higher speed and long read length, with promising applications in portable and rapid genomic tests in the field. Especially under the current circumstances, issues such as public health emergencies and global pandemics impose soaring demand for quick identification of the origins and species of analytes through DNA sequences. In addition, future development of disease diagnosis, treatment, and tracking techniques may also require frequent DNA testing. As a result, miniaturized DNA sequencers with highly integrated components for personal and portable use, built to tackle increasing needs for disease prevention, personal medicine, and biohazard protection, may become a future trend. Just as for many other biological and medical analytical systems that were originally bulky in size, collaborative work across engineering and science disciplines eventually leads to the miniaturization of these systems. DNA sequencers that involve nanoprobes, detectors, microfluidics, microelectronics, and circuits as well as complex functional materials and structures are extremely complicated but may be miniaturized with technical advancement. This paper reviews the state-of-the-art technology in developing essential components in DNA sequencers and analyzes the feasibility of achieving miniaturized DNA sequencers for personal use. Future perspectives on the opportunities and associated challenges for compact DNA sequencers are also identified.
|
17
|
Prendergast SC, Strobl A, Cross W, Pillay N, Strauss SJ, Ye H, Lindsay D, Tirabosco R, Chalker J, Mahamdallie SS, Sosinsky A, RNOH Pathology Laboratory and Biobank Team, Genomics England Research Consortium, Flanagan AM, Amary F. Sarcoma and the 100,000 Genomes Project: our experience and changes to practice. J Pathol Clin Res 2020; 6:297-307. [PMID: 32573957 PMCID: PMC7578291 DOI: 10.1002/cjp2.174] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/21/2020] [Revised: 05/21/2020] [Accepted: 05/27/2020] [Indexed: 11/06/2022]
Abstract
The largest whole genome sequencing (WGS) endeavour involving cancer and rare diseases was initiated in the UK in 2015 and ran for 5 years. Despite its rarity, sarcoma ranked third overall among the number of patients' samples sent for sequencing. Herein, we recount the lessons learned by a specialist sarcoma centre that recruited close to 1000 patients to the project, so that we and others may learn from our experience. WGS data was generated from 597 patients, but samples from the remaining approximately 400 patients were not sequenced. This was largely accounted for by unsuitability due to extensive necrosis, secondary to neoadjuvant radiotherapy or chemotherapy, or being placed in formalin. The number of informative genomes produced was reduced further by a PCR amplification step. We showed that this loss of genomic data could be mitigated by sequencing whole genomes from needle core biopsies. Storage of resection specimens at 4 °C for up to 96 h overcame the challenge of freezing tissue out of hours including weekends. Removing access to formalin increased compliance with these storage arrangements. With over 70 different sarcoma subtypes described, WGS was a useful tool for refining diagnoses and identifying novel alterations. Genomes from 350 of the cohort of 597 patients were analysed in this study. Overall, diagnoses were modified for 3% of patients following review of the WGS findings. Continued refinement of the variant-calling bioinformatic pipelines is required as not all alterations were identified when validated against histology and standard of care diagnostic tests. Further research is necessary to evaluate the impact of germline mutations in patients with sarcoma, and sarcomas with evidence of hypermutation. Despite 50% of the WGS exhibiting domain 1 alterations, the number of patients with sarcoma who were eligible for clinical trials remains small, highlighting the need to re-evaluate clinical trial design.
Affiliation(s)
- Sophie C Prendergast
- Research Department of Pathology, University College London Cancer Institute, London, UK
| | - Anna‐Christina Strobl
- Research Department of Pathology, University College London Cancer Institute, London, UK
- Department of Histopathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, UK
| | - William Cross
- Research Department of Pathology, University College London Cancer Institute, London, UK
| | - Nischalan Pillay
- Research Department of Pathology, University College London Cancer Institute, London, UK
- Department of Histopathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, UK
| | - Sandra J Strauss
- Research Department of Pathology, University College London Cancer Institute, London, UK
- Department of Oncology, University College London Hospital NHS Foundation Trust, London, UK
| | - Hongtao Ye
- Department of Histopathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, UK
| | - Daniel Lindsay
- Department of Histopathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, UK
| | - Roberto Tirabosco
- Department of Histopathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, UK
| | - Jane Chalker
- SHIMDS Acquired Genomics, Great Ormond Street Hospital for Children NHS Trust, London, UK
| | - Shazia S Mahamdallie
- Rare and Inherited Disease Laboratory, Great Ormond Street Hospital for Children NHS Trust, London, UK
| | - Adrienne M Flanagan
- Research Department of Pathology, University College London Cancer Institute, London, UK
- Department of Histopathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, UK
| | - Fernanda Amary
- Research Department of Pathology, University College London Cancer Institute, London, UK
- Department of Histopathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, UK
| |
|