1
Devaraj AR, Marianthiran VJ. Advancements in Viral Genomics: Gated Recurrent Unit Modeling of SARS-CoV-2, SARS, MERS, and Ebola viruses. Rev Soc Bras Med Trop 2025; 58:e004012024. [PMID: 39936709 PMCID: PMC11805527 DOI: 10.1590/0037-8682-0178-2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Accepted: 11/08/2024] [Indexed: 02/13/2025] Open
Abstract
BACKGROUND Emerging infections have posed persistent threats to humanity throughout history. Rapid and unprecedented anthropogenic, behavioral, and social transformations witnessed in the past century have expedited the emergence of novel pathogens, intensifying their impact on the global human population. METHODS This study aimed to comprehensively analyze and compare the genomic sequences of four distinct viruses: SARS-CoV-2, SARS, MERS, and Ebola. Advanced genomic sequencing techniques and a Gated Recurrent Unit-based deep learning model were used to examine the intricate genetic makeup of these viruses. The proposed study sheds light on their evolutionary dynamics, transmission patterns, and pathogenicity and contributes to the development of effective diagnostic and therapeutic interventions. RESULTS This model exhibited exceptional performance as evidenced by accuracy values of 99.01%, 98.91%, 98.35%, and 98.04% for SARS-CoV-2, SARS, MERS, and Ebola respectively. Precision values ranged from 98.1% to 98.72%, recall values consistently surpassed 92%, and F1 scores ranged from 95.47% to 96.37%. CONCLUSIONS These results underscore the robustness of this model and its potential utility in genomic analysis, paving the way for enhanced understanding, preparedness, and response to emerging viral threats. In the future, this research will focus on creating better diagnostic instruments for the early identification of viral illnesses, developing vaccinations, and tailoring treatments based on the genetic composition and evolutionary patterns of different viruses. This model can be modified to examine a more extensive variety of diseases and recently discovered viruses to predict future outbreaks and their effects on global health.
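The F1 scores quoted in this abstract are the harmonic mean of precision and recall. The sketch below shows that relationship only; the recall value of 93% is a hypothetical choice (the abstract states only that recall exceeded 92%), and the function name is illustrative rather than taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only: precision taken from the reported range (98.1%),
# recall set to an assumed 93% because the abstract gives no exact figure.
example_f1 = f1_score(0.981, 0.93)
print(f"F1 = {example_f1:.4f}")
```

With a precision near 98% and a recall in the low 90s, the harmonic mean lands in the mid 90s, which is consistent with the reported F1 range.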
Affiliation(s)
- Abhishak Raj Devaraj
- Noorul Islam Centre for Higher Education, Department of Computer Applications, Tamilnadu, India
- Victor Jose Marianthiran
- Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Department of Artificial Intelligence and Data Science, Tamilnadu, India
2
Ulrich JU, Renard BY. Fast and space-efficient taxonomic classification of long reads with hierarchical interleaved XOR filters. Genome Res 2024; 34:914-924. [PMID: 38886068 PMCID: PMC11293544 DOI: 10.1101/gr.278623.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 05/23/2024] [Indexed: 06/20/2024]
Abstract
Metagenomic long-read sequencing is gaining popularity for various applications, including pathogen detection and microbiome studies. To analyze the large data created in those studies, software tools need to taxonomically classify the sequenced molecules and estimate the relative abundances of organisms in the sequenced sample. Because of the exponential growth of reference genome databases, the current taxonomic classification methods have large computational requirements. This issue motivated us to develop a new data structure for fast and memory-efficient querying of long reads. Here, we present Taxor as a new tool for long-read metagenomic classification using a hierarchical interleaved XOR filter data structure for indexing and querying large reference genome sets. Taxor implements several k-mer-based approaches, such as syncmers, for pseudoalignment to classify reads and an expectation-maximization algorithm for metagenomic profiling. Our results show that Taxor outperforms state-of-the-art tools regarding precision while having a similar recall for long-read taxonomic classification. Most notably, Taxor reduces the memory requirements and index size by >50% and is among the fastest tools regarding query times. This enables real-time metagenomics analysis with large reference databases on a small laptop in the field.
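The abstract names syncmers as one of the k-mer selection schemes. As a minimal sketch of the general idea, the function below selects closed syncmers: a k-mer is kept when its smallest s-mer sits at the first or last position. This is not Taxor's implementation; the lexicographic ordering, the tie rule, and the parameters `k=4`, `s=2` are assumptions made for illustration (practical tools typically order s-mers by a hash).

```python
def closed_syncmers(seq: str, k: int, s: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs whose smallest s-mer is at either end.

    Assumptions: s-mers are ordered lexicographically, and on ties a k-mer
    is kept if the first or last s-mer equals the minimum.
    """
    if not 0 < s < k:
        raise ValueError("require 0 < s < k")
    selected = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # All s-mers contained in this k-mer, in left-to-right order.
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        smallest = min(smers)
        if smallest in (smers[0], smers[-1]):
            selected.append((i, kmer))
    return selected


# Hypothetical toy sequence and parameters.
picked = closed_syncmers("ACGTACGT", k=4, s=2)
print(picked)
```

Because the decision depends only on the k-mer itself, the same k-mer is selected wherever it occurs, in a read or in a reference, which is what makes this kind of subsampling usable for indexing and querying.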
Affiliation(s)
- Jens-Uwe Ulrich
- Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
- Phylogenomics Unit, Center for Artificial Intelligence in Public Health Research, Robert Koch Institute, 15745 Wildau, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Bernhard Y Renard
- Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
3
Jurado-Rueda F, Alonso-Guirado L, Perea-Chamblee TE, Elliott OT, Filip I, Rabadán R, Malats N. Benchmarking of microbiome detection tools on RNA-seq synthetic databases according to diverse conditions. Bioinformatics Advances 2023; 3:vbad014. [PMID: 36874954 PMCID: PMC9976984 DOI: 10.1093/bioadv/vbad014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 11/15/2022] [Accepted: 02/03/2023] [Indexed: 02/24/2023]
Abstract
Motivation: Here, we performed a benchmarking analysis of five tools for microbe sequence detection using transcriptomics data (Kraken2, MetaPhlAn2, PathSeq, DRAC and Pandora). We built a synthetic database mimicking real-world structure with tuned conditions accounting for microbe species prevalence, base calling quality and sequence length. Sensitivity and positive predictive value (PPV), as well as computational requirements, were used for tool ranking. Results: GATK PathSeq showed the highest sensitivity on average and across all scenarios considered. However, the main drawback of this tool was its slowness. Kraken2 was the fastest tool and displayed the second-best sensitivity, though with large variance depending on the species to be classified. There was no significant difference in sensitivity among the other three algorithms. The sensitivity of MetaPhlAn2 and Pandora was affected by sequence number, and that of DRAC by sequence quality and length. Results from this study support the use of Kraken2 for routine microbiome profiling based on its competitive sensitivity and runtime performance. Nonetheless, we strongly recommend complementing it with MetaPhlAn2 for thorough taxonomic analyses. Availability and implementation: https://github.com/fjuradorueda/MIME/ and https://github.com/lola4/DRAC/. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
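The two ranking metrics named in this abstract, sensitivity and PPV, can be sketched from a known synthetic truth set and a tool's reported taxa. The example below works at species presence/absence level, which is a simplifying assumption (the benchmark itself may score at read level); the species names and the function name are hypothetical.

```python
def detection_metrics(truth: set[str], detected: set[str]) -> tuple[float, float]:
    """Return (sensitivity, PPV) for a set of detected taxa against the truth set."""
    tp = len(truth & detected)      # species present and reported
    fn = len(truth - detected)      # species present but missed
    fp = len(detected - truth)      # species reported but absent
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sens, ppv


# Hypothetical synthetic sample and tool output.
truth = {"Escherichia coli", "Helicobacter pylori", "Fusobacterium nucleatum"}
detected = {"Escherichia coli", "Helicobacter pylori",
            "Cutibacterium acnes", "Staphylococcus epidermidis"}
sens, ppv = detection_metrics(truth, detected)
print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```

A tool can score well on one metric and poorly on the other, which is why the benchmark reports both rather than a single accuracy figure.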
Affiliation(s)
- Francisco Jurado-Rueda
- Genetic & Molecular Epidemiology Group, Spanish National Cancer Research Centre and CIBERONC, Madrid 28029, Spain
- Lola Alonso-Guirado
- Genetic & Molecular Epidemiology Group, Spanish National Cancer Research Centre and CIBERONC, Madrid 28029, Spain
- Tomin E Perea-Chamblee
- Program for Mathematical Genomics and Department of Systems Biology, Columbia University, New York, NY 10027, USA
- Oliver T Elliott
- Program for Mathematical Genomics and Department of Systems Biology, Columbia University, New York, NY 10027, USA
- Ioan Filip
- Program for Mathematical Genomics and Department of Systems Biology, Columbia University, New York, NY 10027, USA
- Raúl Rabadán
- Program for Mathematical Genomics and Department of Systems Biology, Columbia University, New York, NY 10027, USA
- Núria Malats
- Genetic & Molecular Epidemiology Group, Spanish National Cancer Research Centre and CIBERONC, Madrid 28029, Spain
4
Robas Mora M, Fernández Pastrana VM, Probanza Lobo A, Jiménez Gómez PA. Valorization as a biofertilizer of an agricultural residue leachate: Metagenomic characterization and growth promotion test by PGPB in the forage plant Medicago sativa (alfalfa). Front Microbiol 2022; 13:1048154. [PMID: 36620069 PMCID: PMC9815802 DOI: 10.3389/fmicb.2022.1048154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 11/28/2022] [Indexed: 12/24/2022] Open
Abstract
The abuse of chemical fertilizers in intensive agriculture has resulted in the contamination of the ground and the soil on which they are applied. Likewise, the generation, storage, and destruction of plant residues from the agri-food industry pose a threat to the environment and human health. The current situation of growing demand for food implies the urgent need to find sustainable alternatives to chemical fertilizers and the management of agricultural waste. Valorization of this plant residue to produce natural biofertilizers using microbiological treatments is presented as a sustainable alternative. The microbial activity allows the transformation into simple molecules that are easily absorbed by plants, as well as the stimulation of plant growth. This double direct and indirect action induced significant increases against the variables of germination, viability, and biomass (dry weight). To guarantee biosafety, it is necessary to use new bio-technological tools, such as metagenomics, which allow the taxonomic analysis of microbial communities, detecting the absence of pathogens. In the present paper, a physicochemical and metagenomic characterization of a fertilizer obtained from agricultural plant waste valorization is carried out. Likewise, fertigation treatments were tested to which the Plant Growth Promoting Bacteria (PGPB) Pseudomonas agronomica and Bacillus pretiosus were added, both independently and in consortium. Metagenomic analysis has identified taxa belonging to the kingdoms Bacteria and Archaea; 10 phyla, 25 families, 32 genera and 34 species, none of them previously described as pathogenic. A 1/512 dilution of the fertilizer increased the germination rate of Medicago sativa (alfalfa) by 16% at 144 h, compared to the treatment without fertilizer. Both the fertilizer and the addition of PGPB in a double direct and indirect action induced significant increases against the variables of germination, viability, and biomass (dry weight). Therefore, the use of an agricultural residue is proposed, which after the addition of two new species is transformed into a biofertilizer that significantly induces plant growth in Medicago sativa plants.
Affiliation(s)
- Marina Robas Mora
- Department of Pharmaceutical Science and Health, Montepríncipe Campus, CEU San Pablo University, Madrid, Spain
5
Bartoszewicz JM, Nasri F, Nowicka M, Renard BY. Detecting DNA of novel fungal pathogens using ResNets and a curated fungi-hosts data collection. Bioinformatics 2022; 38:ii168-ii174. [PMID: 36124807 DOI: 10.1093/bioinformatics/btac495] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 07/08/2022] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Emerging pathogens are a growing threat, but large data collections and approaches for predicting the risk associated with novel agents are limited to bacteria and viruses. Pathogenic fungi, which also pose a constant threat to public health, remain understudied. Relevant data remain comparatively scarce and scattered among many different sources, hindering the development of sequencing-based detection workflows for novel fungal pathogens. No prediction method working for agents across all three groups is available, even though the cause of an infection is often difficult to identify from symptoms alone. RESULTS We present a curated collection of fungal host range data, comprising records on human, animal and plant pathogens, as well as other plant-associated fungi, linked to publicly available genomes. We show that it can be used to predict the pathogenic potential of novel fungal species directly from DNA sequences with either sequence homology or deep learning. We develop learned, numerical representations of the collected genomes and visualize the landscape of fungal pathogenicity. Finally, we train multi-class models predicting if next-generation sequencing reads originate from novel fungal, bacterial or viral threats. CONCLUSIONS The neural networks trained using our data collection enable accurate detection of novel fungal pathogens. A curated set of over 1400 genomes with host and pathogenicity metadata supports training of machine-learning models and sequence comparison, not limited to the pathogen detection task. AVAILABILITY AND IMPLEMENTATION The data, models and code are hosted at https://zenodo.org/record/5846345, https://zenodo.org/record/5711877 and https://gitlab.com/dacs-hpi/deepac. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakub M Bartoszewicz
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany; Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Ferdous Nasri
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany; Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Melania Nowicka
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany; Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Bernhard Y Renard
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
6
PathoLive—Real-Time Pathogen Identification from Metagenomic Illumina Datasets. Life (Basel) 2022; 12:1345. [PMID: 36143382 PMCID: PMC9505849 DOI: 10.3390/life12091345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/24/2022] [Accepted: 08/24/2022] [Indexed: 11/18/2022] Open
Abstract
Over the past years, NGS has become a crucial workhorse for open-view pathogen diagnostics. Yet, long turnaround times result from using massively parallel high-throughput technologies as the analysis can only be performed after sequencing has finished. The interpretation of results can further be challenged by contaminations, clinically irrelevant sequences, and the sheer amount and complexity of the data. We implemented PathoLive, a real-time diagnostics pipeline for the detection of pathogens from clinical samples hours before sequencing has finished. Based on real-time alignment with HiLive2, mappings are scored with respect to common contaminations, low-entropy areas, and sequences of widespread, non-pathogenic organisms. The results are visualized using an interactive taxonomic tree that provides an easily interpretable overview of the relevance of hits. For a human plasma sample that was spiked in vitro with six pathogenic viruses, all agents were clearly detected after only 40 of 200 sequencing cycles. For a real-world sample from Sudan, the results correctly indicated the presence of Crimean-Congo hemorrhagic fever virus. In a second real-world dataset from the 2019 SARS-CoV-2 outbreak in Wuhan, we found the presence of a SARS coronavirus as the most relevant hit without the novel virus reference genome being included in the database. For all samples, clinically irrelevant hits were correctly de-emphasized. Our approach is valuable to obtain fast and accurate NGS-based pathogen identifications and correctly prioritize and visualize them based on their clinical significance: PathoLive is open source and available on GitLab and BioConda.
7
Large Scale Genome-Centric Metagenomic Data from the Gut Microbiome of Food-Producing Animals and Humans. Sci Data 2022; 9:366. [PMID: 35752638 PMCID: PMC9233704 DOI: 10.1038/s41597-022-01465-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/06/2021] [Accepted: 06/08/2022] [Indexed: 11/29/2022] Open
Abstract
The One Health concept is a global strategy to study the relationship between human and animal health and the transfer of pathogenic and non-pathogenic species between these systems. However, to the best of our knowledge, no data based on One Health genome-centric metagenomics are available in public repositories. Here, we present a dataset based on a pilot-study of 2,915 metagenome-assembled genomes (MAGs) of 107 samples from the human (N = 34), cattle (N = 28), swine (N = 15) and poultry (N = 30) gut microbiomes. Samples were collected from the five Brazilian geographical regions. Of the draft genomes, 1,273 were high-quality drafts (≥90% of completeness and ≤5% of contamination), and 1,642 were medium-quality drafts (≥50% of completeness and ≤10% of contamination). Taxonomic predictions were based on the alignment and concatenation of single-marker genes, and the most representative phyla were Bacteroidota, Firmicutes, and Proteobacteria. Many of these species represent potential pathogens that have already been described or potential new families, genera, and species with potential biotechnological applications. Analyses of this dataset will highlight discoveries about the ecology and functional role of pathogens and uncultivated Archaea and Bacteria from food-producing animals and humans. Furthermore, it also represents an opportunity to describe new species from underrepresented taxonomic groups. Measurement(s): Metagenome. Technology Type(s): Illumina Sequencing.
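The two quality tiers quoted in this abstract reduce to a simple threshold rule on completeness and contamination. The sketch below encodes only those stated thresholds; the function name and the "other" label for genomes meeting neither tier are assumptions, and any further criteria the authors may have applied are not represented.

```python
def mag_quality(completeness: float, contamination: float) -> str:
    """Classify a draft genome by the thresholds quoted in the abstract.

    Both arguments are percentages (0 to 100).
    """
    if completeness >= 90 and contamination <= 5:
        return "high"
    if completeness >= 50 and contamination <= 10:
        return "medium"
    # Label chosen for illustration; the abstract defines only two tiers.
    return "other"


# Hypothetical (completeness, contamination) pairs.
for comp, cont in [(95.2, 1.3), (72.0, 8.5), (40.0, 2.0)]:
    print(comp, cont, mag_quality(comp, cont))
```

Checking the high tier first matters because every high-quality draft also satisfies the medium-quality thresholds.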
8
Chelliah R, Banan-MwineDaliri E, Khan I, Wei S, Elahi F, Yeon SJ, Selvakumar V, Ofosu FK, Rubab M, Ju HH, Rallabandi HR, Madar IH, Sultan G, Oh DH. A review on the application of bioinformatics tools in food microbiome studies. Brief Bioinform 2022; 23:bbac007. [PMID: 35189636 DOI: 10.1093/bib/bbac007] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/27/2021] [Revised: 12/20/2021] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open
Abstract
There is currently a transformed interest toward understanding the impact of fermentation on functional food development due to growing consumer interest on modified health benefits of sustainable foods. In this review, we attempt to summarize recent findings regarding the impact of Next-generation sequencing and other bioinformatics methods in the food microbiome and use prediction software to understand the critical role of microbes in producing fermented foods. Traditionally, fermentation methods and starter culture development were considered conventional methods needing optimization to eliminate errors in technique and were influenced by technical knowledge of fermentation. Recent advances in high-output omics innovations permit the implementation of additional logical tactics for developing fermentation methods. Further, the review describes the multiple functions of the predictions based on docking studies and the correlation of genomic and metabolomic analysis to develop trends to understand the potential food microbiome interactions and associated products to become a part of a healthy diet.
Affiliation(s)
- Ramachandran Chelliah
- Department of Food Science and Biotechnology, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
- Eric Banan-MwineDaliri
- Department of Food Science and Biotechnology, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
- Imran Khan
- Department of Food Science and Biotechnology, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
- Department of Biotechnology, University of Malakand, Khyber Pakhtunkhwa, Pakistan
- Shuai Wei
- Department of Food Science and Biotechnology, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
- Guangdong Provincial Key Laboratory of Aquatic Product Processing and Safety, College of Food Science and Technology, Guangdong Ocean University, Zhanjiang 524088, China
- Fazle Elahi
- Department of Food Science and Biotechnology, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
- Su-Jung Yeon
- Department of Food Science and Biotechnology, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
- Vijayalakshmi Selvakumar
- Department of Food Science and Biotechnology, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
- Fred Kwame Ofosu
- Department of Food Science and Biotechnology, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
- Momna Rubab
- Department of Food Science and Biotechnology, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
- Hum Hun Ju
- Department of Biological Environment, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
- Harikrishna Reddy Rallabandi
- Department of Food Science and Biotechnology, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
- Inamul Hasan Madar
- Department of Biochemistry, School of Life Science, Bharathidasan University, Thiruchirappalli, Tamilnadu, India
- Ghazala Sultan
- Department of Computer Science, Aligarh Muslim University, Aligarh, Uttar Pradesh, 202002, India
- Deog Hwan Oh
- Department of Food Science and Biotechnology, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do 24341, Korea
9
Optimization of cerebrospinal fluid microbial DNA metagenomic sequencing diagnostics. Sci Rep 2022; 12:3378. [PMID: 35233021 PMCID: PMC8888594 DOI: 10.1038/s41598-022-07260-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/24/2021] [Accepted: 02/04/2022] [Indexed: 12/25/2022] Open
Abstract
Infection in the central nervous system is a severe condition associated with high morbidity and mortality. Despite ample testing, the majority of encephalitis and meningitis cases remain undiagnosed. Metagenomic sequencing of cerebrospinal fluid has emerged as an unbiased approach to identify rare microbes and novel pathogens. However, several major hurdles remain, including establishment of individual limits of detection, removal of false positives and implementation of universal controls. Twenty-one cerebrospinal fluid samples, in which a known pathogen had been positively identified by available clinical techniques, were subjected to metagenomic DNA sequencing. Fourteen samples contained minute levels of Epstein-Barr virus. The detection threshold for each sample was calculated by using the total leukocyte content in the sample and environmental contaminants found in the bioinformatic classifiers. Virus sequences were detected in all ten samples, in which more than one read was expected according to the calculations. Conversely, no viral reads were detected in seven out of eight samples, in which less than one read was expected according to the calculations. False positive pathogens of computational or environmental origin were readily identified, by using a commonly available cell control. For bacteria, additional filters including a comparison between classifiers removed the remaining false positives and alleviated pathogen identification. Here we show a generalizable method for identification of pathogen species using DNA metagenomic sequencing. The choice of bioinformatic method mainly affected the efficiency of pathogen identification, but not the sensitivity of detection. Identification of pathogens requires multiple filtering steps including read distribution, sequence diversity and complementary verification of pathogen reads.
10
Voigt B, Fischer O, Krumnow C, Herta C, Dabrowski PW. NGS read classification using AI. PLoS One 2021; 16:e0261548. [PMID: 34936673 PMCID: PMC8694450 DOI: 10.1371/journal.pone.0261548] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/24/2021] [Accepted: 12/03/2021] [Indexed: 11/19/2022] Open
Abstract
Clinical metagenomics is a powerful diagnostic tool, as it offers an open view into all DNA in a patient's sample. This allows the detection of pathogens that would slip through the cracks of classical specific assays. However, due to this unspecific nature of metagenomic sequencing, a huge amount of unspecific data is generated during the sequencing itself and the diagnosis only takes place at the data analysis stage where relevant sequences are filtered out. Typically, this is done by comparison to reference databases. While this approach has been optimized over the past years and works well to detect pathogens that are represented in the used databases, a common challenge in analysing a metagenomic patient sample arises when no pathogen sequences are found: How to determine whether truly no evidence of a pathogen is present in the data or whether the pathogen's genome is simply absent from the database and the sequences in the dataset could thus not be classified? Here, we present a novel approach to this problem of detecting novel pathogens in metagenomic datasets by classifying the (segments of) proteins encoded by the sequences in the datasets. We train a neural network on the sequences of coding sequences, labeled by taxonomic domain, and use this neural network to predict the taxonomic classification of sequences that can not be classified by comparison to a reference database, thus facilitating the detection of potential novel pathogens.
Affiliation(s)
- Benjamin Voigt
- Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Oliver Fischer
- Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Christian Krumnow
- Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Christian Herta
- Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Piotr Wojciech Dabrowski
- Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
11
Utilizing the VirIdAl Pipeline to Search for Viruses in the Metagenomic Data of Bat Samples. Viruses 2021; 13:2006. [PMID: 34696436 PMCID: PMC8541124 DOI: 10.3390/v13102006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/30/2021] [Revised: 09/30/2021] [Accepted: 10/02/2021] [Indexed: 12/27/2022] Open
Abstract
According to various estimates, only a small percentage of existing viruses have been discovered, naturally much less being represented in the genomic databases. High-throughput sequencing technologies develop rapidly, empowering large-scale screening of various biological samples for the presence of pathogen-associated nucleotide sequences, but many organisms are yet to be attributed specific loci for identification. This problem particularly impedes viral screening, due to vast heterogeneity in viral genomes. In this paper, we present a new bioinformatic pipeline, VirIdAl, for detecting and identifying viral pathogens in sequencing data. We also demonstrate the utility of the new software by applying it to viral screening of the feces of bats collected in the Moscow region, which revealed a significant variety of viruses associated with bats, insects, plants, and protozoa. The presence of alpha and beta coronavirus reads, including the MERS-like bat virus, deserves a special mention, as it once again indicates that bats are indeed reservoirs for many viral pathogens. In addition, it was shown that alignment-based methods were unable to identify the taxon for a large proportion of reads, and we additionally applied other approaches, showing that they can further reveal the presence of viral agents in sequencing data. However, the incompleteness of viral databases remains a significant problem in the studies of viral diversity, and therefore necessitates the use of combined approaches, including those based on machine learning methods.
12
Bartoszewicz JM, Genske U, Renard BY. Deep learning-based real-time detection of novel pathogens during sequencing. Brief Bioinform 2021; 22:6326527. [PMID: 34297793 DOI: 10.1093/bib/bbab269] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/19/2021] [Revised: 06/09/2021] [Accepted: 06/23/2021] [Indexed: 11/12/2022] Open
Abstract
Novel pathogens evolve quickly and may emerge rapidly, causing dangerous outbreaks or even global pandemics. Next-generation sequencing is the state of the art in open-view pathogen detection, and one of the few methods available at the earliest stages of an epidemic, even when the biological threat is unknown. Analyzing the samples as the sequencer is running can greatly reduce the turnaround time, but existing tools rely on close matches to lists of known pathogens and perform poorly on novel species. Machine learning approaches can predict if single reads originate from more distant, unknown pathogens but require relatively long input sequences and processed data from a finished sequencing run. Incomplete sequences contain less information, leading to a trade-off between sequencing time and detection accuracy. Using a workflow for real-time pathogenic potential prediction, we investigate which subsequences already allow accurate inference. We train deep neural networks to classify Illumina and Nanopore reads and integrate the models with HiLive2, a real-time Illumina mapper. This approach outperforms alternatives based on machine learning and sequence alignment on simulated and real data, including SARS-CoV-2 sequencing runs. After just 50 Illumina cycles, we observe an 80-fold sensitivity increase compared to real-time mapping. The first 250 bp of Nanopore reads, corresponding to 0.5 s of sequencing time, are enough to yield predictions more accurate than mapping the finished long reads. The approach could also be used for screening synthetic sequences against biosecurity threats.
Affiliation(s)
- Jakub M Bartoszewicz
  - Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
- Ulrich Genske
  - Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
- Bernhard Y Renard
  - Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
13
Desai S, Rashmi S, Rane A, Dharavath B, Sawant A, Dutt A. An integrated approach to determine the abundance, mutation rate and phylogeny of the SARS-CoV-2 genome. Brief Bioinform 2021; 22:1065-1075. [PMID: 33479725 PMCID: PMC7929363 DOI: 10.1093/bib/bbaa437] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/17/2020] [Revised: 12/22/2020] [Accepted: 12/28/2020] [Indexed: 02/05/2023] Open
Abstract
The analysis of the SARS-CoV-2 genome datasets has significantly advanced our understanding of the biology and genomic adaptability of the virus. However, the plurality of advanced sequencing datasets, such as short and long reads, presents a formidable computational challenge to uniformly perform quantitative, variant or phylogenetic analysis, thus limiting its application in public health laboratories engaged in studying epidemic outbreaks. We present a computational tool, Infectious Pathogen Detector (IPD), to perform integrated analysis of diverse genomic datasets, with a customized analytical module for the SARS-CoV-2 virus. The IPD pipeline quantitates individual occurrences of 1060 pathogens and performs mutation and phylogenetic analysis from heterogeneous sequencing datasets. Using IPD, we demonstrate a varying burden (5.055-999655.7 fragments per million) of SARS-CoV-2 transcripts across 1500 short- and long-read sequencing SARS-CoV-2 datasets and identify 4634 SARS-CoV-2 variants (~3.05 variants per sample), including 449 novel variants, across the genome with distinct hotspot mutations in the ORF1ab and S genes along with their phylogenetic relationships, establishing the utility of IPD in tracing the genome isolates from the genomic data (as accessed on 11 June 2020). The IPD predicts the occurrence and dynamics of variability among infectious pathogens, with a potential for direct utility in the COVID-19 pandemic and beyond to help automate sequencing-based pathogen analysis and to respond efficaciously to public health threats. A graphical user interface (GUI)-enabled desktop application is freely available for download for academic users at http://www.actrec.gov.in/pi-webpages/AmitDutt/IPD/IPD.html and for web-based processing at http://ipd.actrec.gov.in/ipdweb/ to generate an automated report without any prior computational know-how.
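The burden figures in this abstract are given in fragments per million. The sketch below shows how such a normalisation is commonly computed, assuming the usual definition (fragments assigned to the pathogen divided by all sequenced fragments, scaled by one million). The abstract does not state IPD's exact formula, and the function name and counts are hypothetical.

```python
def fragments_per_million(pathogen_fragments: int, total_fragments: int) -> float:
    """Normalise a pathogen fragment count to fragments per million (FPM).

    Assumed definition, not taken from the IPD source code:
    FPM = pathogen_fragments * 1e6 / total_fragments
    """
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    if not 0 <= pathogen_fragments <= total_fragments:
        raise ValueError("pathogen_fragments must lie in [0, total_fragments]")
    # Multiply first so that small integer inputs divide exactly.
    return pathogen_fragments * 1_000_000 / total_fragments


if __name__ == "__main__":
    # Hypothetical sample: 10 viral fragments among 2 million sequenced fragments.
    print(fragments_per_million(10, 2_000_000))
```

Under this definition the value cannot exceed 1,000,000, which matches the upper end of the reported range lying just below one million.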
Affiliation(s)
- Sanket Desai
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai, Maharashtra, 410210, India
  - Homi Bhabha National Institute, Training School Complex, Anushakti Nagar, Mumbai, Maharashtra, 400094, India
- Sonal Rashmi
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai, Maharashtra, 410210, India
- Aishwarya Rane
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai, Maharashtra, 410210, India
- Bhasker Dharavath
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai, Maharashtra, 410210, India
  - Homi Bhabha National Institute, Training School Complex, Anushakti Nagar, Mumbai, Maharashtra, 400094, India
- Aniket Sawant
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai, Maharashtra, 410210, India
  - Homi Bhabha National Institute, Training School Complex, Anushakti Nagar, Mumbai, Maharashtra, 400094, India
- Amit Dutt
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai, Maharashtra, 410210, India
  - Homi Bhabha National Institute, Training School Complex, Anushakti Nagar, Mumbai, Maharashtra, 400094, India
  - Adjunct Faculty, Institute of Advanced Virology, Kerala State Council for Science, Technology and Environment, Govt. of Kerala, Thonnakkal, Kerala, 695317, India
14
Bartoszewicz JM, Seidel A, Renard BY. Interpretable detection of novel human viruses from genome sequencing data. NAR Genom Bioinform 2021; 3:lqab004. [PMID: 33554119 PMCID: PMC7849996 DOI: 10.1093/nargab/lqab004] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/12/2020] [Revised: 01/04/2021] [Accepted: 01/15/2021] [Indexed: 01/21/2023] Open
Abstract
Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can be applied also to other models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, unknown before it caused a COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages not only enabling analysis of NGS datasets without requiring any deep learning skills, but also allowing advanced users to easily train and explain new models for genomics.
Affiliation(s)
- Jakub M Bartoszewicz
  - Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
  - Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, 14482 Potsdam, Brandenburg, Germany
  - Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Brandenburg, Germany
- Anja Seidel
  - Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Bernhard Y Renard
  - Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
  - Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, 14482 Potsdam, Brandenburg, Germany
  - Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Brandenburg, Germany
15
Chen X, Li D. Sequencing facility and DNA source associated patterns of virus-mappable reads in whole-genome sequencing data. Genomics 2021; 113:1189-1198. [PMID: 33301893 PMCID: PMC7856238 DOI: 10.1016/j.ygeno.2020.12.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/25/2020] [Revised: 11/25/2020] [Accepted: 12/04/2020] [Indexed: 12/12/2022]
Abstract
Numerous viral sequences have been reported in the whole-genome sequencing (WGS) data of human blood. However, it is not clear to what degree the virus-mappable reads represent true viral sequences rather than random-mapping or noise originating from sample preparation, sequencing processes, or other sources. Identification of patterns of virus-mappable reads may generate novel indicators for evaluating the origins of these viral sequences. We characterized paired-end unmapped reads and reads aligned to viral references in human WGS datasets, then compared patterns of the virus-mappable reads among DNA sources and sequencing facilities which produced these datasets. We then examined potential origins of the source- and facility-associated viral reads. The proportions of clean unmapped reads among the seven sequencing facilities were significantly different (P < 2 × 10^-16). We identified 260,339 reads that were mappable to a total of 99 viral references in 2535 samples. The majority (86.7%) of these virus-mappable reads (corresponding to 47 viral references), which can be classified into four groups based on their distinct patterns, were strongly associated with sequencing facility or DNA source (adjusted P value <0.01). Possible origins of these reads include artificial sequences in library preparation, recombinant vectors in cell culture, and phages co-contaminated with their host bacteria. The sequencing facility-associated virus-mappable reads and patterns were repeatedly observed in other datasets produced in the same facilities. We have constructed an analytic framework and profiled the unmapped reads mappable to viral references. The results provide a new understanding of sequencing facility- and DNA source-associated batch effects in deep sequencing data and may facilitate improved bioinformatics filtering of reads.
Affiliation(s)
- Xun Chen
  - Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA
- Dawei Li
  - Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA; Department of Computer Science, University of Vermont, Burlington, VT 05405, USA; Neuroscience, Behavior, Health Initiative, University of Vermont, Burlington, VT 05405, USA
16
Ebinger A, Fischer S, Höper D. A theoretical and generalized approach for the assessment of the sample-specific limit of detection for clinical metagenomics. Comput Struct Biotechnol J 2020; 19:732-742. [PMID: 33552445 PMCID: PMC7822954 DOI: 10.1016/j.csbj.2020.12.040] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 09/01/2020] [Revised: 12/16/2020] [Accepted: 12/24/2020] [Indexed: 12/18/2022] Open
Abstract
Metagenomics is a powerful tool to identify novel or unexpected pathogens, since it is generic and relatively unbiased. The limit of detection (LOD) is a critical parameter for the routine application of methods in the clinical diagnostic context. Although attempts to determine LODs for metagenomics next-generation sequencing (mNGS) have been made previously, these were only applicable to specific target species in defined sample matrices. Therefore, we developed and validated a generalized probability-based model to assess the sample-specific LOD of mNGS experiments (LODmNGS). Initial rarefaction analyses with datasets of Borna disease virus 1 human encephalitis cases revealed a stochastic behavior of virus read detection. Based on this, we transformed the Bernoulli formula to predict the minimal necessary dataset size to detect one virus read with a probability of 99%. We validated the formula with 30 datasets from diseased individuals, resulting in an accuracy of 99.1% and an average of 4.5 ± 0.4 viral reads found in the calculated minimal dataset size. We demonstrated by modeling the virus genome size, virus-, and total RNA-concentration that the main determinant of mNGS sensitivity is the virus-sample background ratio. The predicted LODmNGS for the respective pathogenic virus in the datasets were congruent with the virus-concentration determined by RT-qPCR. Theoretical assumptions were further confirmed by correlation analysis of mNGS and RT-qPCR data from the samples of the analyzed datasets. This approach should guide standardization of mNGS application, due to the generalized concept of LODmNGS.
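The "minimal necessary dataset size" idea in this abstract can be illustrated with a short Python sketch. It assumes each read is an independent Bernoulli trial that is viral with probability f, so the chance of seeing at least one viral read among n reads is 1 - (1 - f)^n, and the smallest sufficient n is ln(1 - P) / ln(1 - f), rounded up. The abstract does not give the published formula, so this is the textbook form rather than the authors' exact model; the function names are hypothetical.

```python
import math


def min_reads_for_detection(viral_fraction: float, confidence: float = 0.99) -> int:
    """Smallest dataset size n such that 1 - (1 - f)**n >= confidence.

    viral_fraction (f): assumed probability that any single read is viral.
    confidence (P): required probability of observing at least one viral read.
    """
    if not 0.0 < viral_fraction < 1.0:
        raise ValueError("viral_fraction must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # log1p(-x) computes ln(1 - x) accurately even for very small x.
    return math.ceil(math.log1p(-confidence) / math.log1p(-viral_fraction))


def detection_probability(viral_fraction: float, n_reads: int) -> float:
    """Probability of at least one viral read among n_reads independent reads."""
    return 1.0 - (1.0 - viral_fraction) ** n_reads


if __name__ == "__main__":
    # Hypothetical sample in which one read in a million is viral.
    n = min_reads_for_detection(1e-6, 0.99)
    print(n, detection_probability(1e-6, n))
```

Because n depends only on the viral read fraction, the sketch is consistent with the abstract's statement that the virus-to-background ratio is the main determinant of sensitivity.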
Affiliation(s)
- Arnt Ebinger
  - Institute for Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493 Greifswald-Insel Riems, Mecklenburg-Western Pomerania, Germany
- Susanne Fischer
  - Institute of Infectology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493 Greifswald-Insel Riems, Mecklenburg-Western Pomerania, Germany
- Dirk Höper
  - Institute for Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493 Greifswald-Insel Riems, Mecklenburg-Western Pomerania, Germany
17
Bartoszewicz JM, Seidel A, Rentzsch R, Renard BY. DeePaC: predicting pathogenic potential of novel DNA with reverse-complement neural networks. Bioinformatics 2020; 36:81-89. [PMID: 31298694 DOI: 10.1093/bioinformatics/btz541] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 04/09/2019] [Revised: 06/22/2019] [Accepted: 07/10/2019] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION We expect novel pathogens to arise due to their fast-paced evolution, and new species to be discovered thanks to advances in DNA sequencing and metagenomics. Moreover, recent developments in synthetic biology raise concerns that some strains of bacteria could be modified for malicious purposes. Traditional approaches to open-view pathogen detection depend on databases of known organisms, which limits their performance on unknown, unrecognized and unmapped sequences. In contrast, machine learning methods can infer pathogenic phenotypes from single NGS reads, even though the biological context is unavailable. RESULTS We present DeePaC, a Deep Learning Approach to Pathogenicity Classification. It includes a flexible framework allowing easy evaluation of neural architectures with reverse-complement parameter sharing. We show that convolutional neural networks and LSTMs outperform the state-of-the-art based on both sequence homology and machine learning. Combining a deep learning approach with integrating the predictions for both mates in a read pair results in cutting the error rate almost in half in comparison to the previous state-of-the-art. AVAILABILITY AND IMPLEMENTATION The code and the models are available at: https://gitlab.com/rki_bioinformatics/DeePaC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
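Two ideas from this abstract, reverse-complement handling and combining predictions for both mates of a read pair, can be sketched in a few lines of Python. DeePaC builds reverse-complement parameter sharing into the network layers themselves; the sketch below shows only a simpler way to obtain the same invariance, by averaging a model's output over a read and its reverse complement. The averaging of the two mates is also an assumption, and `toy_score` is a hypothetical stand-in for a trained classifier.

```python
from typing import Callable

# Translation table for complementing DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained network: fraction of G bases."""
    return seq.upper().count("G") / len(seq) if seq else 0.0


def rc_invariant_score(seq: str, model: Callable[[str], float] = toy_score) -> float:
    """Average the model output over both strand orientations of a read."""
    return 0.5 * (model(seq) + model(reverse_complement(seq)))


def read_pair_score(mate1: str, mate2: str,
                    model: Callable[[str], float] = toy_score) -> float:
    """Combine both mates of a read pair by averaging (an assumed rule)."""
    return 0.5 * (rc_invariant_score(mate1, model) + rc_invariant_score(mate2, model))


if __name__ == "__main__":
    read = "GGGATTC"
    # Both orientations of the same read receive the same score.
    print(rc_invariant_score(read), rc_invariant_score(reverse_complement(read)))
```

Averaging over orientations doubles inference cost, which is one reason to share parameters inside the network instead.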
Affiliation(s)
- Jakub M Bartoszewicz
  - Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Anja Seidel
  - Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Robert Rentzsch
  - Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Bernhard Y Renard
  - Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
18
Brüggemann H, Al-Zeer MA. Bacterial signatures and their inflammatory potentials associated with prostate cancer. APMIS 2020; 128:80-91. [PMID: 31990107 DOI: 10.1111/apm.13021] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 11/08/2019] [Accepted: 11/25/2019] [Indexed: 02/06/2023]
Abstract
Chronic inflammation can create a microenvironment that can contribute to the formation of prostate pathologies. Far less well understood is the origin of inflammation in the prostate. One potential source is microbial infections of the prostate. This review summarizes recent findings regarding the presence of bacteria in the prostate and the dysbiosis of bacterial populations in the urinary tract and the gastrointestinal tract related to prostate cancer, thereby focusing on next-generation sequencing (NGS)-generated data. The current limitations regarding NGS-based detection methods and other difficulties in the quest for a microbial etiology for prostate cancer are discussed. We then focus on a few bacterial species, including Cutibacterium acnes and Escherichia coli that are often NGS-detected in prostatic tissue specimens, and discuss their possible contribution as initiator or enhancer of prostate inflammation and prostate carcinogenesis.
Affiliation(s)
- Munir A Al-Zeer
  - Institute of Biotechnology, Department of Applied Biochemistry, Technical University of Berlin, Berlin, Germany
19
Han D, Li Z, Li R, Tan P, Zhang R, Li J. mNGS in clinical microbiology laboratories: on the road to maturity. Crit Rev Microbiol 2019; 45:668-685. [PMID: 31691607 DOI: 10.1080/1040841x.2019.1681933] [Citation(s) in RCA: 224] [Impact Index Per Article: 37.3] [Indexed: 02/06/2023]
Abstract
Metagenomic next-generation sequencing (mNGS) is increasingly being applied in clinical laboratories for unbiased culture-independent diagnosis. Whether it can be a next routine pathogen identification tool has become a topic of concern. We review the current implementation of this new technology for infectious disease diagnostics and discuss the feasibility of transforming mNGS into a routine diagnostic test. Since 2008, numerous studies from over 20 countries have revealed the practicality of mNGS in the work-up of undiagnosed infectious diseases. mNGS performs well in identifying rare, novel, difficult-to-detect and coinfected pathogens directly from clinical samples and presents great potential in resistance prediction by sequencing the antibiotic resistance genes, providing new diagnostic evidence that can be used to guide treatment options and improve antibiotic stewardship. Many physicians recognized mNGS as a last resort method to address clinical infection problems. Although several hurdles, such as workflow validation, quality control, method standardisation, and data interpretation, remain before mNGS can be implemented routinely in clinical laboratories, they are temporary and can be overcome by rapidly evolving technologies. With more validated workflows, lower cost and turnaround time, and simplified interpretation criteria, mNGS will be widely accepted in clinical practice. Overall, mNGS is transforming the landscape of clinical microbiology laboratories, and to ensure that it is properly utilised in clinical diagnosis, both physicians and microbiologists should have a thorough understanding of the power and limitations of this method.
Affiliation(s)
- Dongsheng Han
  - National Center for Clinical Laboratories, Beijing Hospital, National Center of Gerontology, Beijing, People's Republic of China; Graduate School, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, People's Republic of China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, People's Republic of China
- Ziyang Li
  - National Center for Clinical Laboratories, Beijing Hospital, National Center of Gerontology, Beijing, People's Republic of China; Graduate School, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, People's Republic of China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, People's Republic of China
- Rui Li
  - National Center for Clinical Laboratories, Beijing Hospital, National Center of Gerontology, Beijing, People's Republic of China; Graduate School, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, People's Republic of China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, People's Republic of China
- Ping Tan
  - National Center for Clinical Laboratories, Beijing Hospital, National Center of Gerontology, Beijing, People's Republic of China; Graduate School, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, People's Republic of China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, People's Republic of China
- Rui Zhang
  - National Center for Clinical Laboratories, Beijing Hospital, National Center of Gerontology, Beijing, People's Republic of China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, People's Republic of China
- Jinming Li
  - National Center for Clinical Laboratories, Beijing Hospital, National Center of Gerontology, Beijing, People's Republic of China; Graduate School, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, People's Republic of China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, People's Republic of China
20
Rhee C, Kharod GA, Schaad N, Furukawa NW, Vora NM, Blaney DD, Crump JA, Clarke KR. Global knowledge gaps in acute febrile illness etiologic investigations: A scoping review. PLoS Negl Trop Dis 2019; 13:e0007792. [PMID: 31730635 PMCID: PMC6881070 DOI: 10.1371/journal.pntd.0007792] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 04/11/2019] [Revised: 11/27/2019] [Accepted: 09/18/2019] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Acute febrile illness (AFI), a common reason for people seeking medical care globally, represents a spectrum of infectious disease etiologies with important variations geographically and by population. There is no standardized approach to conducting AFI etiologic investigations, limiting interpretation of data in a global context. We conducted a scoping review to characterize current AFI research methodologies, identify global research gaps, and provide methodological research standardization recommendations. METHODOLOGY/FINDINGS Using pre-defined terms, we searched Medline, Embase, and Global Health for publications from January 1, 2005 to December 31, 2017. Publications cited in previously published systematic reviews and an online study repository of non-malarial febrile illness etiologies were also included. We screened abstracts for publications reporting on human infectious disease, aimed at determining AFI etiology using laboratory diagnostics. One hundred ninety publications underwent full-text review, using a standardized tool to collect data on study characteristics, methodology, and laboratory diagnostics. AFI case definitions between publications varied: use of self-reported fever as part of case definitions (28%, 53/190), fever cut-off value (38.0°C most commonly used: 45%, 85/190), and fever measurement site (axillary most commonly used: 19%, 36/190). Eighty-nine publications (47%) did not include exclusion criteria, and inclusion criteria in 13% (24/190) of publications did not include age group. No publications included study settings in Southern Africa, Micronesia & Polynesia, or Central Asia. We summarized standardized reporting practices, specific to AFI etiologic investigations, that would increase inter-study comparability.
CONCLUSIONS Wider implementation of standardized AFI reporting methods, with multi-pathogen disease detection, could improve comparability of study findings, knowledge of the range of AFI etiologies, and their contributions to the global AFI burden. These steps can guide resource allocation, strengthen outbreak detection and response, target prevention efforts, and improve clinical care, especially in resource-limited settings where disease control often relies on empiric treatment. PROSPERO: CRD42016035666.
Affiliation(s)
- Chulwoo Rhee
  - Division of Global Health Protection, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Grishma A. Kharod
  - Division of High-Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Disease, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Nicolas Schaad
  - Division of Global Health Protection, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Nathan W. Furukawa
  - Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Neil M. Vora
  - Division of Global Health Protection, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- David D. Blaney
  - Division of High-Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Disease, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- John A. Crump
  - Division of Infectious Diseases and International Health, Duke University Medical Center, Durham, North Carolina, United States of America
  - Centre for International Health, University of Otago, New Zealand
- Kevin R. Clarke
  - Division of Global Health Protection, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
21
Brinkmann A, Andrusch A, Belka A, Wylezich C, Höper D, Pohlmann A, Nordahl Petersen T, Lucas P, Blanchard Y, Papa A, Melidou A, Oude Munnink BB, Matthijnssens J, Deboutte W, Ellis RJ, Hansmann F, Baumgärtner W, van der Vries E, Osterhaus A, Camma C, Mangone I, Lorusso A, Marcacci M, Nunes A, Pinto M, Borges V, Kroneman A, Schmitz D, Corman VM, Drosten C, Jones TC, Hendriksen RS, Aarestrup FM, Koopmans M, Beer M, Nitsche A. Proficiency Testing of Virus Diagnostics Based on Bioinformatics Analysis of Simulated In Silico High-Throughput Sequencing Data Sets. J Clin Microbiol 2019; 57:e00466-19. [PMID: 31167846 PMCID: PMC6663916 DOI: 10.1128/jcm.00466-19] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 03/22/2019] [Accepted: 05/28/2019] [Indexed: 12/22/2022] Open
Abstract
Quality management and independent assessment of high-throughput sequencing-based virus diagnostics have not yet been established as a mandatory approach for ensuring comparable results. The sensitivity and specificity of viral high-throughput sequence data analysis are highly affected by bioinformatics processing using publicly available and custom tools and databases and thus differ widely between individuals and institutions. Here we present the results of the COMPARE [Collaborative Management Platform for Detection and Analyses of (Re-)emerging and Foodborne Outbreaks in Europe] in silico virus proficiency test. An artificial, simulated in silico data set of Illumina HiSeq sequences was provided to 13 different European institutes for bioinformatics analysis to identify viral pathogens in high-throughput sequence data. Comparison of the participants' analyses shows that the use of different tools, programs, and databases for bioinformatics analyses can impact the correct identification of viral sequences from a simple data set. The identification of slightly mutated and highly divergent virus genomes has been shown to be most challenging. Furthermore, the interpretation of the results, together with a fictitious case report, by the participants showed that in addition to the bioinformatics analysis, the virological evaluation of the results can be important in clinical settings. External quality assessment and proficiency testing should become an important part of validating high-throughput sequencing-based virus diagnostics and could improve the harmonization, comparability, and reproducibility of results. There is a need for the establishment of international proficiency testing, like that established for conventional laboratory tests such as PCR, for bioinformatics pipelines and the interpretation of such results.
Affiliation(s)
- Annika Brinkmann
  - Robert Koch Institute, Centre for Biological Threats and Special Pathogens 1, Berlin, Germany
- Andreas Andrusch
  - Robert Koch Institute, Centre for Biological Threats and Special Pathogens 1, Berlin, Germany
- Ariane Belka
  - Friedrich-Loeffler-Institut, Institute of Diagnostic Virology, Greifswald-Insel Riems, Germany
- Claudia Wylezich
  - Friedrich-Loeffler-Institut, Institute of Diagnostic Virology, Greifswald-Insel Riems, Germany
- Dirk Höper
  - Friedrich-Loeffler-Institut, Institute of Diagnostic Virology, Greifswald-Insel Riems, Germany
- Anne Pohlmann
  - Friedrich-Loeffler-Institut, Institute of Diagnostic Virology, Greifswald-Insel Riems, Germany
- Thomas Nordahl Petersen
  - Technical University of Denmark, National Food Institute, WHO Collaborating Center for Antimicrobial Resistance in Foodborne Pathogens and Genomics and European Union Reference Laboratory for Antimicrobial Resistance, Kongens Lyngby, Denmark
- Pierrick Lucas
  - French Agency for Food, Environmental and Occupational Health and Safety, Laboratory of Ploufragan, Unit of Viral Genetics and Biosafety, Ploufragan, France
- Yannick Blanchard
  - French Agency for Food, Environmental and Occupational Health and Safety, Laboratory of Ploufragan, Unit of Viral Genetics and Biosafety, Ploufragan, France
- Anna Papa
  - Microbiology Department, Aristotle University of Thessaloniki, School of Medicine, Thessaloniki, Greece
- Angeliki Melidou
  - Microbiology Department, Aristotle University of Thessaloniki, School of Medicine, Thessaloniki, Greece
- Bas B Oude Munnink
  - Department of Viroscience, Erasmus Medical Centre, Rotterdam, The Netherlands
- Florian Hansmann
  - Department of Pathology, University of Veterinary Medicine Hannover, Hannover, Germany
- Wolfgang Baumgärtner
  - Department of Pathology, University of Veterinary Medicine Hannover, Hannover, Germany
- Erhard van der Vries
  - Department of Infectious Diseases and Immunology, University of Utrecht, Utrecht, The Netherlands
- Cesare Camma
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e Molise G. Caporale, National Reference Center for Whole Genome Sequencing of Microbial Pathogens: Database and Bioinformatic Analysis, Teramo, Italy
- Iolanda Mangone
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e Molise G. Caporale, National Reference Center for Whole Genome Sequencing of Microbial Pathogens: Database and Bioinformatic Analysis, Teramo, Italy
- Alessio Lorusso
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e Molise G. Caporale, National Reference Center for Whole Genome Sequencing of Microbial Pathogens: Database and Bioinformatic Analysis, Teramo, Italy
- Maurilia Marcacci
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e Molise G. Caporale, National Reference Center for Whole Genome Sequencing of Microbial Pathogens: Database and Bioinformatic Analysis, Teramo, Italy
- Alexandra Nunes
  - Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health (INSA), Lisbon, Portugal
- Miguel Pinto
  - Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health (INSA), Lisbon, Portugal
- Vítor Borges
  - Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health (INSA), Lisbon, Portugal
- Annelies Kroneman
  - National Institute for Public Health and the Environment, Bilthoven, The Netherlands
- Dennis Schmitz
  - Department of Viroscience, Erasmus Medical Centre, Rotterdam, The Netherlands
  - National Institute for Public Health and the Environment, Bilthoven, The Netherlands
- Victor Max Corman
  - Institute of Virology, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Christian Drosten
  - Institute of Virology, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Terry C Jones
  - Institute of Virology, Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Center for Pathogen Evolution, Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- Rene S Hendriksen
  - Technical University of Denmark, National Food Institute, WHO Collaborating Center for Antimicrobial Resistance in Foodborne Pathogens and Genomics and European Union Reference Laboratory for Antimicrobial Resistance, Kongens Lyngby, Denmark
- Frank M Aarestrup
  - Technical University of Denmark, National Food Institute, WHO Collaborating Center for Antimicrobial Resistance in Foodborne Pathogens and Genomics and European Union Reference Laboratory for Antimicrobial Resistance, Kongens Lyngby, Denmark
- Marion Koopmans
  - Department of Viroscience, Erasmus Medical Centre, Rotterdam, The Netherlands
- Martin Beer
  - Friedrich-Loeffler-Institut, Institute of Diagnostic Virology, Greifswald-Insel Riems, Germany
| | - Andreas Nitsche
- Robert Koch Institute, Centre for Biological Threats and Special Pathogens 1, Berlin, Germany
| |
|