1
Álvarez-Narváez S, Harrell TL, Nour I, Mohanty SK, Conrad SJ. Choosing the most suitable NGS technology to combine with a standardized viral enrichment protocol for obtaining complete avian orthoreovirus genomes from metagenomic samples. Frontiers in Bioinformatics 2025; 5:1498921. [PMID: 39967836 PMCID: PMC11833334 DOI: 10.3389/fbinf.2025.1498921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Accepted: 01/13/2025] [Indexed: 02/20/2025]
Abstract
Since viruses are obligate intracellular pathogens, sequencing their genomes results in metagenomic data from both the virus and the host. Virology researchers are constantly seeking new, cost-effective strategies and bioinformatic pipelines for the retrieval of complete viral genomes from these metagenomic samples. Avian orthoreoviruses (ARVs) pose a significant and growing threat to the poultry industry and frequently cause economic losses associated with disease in production birds. Currently available commercial vaccines are ineffective against new ARV variants and ARV outbreaks are increasing worldwide, requiring whole genome sequencing (WGS) to characterize strains that evade vaccines. This study compares the effectiveness of long-read and short-read sequencing technologies for obtaining ARV complete genomes. We used eight clinical isolates of ARV, each previously processed using our published viral genome enrichment protocol. Additionally, we evaluated three assembly methods to determine which provided the most complete and reliable whole genomes: de novo, reference-guided, or hybrid. The results suggest that our ARV genome enrichment protocol caused some fragmentation of the viral cDNA that impacted the length of the long reads (but not the short reads) and, as a result, caused a failure to produce complete genomes via de novo assembly. Overall, we observed that regardless of the sequencing technology, the best quality assemblies were generated by mapping quality-trimmed reads to a custom reference genome. The custom reference genomes were in turn constructed with the publicly available ARV genomic segments that shared the highest sequence similarity with the contigs from short-read de novo assemblies. Hence, we conclude that short-read sequencing is the most suitable technology to combine with our ARV genome enrichment protocol.
Affiliation(s)
- Sonsiray Álvarez-Narváez
  - US National Poultry Research Center, United States Department of Agriculture, Agricultural Research Service, Athens, GA, United States
  - Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA, United States
- Telvin L. Harrell
  - US National Poultry Research Center, United States Department of Agriculture, Agricultural Research Service, Athens, GA, United States
- Islam Nour
  - US National Poultry Research Center, United States Department of Agriculture, Agricultural Research Service, Athens, GA, United States
- Sujit K. Mohanty
  - US National Poultry Research Center, United States Department of Agriculture, Agricultural Research Service, Athens, GA, United States
- Steven J. Conrad
  - US National Poultry Research Center, United States Department of Agriculture, Agricultural Research Service, Athens, GA, United States
2
Ortigas-Vasquez A, Szpara M. Embracing Complexity: What Novel Sequencing Methods Are Teaching Us About Herpesvirus Genomic Diversity. Annu Rev Virol 2024; 11:67-87. [PMID: 38848592 DOI: 10.1146/annurev-virology-100422-010336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2024]
Abstract
The arrival of novel sequencing technologies throughout the past two decades has led to a paradigm shift in our understanding of herpesvirus genomic diversity. Previously, herpesviruses were seen as a family of DNA viruses with low genomic diversity. However, a growing body of evidence now suggests that herpesviruses exist as dynamic populations that possess standing variation and evolve at much faster rates than previously assumed. In this review, we explore how strategies such as deep sequencing, long-read sequencing, and haplotype reconstruction are allowing scientists to dissect the genomic composition of herpesvirus populations. We also discuss the challenges that need to be addressed before a detailed picture of herpesvirus diversity can emerge.
Affiliation(s)
- Alejandro Ortigas-Vasquez
  - Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
- Moriah Szpara
  - Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
3
Gu X, Yang Y, Mao F, Lee WL, Armas F, You F, Needham DM, Ng C, Chen H, Chandra F, Gin KY. A comparative study of flow cytometry-sorted communities and shotgun viral metagenomics in a Singapore municipal wastewater treatment plant. iMeta 2022; 1:e39. [PMID: 38868719 PMCID: PMC10989988 DOI: 10.1002/imt2.39] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/20/2022] [Revised: 04/30/2022] [Accepted: 06/19/2022] [Indexed: 06/14/2024]
Abstract
Traditional or "bulk" viral enrichment and amplification methods used in viral metagenomics introduce unavoidable bias in viral diversity. This bias is due to shortcomings in existing viral enrichment methods and overshadowing by the more abundant viral populations. To reduce the complexity and improve the resolution of viral diversity, we developed a strategy coupling fluorescence-activated cell sorting (FACS) with random amplification and compared this to bulk metagenomics. This strategy was validated on both influent and effluent samples from a municipal wastewater treatment plant using the Modified Ludzack-Ettinger (MLE) process as the treatment method. We found that DNA and RNA communities generated using bulk samples were mostly different from those derived following FACS for both treatments before and after MLE. Before MLE treatment, FACS identified five viral families and 512 viral annotated contigs. Up to 43% of mapped reads were not detected in bulk samples. Nucleo-cytoplasmic large DNA viral families were enriched to a greater extent in the FACS-coupled subpopulations compared with bulk samples. FACS-coupled viromes captured a single-contig viral genome associated with Anabaena phage, which was not observed in bulk samples or in FACS-sorted samples after MLE. These short metagenomic reads, which were assembled into a high-quality draft genome of 46 kbp, were found to be highly dominant in one of the pre-MLE FACS annotated virome fractions (57.4%). Using bulk metagenomics, we identified that between Primary Settling Tank and Secondary Settling Tank viromes, Virgaviridae, Astroviridae, Parvoviridae, Picobirnaviridae, Nodaviridae, and Iridoviridae were susceptible to MLE treatment. 
In all, bulk and FACS-coupled metagenomics are complementary approaches that enable a more thorough understanding of the community structure of DNA and RNA viruses in complex environmental samples, of which the latter is critical for increasing the sensitivity of detection of viral signatures that would otherwise be lost through bulk viral metagenomics.
Affiliation(s)
- Xiaoqiong Gu
  - Department of Civil and Environmental Engineering, National University of Singapore, Singapore, Singapore
  - Antimicrobial Resistance Interdisciplinary Research Group, Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Yi Yang
  - NUS Environmental Research Institute, National University of Singapore, Singapore, Singapore
- Feijian Mao
  - Department of Civil and Environmental Engineering, National University of Singapore, Singapore, Singapore
- Wei Lin Lee
  - Antimicrobial Resistance Interdisciplinary Research Group, Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Federica Armas
  - Antimicrobial Resistance Interdisciplinary Research Group, Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Fang You
  - Department of Civil and Environmental Engineering, National University of Singapore, Singapore, Singapore
- David M. Needham
  - Monterey Bay Aquarium Research Institute, Moss Landing, California, USA
  - GEOMAR Helmholtz Centre for Ocean Research, Ocean EcoSystems Biology Unit, Kiel, Germany
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Charmaine Ng
  - Department of Civil and Environmental Engineering, National University of Singapore, Singapore, Singapore
- Hongjie Chen
  - Department of Civil and Environmental Engineering, National University of Singapore, Singapore, Singapore
  - Antimicrobial Resistance Interdisciplinary Research Group, Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Franciscus Chandra
  - Department of Civil and Environmental Engineering, National University of Singapore, Singapore, Singapore
- Karina Yew-Hoong Gin
  - Department of Civil and Environmental Engineering, National University of Singapore, Singapore, Singapore
  - NUS Environmental Research Institute, National University of Singapore, Singapore, Singapore
4
VPipe: an Automated Bioinformatics Platform for Assembly and Management of Viral Next-Generation Sequencing Data. Microbiol Spectr 2022; 10:e0256421. [PMID: 35234489 PMCID: PMC8941893 DOI: 10.1128/spectrum.02564-21] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
Next-generation sequencing (NGS) is a powerful tool for detecting and investigating viral pathogens; however, analysis and management of the enormous amounts of data generated from these technologies remains a challenge. Here, we present VPipe (the Viral NGS Analysis Pipeline and Data Management System), an automated bioinformatics pipeline optimized for whole-genome assembly of viral sequences and identification of diverse species. VPipe automates the data quality control, assembly, and contig identification steps typically performed when analyzing NGS data. Users access the pipeline through a secure web-based portal, which provides an easy-to-use interface with advanced search capabilities for reviewing results. In addition, VPipe provides a centralized system for storing and analyzing NGS data, eliminating common bottlenecks in bioinformatics analyses for public health laboratories with limited on-site computational infrastructure. The performance of VPipe was validated through the analysis of publicly available NGS data sets for viral pathogens, generating high-quality assemblies for 12 data sets. VPipe also generated assemblies with greater contiguity than similar pipelines for 41 human respiratory syncytial virus isolates and 23 SARS-CoV-2 specimens. IMPORTANCE Computational infrastructure and bioinformatics analysis are bottlenecks in the application of NGS to viral pathogens. As of September 2021, VPipe has been used by the U.S. Centers for Disease Control and Prevention (CDC) and 12 state public health laboratories to characterize >17,500 and 1,500 clinical specimens and isolates, respectively. VPipe automates genome assembly for a wide range of viruses, including high-consequence pathogens such as SARS-CoV-2. Such automated functionality expedites public health responses to viral outbreaks and pathogen surveillance.
5
Pechlivanis N, Togkousidis A, Tsagiopoulou M, Sgardelis S, Kappas I, Psomopoulos F. A Computational Framework for Pattern Detection on Unaligned Sequences: An Application on SARS-CoV-2 Data. Front Genet 2021; 12:618170. [PMID: 34122498 PMCID: PMC8194296 DOI: 10.3389/fgene.2021.618170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 05/04/2021] [Indexed: 11/13/2022]
Abstract
The exponential growth of genome sequences available has spurred research on pattern detection with the aim of extracting evolutionary signal. Traditional approaches, such as multiple sequence alignment, rely on positional homology in order to reconstruct the phylogenetic history of taxa. Yet, mining information from the plethora of biological data and delineating species on a genetic basis, still proves to be an extremely difficult problem to consider. Multiple algorithms and techniques have been developed in order to approach the problem multidimensionally. Here, we propose a computational framework for identifying potentially meaningful features based on k-mers retrieved from unaligned sequence data. Specifically, we have developed a process which makes use of unsupervised learning techniques in order to identify characteristic k-mers of the input dataset across a range of different k-values and within a reasonable time frame. We use these k-mers as features for clustering the input sequences and identifying differences between the distributions of k-mers across the dataset. The developed algorithm is part of an innovative and much promising approach both to the problem of grouping sequence data based on their inherent characteristic features, as well as for the study of changes in the distributions of k-mers, as the k-value is fluctuating within a range of values. Our framework is fully developed in Python language as an open source software licensed under the MIT License, and is freely available at https://github.com/BiodataAnalysisGroup/kmerAnalyzer.
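As a rough illustration of the k-mer idea described in this abstract, the sketch below counts overlapping k-mers in unaligned sequences across a small range of k-values and compares the resulting count vectors with cosine similarity. It is not the kmerAnalyzer implementation or its API: the function names `kmer_profile` and `cosine_similarity`, the toy sequences, and the choice of cosine similarity as the comparison measure are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in one unaligned sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two k-mer count vectors (0.0 if either is empty)."""
    # Counter returns 0 for k-mers that are absent, so unshared k-mers add nothing.
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy sequences, made up for illustration (not real viral data).
sequences = {
    "seq_a": "ACGTACGTACGT",
    "seq_b": "ACGTACGTTCGT",
    "seq_c": "GGGGCCCCGGGG",
}

for k in range(2, 5):  # scan a small range of k-values
    profiles = {name: kmer_profile(s, k) for name, s in sequences.items()}
    sim_ab = cosine_similarity(profiles["seq_a"], profiles["seq_b"])
    sim_ac = cosine_similarity(profiles["seq_a"], profiles["seq_c"])
    print(f"k={k}: sim(a,b)={sim_ab:.3f}  sim(a,c)={sim_ac:.3f}")
```

A pairwise similarity matrix built this way could be passed to any standard clustering routine. How the similarities shift as k grows is the kind of change in k-mer distributions the abstract refers to, although the published framework may select and weight k-mers quite differently.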
Affiliation(s)
- Nikolaos Pechlivanis
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
  - Department of Genetics, Development and Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Anastasios Togkousidis
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Maria Tsagiopoulou
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Stefanos Sgardelis
  - Department of Ecology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Ilias Kappas
  - Department of Genetics, Development and Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Fotis Psomopoulos
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
6
López-Leal G, Reyes-Muñoz A, Santamaria RI, Cevallos MA, Pérez-Monter C, Castillo-Ramírez S. A novel vieuvirus from multidrug-resistant Acinetobacter baumannii. Arch Virol 2021; 166:1401-1408. [PMID: 33635432 DOI: 10.1007/s00705-021-05010-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/05/2020] [Accepted: 02/16/2021] [Indexed: 11/26/2022]
Abstract
Bacteriophages are considered the most abundant biological entities on earth, and they are able to modulate the populations of their bacterial hosts. Although the potential of bacteriophages has been accepted as an alternative strategy to combat multidrug-resistant pathogenic bacteria, there still exists a considerable knowledge gap regarding their genetic diversity, which hinders their use as antimicrobial agents. In this study, we undertook a genomic and phylogenetic characterization of the phage Ab11510-phi, which was isolated from a multidrug-resistant Acinetobacter baumannii strain (Ab11510). We found that Ab11510-phi has a narrow host range and belongs to a small group of transposable phages of the genus Vieuvirus that have only been reported to infect Acinetobacter bacteria. Finally, we showed that Ab11510-phi (as well as other vieuvirus phages) has a high level of mosaicism. On a broader level, we demonstrate that comparative genomics and phylogenetic analysis are necessary tools for the proper characterization of phage diversity.
Affiliation(s)
- Gamaliel López-Leal
  - Grupo de Biología Computacional y Ecología Microbiana, Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá, D.C., Colombia
- Alejandro Reyes-Muñoz
  - Grupo de Biología Computacional y Ecología Microbiana, Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá, D.C., Colombia
- Rosa Isela Santamaria
  - Programa de Genómica Evolutiva, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Miguel A Cevallos
  - Programa de Genómica Evolutiva, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Carlos Pérez-Monter
  - Departamento de Gastroenterología, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, México City, México
- Santiago Castillo-Ramírez
  - Programa de Genómica Evolutiva, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México