1
Ortigas-Vasquez A, Szpara M. Embracing Complexity: What Novel Sequencing Methods Are Teaching Us About Herpesvirus Genomic Diversity. Annu Rev Virol 2024; 11:67-87. [PMID: 38848592 DOI: 10.1146/annurev-virology-100422-010336] [Indexed: 06/09/2024]
Abstract
The arrival of novel sequencing technologies throughout the past two decades has led to a paradigm shift in our understanding of herpesvirus genomic diversity. Previously, herpesviruses were seen as a family of DNA viruses with low genomic diversity. However, a growing body of evidence now suggests that herpesviruses exist as dynamic populations that possess standing variation and evolve at much faster rates than previously assumed. In this review, we explore how strategies such as deep sequencing, long-read sequencing, and haplotype reconstruction are allowing scientists to dissect the genomic composition of herpesvirus populations. We also discuss the challenges that need to be addressed before a detailed picture of herpesvirus diversity can emerge.
Affiliation(s)
- Alejandro Ortigas-Vasquez
- Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
- Moriah Szpara
- Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
2
Gupta AK, Kumar M. Benchmarking and Assessment of Eight De Novo Genome Assemblers on Viral Next-Generation Sequencing Data, Including the SARS-CoV-2. OMICS 2022; 26:372-381. [PMID: 35759429 DOI: 10.1089/omi.2022.0042] [Indexed: 06/15/2023]
Abstract
Viral genomics has become crucial in clinical diagnostics and ecology, not to mention in efforts to stem the COVID-19 pandemic. Whole-genome sequencing (WGS) is pivotal in gaining an improved understanding of viral evolution, genomic epidemiology, infectious outbreaks, pathobiology, clinical management, and vaccine development. Genome assembly is one of the crucial steps in WGS data analyses. A series of different assemblers has been developed with the advent of high-throughput next-generation sequencing (NGS). Various studies have evaluated these assembly tools on distinct datasets; however, these studies lack data of viral origin. In this study, we performed a comparative evaluation and benchmarking of eight de novo assemblers, namely SOAPdenovo, Velvet, assembly by short sequences (ABySS), iterative De Bruijn graph assembler (IDBA), SPAdes, Edena, iterative virus assembler, and VICUNA, on viral NGS data from distinct Illumina platforms (GAIIx, HiSeq, MiSeq, and NextSeq). WGS data from diverse viruses, namely severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), dengue virus 3, human immunodeficiency virus 1, hepatitis B virus, human herpesvirus 8, human papillomavirus 16, rhinovirus A, and West Nile virus, were used to assess these assemblers. Performance metrics such as genome fraction recovery, assembly length, NG50, N50, contig length, contig number, mismatches, and misassemblies were analyzed. Overall, three assemblers, namely SPAdes, IDBA, and ABySS, performed consistently well, including for genome assembly of SARS-CoV-2. These assembly methods should be considered for future studies of viruses. The study also suggests that two or more assembly approaches should be used in viral NGS studies, especially in clinical settings.
Taken together, the benchmarking of eight de novo genome assemblers reported in this study can inform future public health and ecology research concerning the viruses, the COVID-19 pandemic, and viral outbreaks.
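The abstract ranks assemblers partly on N50 and NG50. Below is a minimal Python sketch of these two statistics. It assumes the usual definitions: contigs are walked from longest to shortest, N50 is measured against the assembly's own total length, and NG50 is measured against the reference genome size. The function names are hypothetical, and this is not the code used in the study.

```python
def nx_length(contig_lengths, target_total, fraction=0.5):
    """Shortest contig length L such that contigs of length >= L sum to at
    least `fraction * target_total`; None if that total is never reached."""
    threshold = fraction * target_total
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return None


def n50(contig_lengths):
    # N50: the target total is the assembly's own length.
    return nx_length(contig_lengths, sum(contig_lengths))


def ng50(contig_lengths, genome_size):
    # NG50: the target total is the (reference) genome size instead.
    return nx_length(contig_lengths, genome_size)


contigs = [100, 200, 300, 400, 500]
print(n50(contigs))          # 400
print(ng50(contigs, 3000))   # 100
print(ng50(contigs, 4000))   # None: contigs sum to less than half the genome
```

NG50 returns `None` when the contigs sum to less than half the stated genome size. N50, by contrast, is always defined for a non-empty assembly.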
Affiliation(s)
- Amit Kumar Gupta
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Chandigarh, India
- Manoj Kumar
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Chandigarh, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
3
Knyazev S, Tsyvina V, Shankar A, Melnyk A, Artyomenko A, Malygina T, Porozov YB, Campbell EM, Switzer WM, Skums P, Mangul S, Zelikovsky A. Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction. Nucleic Acids Res 2021; 49:e102. [PMID: 34214168 PMCID: PMC8464054 DOI: 10.1093/nar/gkab576] [Received: 10/09/2020] [Revised: 05/25/2021] [Accepted: 06/18/2021] [Indexed: 12/21/2022]
Abstract
Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or better able to evade the immune system. Early detection and identification of minority viral haplotypes may therefore help clinicians promptly adjust a patient's treatment plan, preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. Eliminating sequencing noise is a non-trivial task that remains an open problem. Here we propose CliqueSNV, a method based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables the identification of minority haplotypes with frequencies below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short- and long-read sequencing platforms.
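The core idea behind CliqueSNV, as described above, is to find pairs of mutations that are statistically linked on the same reads. The Python sketch below illustrates only that pairwise step, under several simplifying assumptions:

- reads are given as position-to-base dictionaries;
- independence of the two alleles is the null model;
- significance is assessed with a one-sided binomial test.

It is not the published CliqueSNV implementation, which goes on to combine linked pairs into haplotypes. All names are hypothetical.

```python
from math import exp, lgamma, log


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = log(p), log(1.0 - p)
    total = 0.0
    for x in range(k, n + 1):
        total += exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
                     + x * log_p + (n - x) * log_q)
    return min(total, 1.0)


def linkage_pvalue(reads, pos_i, allele_i, pos_j, allele_j):
    """One-sided test: do allele_i at pos_i and allele_j at pos_j co-occur on
    reads more often than expected if the two alleles were independent?

    `reads` is a list of dicts mapping reference position -> observed base.
    """
    # Only reads that cover both positions are informative about linkage.
    spanning = [r for r in reads if pos_i in r and pos_j in r]
    n = len(spanning)
    if n == 0:
        return 1.0
    n_i = sum(r[pos_i] == allele_i for r in spanning)
    n_j = sum(r[pos_j] == allele_j for r in spanning)
    n_ij = sum(r[pos_i] == allele_i and r[pos_j] == allele_j for r in spanning)
    # Expected joint frequency under independence is the product of marginals.
    p_joint = (n_i / n) * (n_j / n)
    return binom_sf(n_ij, n, p_joint)


# Hypothetical data: a minority haplotype (A at 10, G at 50) on 10 of 100 reads.
linked = [{10: "C", 50: "T"}] * 90 + [{10: "A", 50: "G"}] * 10
# The same minor alleles at the same frequencies, but never on the same read.
unlinked = ([{10: "A", 50: "T"}] * 10 + [{10: "C", 50: "G"}] * 10
            + [{10: "C", 50: "T"}] * 80)
print(linkage_pvalue(linked, 10, "A", 50, "G"))    # very small, well below 1e-4
print(linkage_pvalue(unlinked, 10, "A", 50, "G"))  # 1.0
```

Random sequencing errors at two sites rarely land on the same read. Genuinely co-inherited mutations do, and this asymmetry is what lets linkage separate real low-frequency variants from noise.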
Affiliation(s)
- Sergey Knyazev
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA; Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA; Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830, USA
- Viachaslau Tsyvina
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Anupama Shankar
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Andrew Melnyk
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Tatiana Malygina
- International Scientific and Research Institute of Bioengineering, ITMO University, St. Petersburg 197101, Russia
- Yuri B Porozov
- World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia; Department of Computational Biology, Sirius University of Science and Technology, Sochi 354340, Russia
- Ellsworth M Campbell
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- William M Switzer
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA; World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
4
Posada-Céspedes S, Seifert D, Topolsky I, Jablonski KP, Metzner KJ, Beerenwinkel N. V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data. Bioinformatics 2021; 37:1673-1680. [PMID: 33471068 PMCID: PMC8289377 DOI: 10.1093/bioinformatics/btab015] [Received: 06/11/2020] [Revised: 12/09/2020] [Accepted: 01/08/2021] [Indexed: 12/30/2022]
Abstract
Motivation: High-throughput sequencing technologies are increasingly used not only in viral genomics research but also in clinical surveillance and diagnostics. These technologies facilitate the assessment of genetic diversity in intra-host virus populations, which affects the transmission, virulence and pathogenesis of viral infections. However, analysing viral diversity poses two major challenges. First, amplification and sequencing errors confound the identification of true biological variants. Second, the large data volumes impose computational limitations.
Results: To support viral high-throughput sequencing studies, we developed V-pipe, a bioinformatics pipeline combining various state-of-the-art statistical models and computational tools for automated end-to-end analyses of raw sequencing reads. V-pipe supports quality control, read mapping and alignment, low-frequency mutation calling, and inference of viral haplotypes. For generating high-quality read alignments, we developed a novel method, called ngshmmalign, based on profile hidden Markov models and tailored to small and highly diverse viral genomes. V-pipe also includes benchmarking functionality providing a standardized environment for comparative evaluations of different pipeline configurations. We demonstrate this capability by assessing the impact of three different read aligners (Bowtie 2, BWA MEM, ngshmmalign) and two different variant callers (LoFreq, ShoRAH) on the performance of calling single-nucleotide variants in intra-host virus populations. V-pipe supports various pipeline configurations and is implemented in a modular fashion to facilitate adaptation to the continuously changing technology landscape.
Availability and implementation: V-pipe is freely available at https://github.com/cbg-ethz/V-pipe.
Supplementary information: Supplementary data are available at Bioinformatics online.
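Low-frequency mutation calling has to separate true minority variants from amplification and sequencing errors. Below is a deliberately simplified Python sketch. It assumes:

- a single uniform substitution error rate, split evenly across the three wrong bases;
- a fixed significance threshold.

It is not how LoFreq or ShoRAH work internally; LoFreq, for instance, uses per-base quality scores rather than one rate. The function names and default values are hypothetical.

```python
from math import exp, lgamma, log


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = log(p), log(1.0 - p)
    total = 0.0
    for x in range(k, n + 1):
        total += exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
                     + x * log_p + (n - x) * log_q)
    return min(total, 1.0)


def call_snv(depth, alt_count, error_rate=0.001, alpha=1e-6):
    """Flag a candidate low-frequency variant at one position.

    Assumption: errors occur at `error_rate` per base and are spread evenly
    over the three possible wrong bases. Returns (is_variant, p_value).
    """
    p_specific_error = error_rate / 3
    # Probability of seeing at least `alt_count` copies of this specific
    # alternative base from sequencing error alone.
    p_value = binom_sf(alt_count, depth, p_specific_error)
    return p_value < alpha, p_value


print(call_snv(10_000, 50))  # 0.5% alternative reads: called as a variant
print(call_snv(10_000, 5))   # 0.05% alternative reads: consistent with error
```

A real pipeline would also correct `alpha` for the number of positions and alleles tested.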
Affiliation(s)
- Susana Posada-Céspedes
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
- David Seifert
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
- Ivan Topolsky
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
- Kim Philipp Jablonski
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
- Karin J Metzner
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, 8091, Switzerland; Institute of Medical Virology, University of Zurich, Zurich, 8091, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
5
Knyazev S, Hughes L, Skums P, Zelikovsky A. Epidemiological data analysis of viral quasispecies in the next-generation sequencing era. Brief Bioinform 2021; 22:96-108. [PMID: 32568371 PMCID: PMC8485218 DOI: 10.1093/bib/bbaa101] [Received: 12/23/2019] [Revised: 04/24/2020] [Accepted: 05/04/2020] [Indexed: 01/04/2023]
Abstract
The deep coverage offered by next-generation sequencing (NGS) technology has made it possible to assess the complexity of intra-host RNA viral populations at an unprecedented level of detail. Consequently, NGS datasets could be analyzed to extract and infer crucial epidemiological and biomedical information at the levels of both infected individuals and susceptible populations, enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and the structure of transmission networks. However, NGS data require sophisticated analysis that must handle millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology. Now they must be redesigned to handle large-scale NGS datasets and to properly model the evolution of heterogeneous, rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools that analyze NGS data for (i) characterization of intra-host viral population complexity, including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response to, and control of, outbreaks.
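One downstream analysis listed above is inferring linkage between patients. A common simple approach links any two samples whose sequences fall within a genetic-distance threshold and reports connected components as putative transmission clusters. The Python sketch below assumes aligned consensus sequences, an uncorrected p-distance, and an illustrative 1.5% threshold. Established tools typically use model-corrected distances (such as TN93) and thresholds tuned to the virus and genomic region, so every value here is an assumption, and the sample names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, skipping
    positions where either sequence has a gap ('-') or an ambiguous 'N'."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


def transmission_clusters(seqs, threshold=0.015):
    """Single-linkage clusters: link two samples if their distance <= threshold."""
    parent = {name: name for name in seqs}

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for (name_a, seq_a), (name_b, seq_b) in combinations(seqs.items(), 2):
        # NaN (no comparable sites) compares False, so such pairs stay unlinked.
        if p_distance(seq_a, seq_b) <= threshold:
            parent[find(name_a)] = find(name_b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


samples = {
    "P1": "A" * 100,
    "P2": "A" * 99 + "C",       # 1% from P1: linked
    "P3": "C" * 50 + "A" * 50,  # 50% from P1: not linked
}
print(transmission_clusters(samples))  # [['P1', 'P2'], ['P3']]
```

Single linkage means a cluster can chain together samples that are individually farther apart than the threshold. This is usually acceptable for flagging possible transmission networks, but not as proof of direct transmission.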
6
Abstract
Alphaherpesviruses, as large double-stranded DNA viruses, were long considered to be genetically stable and to exist in a homogeneous state. Recently, the proliferation of high-throughput sequencing (HTS) and bioinformatics analysis has expanded our understanding of herpesvirus genomes and the variations found therein. Current data indicate that herpesviruses exist as diverse populations, both in culture and in vivo, in a manner reminiscent of RNA viruses. In this review, we discuss the past, present, and potential future of alphaherpesvirus genomics, including the technical challenges that face the field. We also review how recent data have enabled genome-wide comparisons of sequence diversity, recombination, allele frequency, and selective pressures, including those introduced by cell culture. While we focus on the human alphaherpesviruses, we draw key insights from related veterinary species and from the beta- and gamma-subfamilies of herpesviruses. We also highlight promising technologies and potential future directions for herpesvirus genomics, including the potential to link viral genetic differences to phenotypic and disease outcomes.
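Genome-wide comparisons of sequence diversity and allele frequency rest on per-site summaries of the base calls observed at each genome position. Below is a minimal Python sketch that assumes the base calls at one site are given as a plain string, a hypothetical and simplified stand-in for a real pileup. Shannon entropy is one common per-site diversity measure; it is used here for illustration and is not taken from the review.

```python
from collections import Counter
from math import log2


def base_counts(pileup):
    """Count unambiguous bases (A, C, G, T) observed at one genome position."""
    return Counter(b for b in pileup.upper() if b in "ACGT")


def site_entropy(pileup):
    """Shannon entropy (bits) of the base distribution at one site:
    0 for a fixed site, up to 2 when all four bases are equally frequent."""
    counts = base_counts(pileup)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * log2(c / total) for c in counts.values())


def minor_allele_frequency(pileup):
    """Frequency of the second most common base at the site (0 if fixed)."""
    counts = base_counts(pileup)
    total = sum(counts.values())
    if len(counts) < 2:
        return 0.0
    return counts.most_common(2)[1][1] / total


print(site_entropy("AAAAAAAA"))              # 0.0
print(site_entropy("AACC"))                  # 1.0
print(minor_allele_frequency("AAAAAAAAAC"))  # 0.1
```

In practice, sites whose minor allele frequency falls near the sequencing error rate need an error model before they can be interpreted as genuine variation.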
Affiliation(s)
- Chad V. Kuny
- Departments of Biology, and Biochemistry and Molecular Biology, Center for Infectious Disease Dynamics, and the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, 16802, USA
- Moriah L. Szpara
- Departments of Biology, and Biochemistry and Molecular Biology, Center for Infectious Disease Dynamics, and the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, 16802, USA
7
Streamlined Subpopulation, Subtype, and Recombination Analysis of HIV-1 Half-Genome Sequences Generated by High-Throughput Sequencing. mSphere 2020; 5:e00551-20. [PMID: 33055255 PMCID: PMC7565892 DOI: 10.1128/msphere.00551-20] [Indexed: 11/23/2022]
Abstract
High-throughput sequencing (HTS) has been widely used to characterize HIV-1 genome sequences. No current algorithm can directly determine genotype and quasispecies population from short HTS reads generated from long genome sequences without additional software. To establish a robust subpopulation, subtype, and recombination analysis workflow, we amplified the HIV-1 3′-half genome from plasma samples of 65 HIV-1-infected individuals and sequenced the entire amplicon (∼4,500 bp) by HTS. Direct analysis of the raw reads with HIVE-hexahedron showed that 48% of samples harbored 2 to 13 subpopulations. We identified various subtypes (17 A1s, 4 Bs, 27 Cs, 6 CRF02_AGs, and 11 unique recombinant forms) and defined the recombination breakpoints of 10 recombinants. These results were validated with viral genome sequences generated by single genome sequencing (SGS) or by analysis of the consensus sequence of the HTS reads. The HIVE-hexahedron workflow is more sensitive and accurate than evaluating the consensus sequence alone, and more cost-effective than SGS.
IMPORTANCE The highly recombinogenic nature of human immunodeficiency virus type 1 (HIV-1) leads to recombination and the emergence of quasispecies. Reliable identification of subpopulations is important for understanding the complexity of a viral population, both for drug resistance surveillance and for vaccine development. HTS provides improved resolution over Sanger sequencing for the analysis of heterogeneous viral subpopulations. However, current methods for analyzing HTS reads cannot fully achieve accurate population reconstruction. There is therefore a pressing need for a more sensitive, accurate, user-friendly, and cost-effective method to analyze viral quasispecies. For this purpose, we improved the HIVE-hexahedron algorithm, which we had previously developed with in silico short sequences, so that it analyzes raw HTS short reads. The significance of this study is that our standalone algorithm enables a streamlined analysis of quasispecies, subtype, and recombination patterns from long HIV-1 genome regions without the need for additional sequence analysis tools. Distinct viral populations and recombination patterns identified by HIVE-hexahedron were further validated by comparison with sequences obtained by SGS.
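Defining recombination breakpoints, as done above for 10 recombinants, can be illustrated with a generic sliding-window similarity scan. Each window of the query is assigned to its most similar parental sequence, and a breakpoint is reported wherever that assignment switches. This Python sketch is swapped in purely for illustration and is not the HIVE-hexahedron algorithm. It assumes the query and parents are already aligned to equal length; the window and step sizes are arbitrary, and the sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length aligned strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a) if a else 0.0


def window_assignments(query, parents, window=100, step=50):
    """Assign each sliding window of `query` to the most similar parent.

    `parents` maps a parent/subtype name to a sequence aligned to `query`.
    Ties go to the parent listed first in `parents`.
    """
    calls = []
    for start in range(0, len(query) - window + 1, step):
        segment = query[start:start + window]
        best = max(parents,
                   key=lambda name: identity(segment, parents[name][start:start + window]))
        calls.append((start, best))
    return calls


def breakpoints(calls):
    """Window start positions where the best-matching parent changes
    (breakpoint resolution is therefore limited to roughly one `step`)."""
    return [start for (_, prev), (start, cur) in zip(calls, calls[1:]) if prev != cur]


parents = {"X": "ACGT" * 100, "Y": "TGCA" * 100}
# Hypothetical recombinant: the first 200 bases from X, the rest from Y.
query = parents["X"][:200] + parents["Y"][200:]
calls = window_assignments(query, parents)
print(calls)               # X for windows starting at 0 to 150, Y from 200 on
print(breakpoints(calls))  # [200]
```

Smaller steps sharpen breakpoint localization at the cost of more comparisons. Real parental sequences are far less divergent than this toy pair, so windows near a breakpoint are often ambiguous.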
8
Kumar S, Williams RS, Wang Z. Third-order nanocircuit elements for neuromorphic engineering. Nature 2020; 585:518-523. [PMID: 32968256 DOI: 10.1038/s41586-020-2735-5] [Received: 01/28/2020] [Accepted: 08/03/2020] [Indexed: 11/09/2022]
Abstract
Current hardware approaches to biomimetic or neuromorphic artificial intelligence rely on elaborate transistor circuits to simulate biological functions. However, these functions can instead be emulated more faithfully by higher-order circuit elements that naturally express neuromorphic nonlinear dynamics[1-4]. Generating neuromorphic action potentials in a circuit element theoretically requires a minimum of third-order complexity (for example, three dynamical electrophysical processes)[5], but there have been few examples of second-order neuromorphic elements and no previous demonstration of any isolated third-order element[6-8]. Using both experiments and modelling, here we show how multiple electrophysical processes, including Mott transition dynamics, form a nanoscale third-order circuit element. We demonstrate simple transistorless networks of third-order elements that perform Boolean operations and find analogue solutions to a computationally hard graph-partitioning problem. This work paves the way towards very compact and densely functional neuromorphic computing primitives, and energy-efficient validation of neuroscientific models.
9
Eliseev A, Gibson KM, Avdeyev P, Novik D, Bendall ML, Pérez-Losada M, Alexeev N, Crandall KA. Evaluation of haplotype callers for next-generation sequencing of viruses. Infect Genet Evol 2020; 82:104277. [PMID: 32151775 PMCID: PMC7293574 DOI: 10.1016/j.meegid.2020.104277] [Received: 11/04/2019] [Revised: 03/04/2020] [Accepted: 03/06/2020] [Indexed: 01/30/2023]
Abstract
Currently, the standard practice for assembling next-generation sequencing (NGS) reads of viral genomes is to summarize thousands of individual short reads into a single consensus sequence. This obscures intra-host diversity information that is useful for molecular phylodynamic inference. It is hypothesized that a few viral strains may dominate the intra-host genetic diversity, with a variety of lower-frequency strains comprising the rest of the population. Several software tools currently exist to convert NGS sequence variants into haplotypes. Previous benchmarks of viral haplotype reconstruction programs used simulation scenarios that are useful from a mathematical perspective but do not reflect viral evolution and epidemiology. Here, we tested twelve NGS haplotype reconstruction methods using viral populations simulated under realistic evolutionary dynamics. We simulated coalescent-based populations spanning known levels of viral genetic diversity, including mutation rates, sample size and effective population size, to test the limits of the haplotype reconstruction methods and to ensure coverage of predicted intra-host viral diversity levels (especially for HIV-1). All twelve haplotype callers showed variable performance and produced drastically different results, driven mainly by differences in mutation rate and, to a lesser extent, in effective population size. Most methods were able to accurately reconstruct haplotypes when genetic diversity was low. However, under higher levels of diversity (e.g., those seen in intra-host HIV-1 infections), haplotype reconstruction quality was highly variable and, on average, poor. All haplotype reconstruction tools except QuasiRecomb and ShoRAH greatly underestimated intra-host diversity and the true number of haplotypes.
PredictHaplo outperformed the other haplotype reconstruction tools, with the highest precision and recall and the lowest UniFrac distance values. It was followed by CliqueSNV, which might have outperformed PredictHaplo given more computational time. We present an extensive comparison of available viral haplotype reconstruction tools and provide insights for future improvements in haplotype reconstruction tools using both short-read and long-read technologies.
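The benchmark scores tools by precision, recall, and UniFrac distance. The abstract does not state how a predicted haplotype is matched to a true one, so the Python sketch below assumes one rule: a prediction counts as correct if it lies within a chosen Hamming distance of some true haplotype. UniFrac, which also accounts for phylogenetic distances and frequencies, is not shown. The names and example haplotypes are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two aligned haplotypes."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def precision_recall(predicted, truth, max_mismatches=0):
    """Precision: share of predicted haplotypes within `max_mismatches` of some
    true haplotype. Recall: share of true haplotypes recovered by some prediction."""
    def has_match(hap, pool):
        return any(hamming(hap, other) <= max_mismatches for other in pool)

    precision = (sum(has_match(h, truth) for h in predicted) / len(predicted)
                 if predicted else 0.0)
    recall = (sum(has_match(h, predicted) for h in truth) / len(truth)
              if truth else 0.0)
    return precision, recall


truth = ["AAAA", "CCCC", "GGGG"]
predicted = ["AAAA", "CCCA"]
print(precision_recall(predicted, truth))     # (0.5, 0.3333333333333333)
print(precision_recall(predicted, truth, 1))  # (1.0, 0.6666666666666666)
```

The choice of mismatch tolerance can change a tool's apparent ranking, which is one reason benchmarks pair such counts with a distance-based measure.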
Affiliation(s)
- Anton Eliseev
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
- Keylie M Gibson
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
- Pavel Avdeyev
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Mathematics, George Washington University, Washington, DC, USA
- Dmitry Novik
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
- Matthew L Bendall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
- Nikita Alexeev
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
- Keith A Crandall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
10
Sardi SI, Carvalho RH, Pacheco LGC, Almeida JPPd, Belitardo EMMdA, Pinheiro CS, Campos GS, Aguiar ERGR. High-Quality Resolution of the Outbreak-Related Zika Virus Genome and Discovery of New Viruses Using Ion Torrent-Based Metatranscriptomics. Viruses 2020; 12:v12070782. [PMID: 32708079 PMCID: PMC7411838 DOI: 10.3390/v12070782] [Received: 04/20/2020] [Revised: 05/15/2020] [Accepted: 05/20/2020] [Indexed: 01/13/2023]
Abstract
Arboviruses, including the Zika virus, have recently emerged as one of the most important threats to human health. Metagenomics-based approaches have already proven valuable in aiding surveillance of arboviral infections, and the ability to reconstruct complete viral genomes from metatranscriptomics data is key to developing new control strategies for these diseases. Herein, we combined RNA-based metatranscriptomics with Ion Torrent deep sequencing and a newly implemented bioinformatics approach to achieve a high-quality reconstruction of an outbreak-related Zika virus (ZIKV) genome (10,739 nt) with extended 5'-UTR and 3'-UTR regions. Besides assembling one of the largest complete ZIKV genomes to date, our strategy also yielded high-quality complete genomes of two arthropod-infecting viruses co-infecting C6/36 cell lines, namely Alphamesonivirus 1 strain Salvador (20,194 nt) and Aedes albopictus totivirus-like (4618 nt); the latter likely represents a new viral species. Altogether, our results demonstrate that our bioinformatics approach, combined with Ion Torrent sequencing, allows high-quality reconstruction of known and unknown viral genomes, overcoming the main limitation of RNA deep sequencing for virus identification.
Affiliation(s)
- Silvia I. Sardi
- Laboratory of Virology, Instituto de Ciências da Saúde, Universidade Federal da Bahia, Salvador, Bahia 40.110-100, Brazil
- Rejane H. Carvalho
- Laboratory of Virology, Instituto de Ciências da Saúde, Universidade Federal da Bahia, Salvador, Bahia 40.110-100, Brazil
- Luis G. C. Pacheco
- Post-Graduate Program in Biotechnology, Instituto de Ciências da Saúde, Universidade Federal da Bahia, Salvador, Bahia 40.110-100, Brazil
- João P. P. d. Almeida
- Department of Biochemistry and Immunology, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Minas Gerais 31270-901, Brazil
- Emilia M. M. d. A. Belitardo
- Post-Graduate Program in Immunology, Instituto de Ciências da Saúde, Universidade Federal da Bahia, Salvador, Bahia 40.110-100, Brazil
- Carina S. Pinheiro
- Post-Graduate Program in Biotechnology, Instituto de Ciências da Saúde, Universidade Federal da Bahia, Salvador, Bahia 40.110-100, Brazil
- Gúbio S. Campos
- Laboratory of Virology, Instituto de Ciências da Saúde, Universidade Federal da Bahia, Salvador, Bahia 40.110-100, Brazil
- Eric R. G. R. Aguiar
- Post-Graduate Program in Biotechnology, Instituto de Ciências da Saúde, Universidade Federal da Bahia, Salvador, Bahia 40.110-100, Brazil
- Virus Bioinformatics Laboratory, Department of Biological Science (DCB), Center of Biotechnology and Genetics (CBG), State University of Santa Cruz (UESC), Rodovia Ilhéus-Itabuna km 16, Ilhéus, Bahia 45652-900, Brazil
11
Katsiani A, Stainton D, Lamour K, Tzanetakis IE. The population structure of Rose rosette virus in the USA. J Gen Virol 2020; 101:676-684. [PMID: 32375952 DOI: 10.1099/jgv.0.001418] [Indexed: 11/18/2022]
Abstract
Rose rosette virus (RRV) (genus Emaravirus) is the causal agent of the homonymous disease, the most destructive malady of roses in the USA. Although the importance of the disease is recognized, little sequence information and no full genomes are available for RRV, a multi-segmented RNA virus. To better understand the population structure of the virus, we implemented a Hi-Plex PCR amplicon high-throughput sequencing approach to sequence all 7 segments and to quantify polymorphisms in 91 RRV isolates collected from 16 states in the USA. Analysis revealed insertion/deletion (indel) polymorphisms primarily in the 5' and 3' non-coding regions but also within coding regions, including some that changed protein length. Phylogenetic analysis showed little geographical structuring, suggesting that topography does not have a strong influence on virus evolution. Overall, the virus populations were homogeneous, possibly because of the regular movement of plants, the recent emergence of RRV, and/or strong purifying selection acting to preserve the integrity and biological functions of the virus.
Affiliation(s)
- Asimina Katsiani
- Department of Entomology and Plant Pathology, Division of Agriculture, University of Arkansas System, Fayetteville AR 72701, USA
- Daisy Stainton
- Department of Entomology and Plant Pathology, Division of Agriculture, University of Arkansas System, Fayetteville AR 72701, USA
- Kurt Lamour
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
- Ioannis E Tzanetakis
- Department of Entomology and Plant Pathology, Division of Agriculture, University of Arkansas System, Fayetteville AR 72701, USA
12
Alves BM, Siqueira JD, Prellwitz IM, Botelho OM, Da Hora VP, Sanabani S, Recordon-Pinson P, Fleury H, Soares EA, Soares MA. Estimating HIV-1 Genetic Diversity in Brazil Through Next-Generation Sequencing. Front Microbiol 2019; 10:749. [PMID: 31024510 PMCID: PMC6465556 DOI: 10.3389/fmicb.2019.00749] [Received: 10/15/2018] [Accepted: 03/25/2019] [Indexed: 12/26/2022]
Abstract
Approximately 36.7 million people were living with the human immunodeficiency virus (HIV) at the end of 2016 according to UNAIDS, representing a global prevalence rate of 0.8%. In Brazil, an HIV prevalence of 0.24% has been estimated, which represents approximately 830,000 individuals living with the virus. As a tourist and commercial hub in Latin America, Brazil harbors high HIV genetic variability, further increased by the selective pressure exerted by the host immune system and by antiretroviral treatment. Progress in next-generation sequencing (NGS) techniques has made it possible to expand the study of HIV genetic diversity and of evolutionary and epidemic processes, allowing the generation of complete or near full-length HIV genomes (NFLG) and improving the characterization of intra- and interhost diversity of viral populations. Greater sensitivity in the detection of viral recombinant forms is one of the major improvements associated with this development: unique or circulating recombinant forms can be identified from near full-length viral genomes with increasing accuracy. NGS also permits the characterization of multiple viral infections within individual hosts. Previous Brazilian studies using NGS to analyze HIV diversity identified several distinct unique and circulating recombinant forms and found evidence of dual infections. These data revealed unprecedentedly high rates of viral recombination and showed that novel recombinants are continually arising in the Brazilian epidemic. In the pooled analysis presented in this report, HIV subtypes were determined for HIV-positive patients in five Brazilian states with some of the highest HIV prevalence: three in the Southeast (Rio de Janeiro, São Paulo, and Minas Gerais), one in the Northeast (Pernambuco), and one in the South (Rio Grande do Sul).
Combined data analysis showed a substantial prevalence of recombinant forms (29%; 101/350), and a similar 26% when only NFLGs were considered. Moreover, the analysis provided evidence of multiple infections in some individuals. Our data highlight the great HIV genetic diversity found in Brazil and reveal a more accurate picture of HIV evolutionary dynamics in the region.
Affiliation(s)
- Brunna M Alves
- Programa de Oncovirologia, Instituto Nacional de Câncer, Rio de Janeiro, Brazil
- Juliana D Siqueira
- Programa de Oncovirologia, Instituto Nacional de Câncer, Rio de Janeiro, Brazil
- Isabel M Prellwitz
- Programa de Oncovirologia, Instituto Nacional de Câncer, Rio de Janeiro, Brazil
- Ornella M Botelho
- Programa de Oncovirologia, Instituto Nacional de Câncer, Rio de Janeiro, Brazil
- Vanusa P Da Hora
- Laboratório de Biologia Molecular, Escola de Medicina, Universidade Federal do Rio Grande, Rio Grande do Sul, Brazil
- Sabri Sanabani
- LIM-3, Hospital das Clinicas FMUSP, Faculty of Medicine, University of São Paulo, São Paulo, Brazil
- Hervé Fleury
- CNRS MFP-UMR 5234, University Hospital of Bordeaux, University of Bordeaux, Bordeaux, France
- Esmeralda A Soares
- Programa de Oncovirologia, Instituto Nacional de Câncer, Rio de Janeiro, Brazil
- Marcelo A Soares
- Programa de Oncovirologia, Instituto Nacional de Câncer, Rio de Janeiro, Brazil; Departamento de Genética, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
13
Raja R, Pareek A, Newar K, Dixit NM. Mutational pathway maps and founder effects define the within-host spectrum of hepatitis C virus mutants resistant to drugs. PLoS Pathog 2019; 15:e1007701. [PMID: 30934020 PMCID: PMC6459561 DOI: 10.1371/journal.ppat.1007701] [Received: 11/03/2018] [Revised: 04/11/2019] [Accepted: 03/13/2019]
Abstract
Knowledge of the within-host frequencies of resistance-associated amino acid variants (RAVs) is important for identifying optimal drug combinations for the treatment of hepatitis C virus (HCV) infection. Multiple RAVs, often below detection limits, may exist at any resistance locus in infected individuals, defining the diversity of accessible resistance pathways. We developed a multiscale mathematical model to estimate the pre-treatment frequencies of the entire spectrum of mutants at chosen loci. Using a codon-level description of amino acids, we performed stochastic simulations of intracellular dynamics with every possible nucleotide variant as the infecting strain and estimated the relative infectivity of each variant and the resulting distribution of variants produced. We employed these quantities in a deterministic multi-strain model of extracellular dynamics to estimate mutant frequencies. Our predictions captured database frequencies of the RAV R155K, which confers resistance to NS3/4A protease inhibitors, providing a successful test of our formalism. We found that mutational pathway maps, which interconnect all viable mutants, and strong founder effects determined the mutant spectrum. The spectra were vastly different for HCV genotypes 1a and 1b, which underlies their differential responses to drugs. Using a recently determined fitness landscape, we estimated that 13 amino acid variants, encoded by 44 codons, exist at residue 93 of the NS5A protein, illustrating the massive diversity of accessible resistance pathways at specific loci. Accounting for this diversity, which our model enables, would help optimize drug combinations. Our model may be applied to describe the within-host evolution of other flaviviruses and to inform vaccine design strategies.
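The codon-level reasoning behind a mutational pathway map can be illustrated with a small sketch: count the fewest nucleotide substitutions that take the current codon at a residue to any codon for a target amino acid, and list the amino acids reachable in one step. This uses only the standard genetic code; it is not the authors' multiscale model and ignores infectivity, fitness, and the stochastic and deterministic dynamics described above. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_substitutions(codon, target_aa):
    """Fewest nucleotide changes from `codon` to any codon encoding `target_aa`."""
    return min(hamming(codon, c) for c, aa in CODON_TABLE.items() if aa == target_aa)


def one_step_neighbors(codon):
    """Amino acids (and '*' for stop) reachable by a single nucleotide substitution."""
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                reachable.add(CODON_TABLE[mutant])
    return reachable


print(min_substitutions("AGG", "K"))  # 1  (AGG -> AAG)
print(min_substitutions("CGG", "K"))  # 2
```

For example, arginine encoded by AGG is one substitution away from lysine (AAG), whereas arginine encoded by CGG needs two; differences in the background codon of this kind are one way the accessible pathways to a variant such as R155K can differ between viral genotypes.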
Affiliation(s)
- Rubesh Raja
- Department of Chemical Engineering, Indian Institute of Science, Bangalore, India
- Aditya Pareek
- Department of Chemical Engineering, Indian Institute of Science, Bangalore, India
- Kapil Newar
- Department of Chemical Engineering, Indian Institute of Science, Bangalore, India
- Narendra M Dixit
- Department of Chemical Engineering, Indian Institute of Science, Bangalore, India
- Centre for Biosystems Science and Engineering, Indian Institute of Science, Bangalore, India
14
Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons. PLoS Comput Biol 2018; 14:e1006498. [PMID: 30543621 PMCID: PMC6314628 DOI: 10.1371/journal.pcbi.1006498] [Received: 01/04/2018] [Revised: 01/02/2019] [Accepted: 09/10/2018]
Abstract
Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster), and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. 
The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.

Viral populations constantly evolve and diversify. In this article we introduce a method, FLEA, for reconstructing and visualizing the details of evolutionary changes. FLEA specifically processes data from sequencing platforms that generate long but error-prone reads. To study the evolutionary dynamics of entire genes during viral infection, data are collected via long-read sequencing at discrete time points, allowing us to understand how the virus changes over time. However, the experimental and sequencing process is imperfect, so the resulting data contain not only real evolutionary changes but also mutations and other artifacts caused by sequencing errors. Our method corrects most of these errors by combining thousands of erroneous sequences into a much smaller number of unique consensus sequences that represent biologically meaningful variation. The resulting high-quality sequences are used for further analysis, such as building an evolutionary tree that tracks and interprets the genetic changes in the viral population over time. FLEA is open source and freely available online.
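The principle behind this error-correction step, that independent errors in many reads of the same variant are outvoted when the reads are combined, can be sketched as a column-wise majority consensus. This is a deliberate simplification, not FLEA's pipeline: it assumes the reads have already been grouped by variant and aligned to equal length with '-' as the gap symbol, and `consensus` is a hypothetical name.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Column-wise majority consensus of equal-length, pre-aligned reads.

    Columns whose most common symbol is a gap are dropped, so a spurious
    insertion seen in a minority of reads does not enter the consensus.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be aligned to the same length")
    out = []
    for column in zip(*aligned_reads):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            out.append(symbol)
    return "".join(out)


# Five noisy copies of ACGTTA: one read has an extra A, another a T->C error.
reads = ["ACG-TTA", "ACGATTA", "ACG-TTA", "ACG-TCA", "ACG-TTA"]
print(consensus(reads))  # ACGTTA
```

Majority voting only works when errors are rarer than the true base at each column, which is why combining many reads of one variant is the key step.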
15
Barik S, Das S, Vikalo H. QSdpR: Viral quasispecies reconstruction via correlation clustering. Genomics 2018; 110:375-381. [DOI: 10.1016/j.ygeno.2017.12.007] [Received: 09/05/2017] [Revised: 12/03/2017] [Accepted: 12/13/2017]
16
Parreira R. Laboratory Methods in Molecular Epidemiology: Viral Infections. Microbiol Spectr 2018; 6:10.1128/microbiolspec.ame-0003-2018. [PMID: 30387412 PMCID: PMC11633636 DOI: 10.1128/microbiolspec.ame-0003-2018] [Received: 03/16/2018]
Abstract
Viruses, the most abundant biological entities on the planet, have been regarded as the "dark matter" of biology: despite their ubiquity and frequent presence in large numbers, their detection and analysis are not always straightforward. Most are very small (under 0.5 μm), and collectively they are extraordinarily diverse. In fact, most of the genetic diversity on the planet is found in the so-called virosphere, the world of viruses. Furthermore, the most frequent viral agents of human disease have RNA genomes and frequently evolve very fast, because most of their polymerases lack proofreading activity. Their detection, genetic characterization, and epidemiological surveillance are therefore challenging. This review (part of the Curated Collection on Advances in Molecular Epidemiology of Infectious Diseases) describes many of the methods that have been used for viral detection and analysis over the last few decades. Despite the challenge posed by high genetic diversity, most of these methods still depend on the amplification of viral genomic sequences, using sequence-specific or sequence-independent approaches and either thermal cycling or a single (isothermal) amplification temperature. Furthermore, viral populations, especially those with RNA genomes, are not usually genetically uniform but encompass swarms of genetically related, though distinct, viral genomes known as viral quasispecies. Sequence analysis of viral amplicons therefore needs to take this into account, as it poses a potential analytic problem. Possible technical approaches to deal with it are also described here. *This article is part of a curated collection.
Affiliation(s)
- Ricardo Parreira
- Unidade de Microbiologia Médica/Global Health and Tropical Medicine (GHTM) Research Centre, Instituto de Higiene e Medicina Tropical (IHMT), Universidade Nova de Lisboa (UNL), 1349-008 Lisboa, Portugal
17
Ahn S, Ke Z, Vikalo H. Viral quasispecies reconstruction via tensor factorization with successive read removal. Bioinformatics 2018; 34:i23-i31. [PMID: 29949976 PMCID: PMC6022648 DOI: 10.1093/bioinformatics/bty291]
Abstract
Motivation: As RNA viruses mutate and adapt to environmental changes, often developing resistance to antiviral vaccines and drugs, they form an ensemble of viral strains, a viral quasispecies. While high-throughput sequencing (HTS) has enabled in-depth studies of viral quasispecies, sequencing errors and limited read lengths make reconstructing the strains and estimating their spectrum challenging. Inference of viral quasispecies is difficult because strain frequencies are generally non-uniform, and it is further complicated when the genetic distances between strains are small.

Results: This paper presents TenSQR, an algorithm that uses a tensor factorization framework to analyze HTS data and reconstruct viral quasispecies whose components have highly uneven frequencies. Fundamentally, TenSQR performs clustering with successive data removal to infer the strains in a quasispecies in order from most to least abundant; every time a strain is inferred, the sequencing reads generated from that strain are removed from the dataset. This successive strain reconstruction and data removal enables the discovery of rare strains in a population and facilitates the detection of deletions in such strains. Results on simulated datasets demonstrate that TenSQR can reconstruct full-length strains with widely different abundances, generally outperforming state-of-the-art methods at diversities of 1-10% and detecting long deletions even in rare strains. A study on a real HIV-1 dataset demonstrates that TenSQR outperforms competing methods in experimental settings as well. Finally, we apply TenSQR to analyze a Zika virus sample and reconstruct the full-length strains it contains.

Availability and implementation: TenSQR is available at https://github.com/SoYeonA/TenSQR.

Supplementary information: Supplementary data are available at Bioinformatics online.
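The successive-removal loop itself (infer the most abundant strain, remove the reads it explains, repeat) can be sketched independently of how each strain is inferred. In the minimal sketch below, a plain majority consensus stands in for TenSQR's tensor factorization, and reads are assumed to be aligned and equal in length; `successive_reconstruction`, `max_mismatch`, and `min_reads` are hypothetical names and parameters.

```python
from collections import Counter


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def majority_consensus(reads):
    # Stand-in for the strain-inference step (TenSQR uses tensor factorization).
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def successive_reconstruction(reads, max_mismatch=1, min_reads=2):
    """Infer strains from most to least abundant, removing explained reads.

    A read is attributed to a strain if it differs from it at no more than
    `max_mismatch` positions (tolerating isolated sequencing errors).
    Returns a list of (strain, estimated frequency) pairs.
    """
    total = len(reads)
    remaining = list(reads)
    strains = []
    while len(remaining) >= min_reads:
        strain = majority_consensus(remaining)
        explained = [r for r in remaining if hamming(r, strain) <= max_mismatch]
        if not explained:
            break  # consensus matches no read closely enough; stop
        strains.append((strain, len(explained) / total))
        remaining = [r for r in remaining if hamming(r, strain) > max_mismatch]
    return strains


# Major strain (6 reads), minor strain differing at 2 sites (3 reads),
# and one read of the major strain carrying a sequencing error.
reads = ["ACGTAC"] * 6 + ["AGGAAC"] * 3 + ["ACGTAG"]
print(successive_reconstruction(reads))  # [('ACGTAC', 0.7), ('AGGAAC', 0.3)]
```

Removing the major strain's reads before the next round is what lets the minor strain dominate the next consensus; without removal, its alleles would always be outvoted.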
Affiliation(s)
- Soyeon Ahn
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
- Ziqi Ke
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
- Haris Vikalo
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
18
aBayesQR: A Bayesian Method for Reconstruction of Viral Populations Characterized by Low Diversity. J Comput Biol 2018; 25:637-648. [DOI: 10.1089/cmb.2017.0249]
19
Karagiannis K, Simonyan V, Chumakov K, Mazumder R. Separation and assembly of deep sequencing data into discrete sub-population genomes. Nucleic Acids Res 2017; 45:10989-11003. [PMID: 28977510 PMCID: PMC5737798 DOI: 10.1093/nar/gkx755] [Received: 09/07/2016] [Accepted: 08/16/2017]
Abstract
Sequence heterogeneity, often referred to as sub-populations or quasispecies, is a common characteristic of RNA viruses. Traditional techniques for assembling the short reads produced by deep sequencing, such as de novo assemblers, ignore this underlying diversity. Here, we introduce a novel algorithm that simultaneously assembles the discrete sequences of multiple genomes present in a population. Using in silico data, we were able to detect populations at frequencies as low as 0.1% with complete global genome reconstruction, and in a single sample we detected 16 resolved sequences with no mismatches. We also applied the algorithm to high-throughput sequencing data obtained for viruses present in sewage samples and successfully detected multiple sub-populations and recombination events in these diverse mixtures. The high sensitivity of the algorithm also enables genomic analysis of heterogeneous pathogen genomes from patient samples and accurate detection of intra-host diversity, supporting not only basic research in personalized medicine but also accurate diagnostics and the monitoring of drug therapies, which are critical in clinical and regulatory decision-making.
Affiliation(s)
- Konstantinos Karagiannis
- Department of Biochemistry and Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA; Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA
- Vahan Simonyan
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA
- Konstantin Chumakov
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA
- Raja Mazumder
- Department of Biochemistry and Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA; McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA
20
Malhotra R, Jha M, Poss M, Acharya R. A random forest classifier for detecting rare variants in NGS data from viral populations. Comput Struct Biotechnol J 2017; 15:388-395. [PMID: 28819548 PMCID: PMC5548337 DOI: 10.1016/j.csbj.2017.07.001] [Received: 03/14/2017] [Revised: 07/01/2017] [Accepted: 07/03/2017]
Abstract
We propose a random forest classifier for distinguishing rare variants from sequencing errors in Next Generation Sequencing (NGS) data from viral populations. The method uses counts of k-mers of varying length from the reads of a viral population to train a random forest classifier, called MultiRes, that classifies k-mers as erroneous or rare variants. Our algorithm is rooted in concepts from signal processing and uses a frame-based representation of k-mers. Frames are sets of non-orthogonal basis functions that have traditionally been used in signal processing for noise removal. We define discrete spatial signals for genomes and sequenced reads, and show that k-mers of a given size constitute a frame. We evaluate MultiRes on simulated and real viral population datasets, which contain many low-frequency variants, and compare it with the error detection methods used in correction tools known in the literature. MultiRes makes 4 to 500 times fewer false-positive k-mer predictions than other methods, which is essential for accurate estimation of viral population diversity and for de novo assembly. It has high recall of the true k-mers, comparable to other error correction methods. MultiRes also has greater than 95% recall for detecting single nucleotide polymorphisms (SNPs) and fewer false-positive SNPs, while detecting a higher number of rare variants than other variant calling methods for viral populations. The software is freely available on GitHub at https://github.com/raunaq-m/MultiRes.
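Multi-resolution k-mer count features of the kind this classifier is trained on can be sketched in a few lines. The feature definition below (for each k, the minimum count among a k-mer's length-k substrings) is an illustrative assumption, not MultiRes's frame-based representation, and `toy_label` is a hypothetical count-threshold rule standing in for the trained random forest.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def multires_features(kmer, counts_by_k):
    """For each k (largest first), the minimum count among the length-k
    substrings of `kmer`. Low values at several resolutions are typical of
    k-mers that carry a sequencing error."""
    features = []
    for k in sorted(counts_by_k, reverse=True):
        if k > len(kmer):
            continue
        sub_counts = [counts_by_k[k][kmer[i:i + k]] for i in range(len(kmer) - k + 1)]
        features.append(min(sub_counts))
    return features


def toy_label(features, min_count=2):
    """Hypothetical placeholder for a trained classifier such as a random forest."""
    return "likely error" if features[0] < min_count else "keep"


reads = ["ACGTACGT", "ACGTACGT", "ACGTTCGT"]
counts_by_k = {k: count_kmers(reads, k) for k in (3, 4)}
print(multires_features("ACGT", counts_by_k))  # [5, 5]
print(multires_features("GTTC", counts_by_k))  # [1, 1]
print(toy_label(multires_features("GTTC", counts_by_k)))  # likely error
```

A fixed threshold like `toy_label` cannot tell a genuine rare variant from an error with the same count; learning that boundary from richer features is the role of the classifier described in the abstract.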
Affiliation(s)
- Raunaq Malhotra
- The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA
- Manjari Jha
- The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA
- Mary Poss
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Raj Acharya
- School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
21
Six Highly Conserved Targets of RNAi Revealed in HIV-1-Infected Patients from Russia Are Also Present in Many HIV-1 Strains Worldwide. MOLECULAR THERAPY. NUCLEIC ACIDS 2017; 8:330-344. [PMID: 28918033 PMCID: PMC5537207 DOI: 10.1016/j.omtn.2017.07.010] [Received: 03/03/2017] [Revised: 07/10/2017] [Accepted: 07/10/2017]
Abstract
RNAi has been proposed for gene therapy of HIV/AIDS, but the main problem is that HIV-1 is highly variable and could escape attack by small interfering RNAs (siRNAs) through even single nucleotide substitutions in the potential targets. To exhaustively check the variability of selected RNA targets in HIV-1, we used ultra-deep sequencing of six regions of HIV-1 from the plasma of two independent cohorts of patients from Russia. We found six RNAi targets that are invariable in 82%-97% of viruses in both cohorts and are located inside the domains specifying reverse transcriptase (RT), integrase, vpu, gp120, and p17. The analysis of mutation frequencies and their characteristics inside the targets suggests a likely role for APOBEC3G (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3G, A3G) in G-to-A mutations and a predominant effect of RT biases on the detected variability of the virus. The lowest frequency of mutations was detected in the central part of all six targets. We also found that identical RNAi targets are present in many HIV-1 strains from many countries on all continents. These data are important both for understanding the patterns of HIV-1 mutability and the properties of RT and for developing RNAi-based gene therapy approaches for the treatment of HIV/AIDS.
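The two summaries used above, the share of viruses in which a target is invariable and the spectrum of substitution types such as G-to-A, can be computed from sequence windows aligned to a target. The minimal sketch below assumes the windows are already extracted and aligned to the target length, with substitutions written in the target's sense; `target_variability` is a hypothetical name and the example data are invented.

```python
from collections import Counter


def target_variability(target, windows):
    """Summarize variation of aligned sequence windows against an RNAi target.

    Returns (fraction of windows identical to the target,
             Counter of substitution types written as 'target>observed').
    """
    if not windows:
        raise ValueError("need at least one window")
    if any(len(w) != len(target) for w in windows):
        raise ValueError("windows must be aligned to the target length")
    identical = sum(w == target for w in windows)
    substitutions = Counter(
        f"{t}>{o}" for w in windows for t, o in zip(target, w) if t != o
    )
    return identical / len(windows), substitutions


windows = ["GGCAT", "GACAT", "GGCAT", "AGCAT"]
print(target_variability("GGCAT", windows))  # (0.5, Counter({'G>A': 2}))
```

Tallying substitution types rather than just counting mismatches is what lets an excess of G>A changes, the signature attributed to APOBEC3G, stand out from other error or mutation sources.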
22
Baaijens JA, Aabidine AZE, Rivals E, Schönhuth A. De novo assembly of viral quasispecies using overlap graphs. Genome Res 2017; 27:835-848. [PMID: 28396522 PMCID: PMC5411778 DOI: 10.1101/gr.215038.116] [Received: 08/22/2016] [Accepted: 03/10/2017]
Abstract
A viral quasispecies, the ensemble of viral strains populating an infected person, can be highly diverse. Determining the haplotypes of the individual strains can play a key role in the optimal assessment of virulence, pathogenesis, and therapy selection. Because many viruses are subject to high mutation and recombination rates, high-quality reference genomes are often not available at the time of a new disease outbreak. We present SAVAGE, a computational tool for reconstructing individual haplotypes of intra-host virus strains without the need for a high-quality reference genome. SAVAGE uses either FM-index-based data structures or an ad hoc consensus reference sequence to construct overlap graphs from patient sample data. In this overlap graph, nodes represent reads and/or contigs, while edges indicate that two reads/contigs, based on sound statistical considerations, represent identical haplotypic sequence. Following an iterative scheme, a new overlap assembly algorithm based on the enumeration of statistically well-calibrated groups of reads/contigs then efficiently reconstructs the individual haplotypes from this overlap graph. In benchmark experiments on simulated and real deep-coverage data, SAVAGE drastically outperforms generic de novo assemblers as well as the only specialized de novo viral quasispecies assembler available so far. When run on an ad hoc consensus reference sequence, SAVAGE performs very favorably compared with state-of-the-art reference genome-guided tools. We also apply SAVAGE to two deep-coverage samples from patients infected with Zika virus and hepatitis C virus, respectively, which sheds light on the genetic structures of the respective viral quasispecies.
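The overlap-graph idea (reads as nodes, edges where two reads appear to come from the same haplotype, contigs spelled along paths) can be sketched with the simplest possible edge rule: an exact suffix-prefix overlap of at least `min_overlap` bases. SAVAGE instead decides edges with statistical criteria and uses FM-index data structures; the functions below are hypothetical illustrations, not its algorithm.

```python
def overlap_length(a, b, min_overlap):
    """Length of the longest proper suffix of `a` equal to a prefix of `b`,
    or 0 if no such overlap reaches `min_overlap` bases."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for length in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def build_overlap_graph(reads, min_overlap):
    """Directed graph: node i -> list of (j, overlap length) for reads that
    can follow read i without any mismatch in the overlap."""
    graph = {i: [] for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = overlap_length(a, b, min_overlap)
                if olen:
                    graph[i].append((j, olen))
    return graph


def spell_path(reads, path, graph):
    """Merge reads along a path of overlap edges into one contig."""
    contig = reads[path[0]]
    for prev, nxt in zip(path, path[1:]):
        olen = dict(graph[prev])[nxt]
        contig += reads[nxt][olen:]
    return contig


reads = ["ACGTAC", "TACGGA", "GGATTC"]
graph = build_overlap_graph(reads, min_overlap=3)
print(graph)                                 # {0: [(1, 3)], 1: [(2, 3)], 2: []}
print(spell_path(reads, [0, 1, 2], graph))   # ACGTACGGATTC
```

The all-pairs comparison is quadratic in the number of reads; index structures such as an FM index exist precisely to avoid that cost at deep coverage.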
Affiliation(s)
- Eric Rivals
- LIRMM, CNRS and Université de Montpellier, 34095 Montpellier, France
- Institut Biologie Computationnelle, CNRS and Université de Montpellier, 34095 Montpellier, France
23
Thannesberger J, Hellinger HJ, Klymiuk I, Kastner MT, Rieder FJJ, Schneider M, Fister S, Lion T, Kosulin K, Laengle J, Bergmann M, Rattei T, Steininger C. Viruses comprise an extensive pool of mobile genetic elements in eukaryote cell cultures and human clinical samples. FASEB J 2017; 31:1987-2000. [PMID: 28179422 DOI: 10.1096/fj.201601168r] [Received: 10/25/2016] [Accepted: 01/09/2017]
Abstract
Viruses shape a diversity of ecosystems by modulating the metabolism of their microbial, eukaryotic, or plant hosts. The complexity of virus-host interaction networks is progressively being unraveled by novel metagenomic approaches. Using a novel metagenomic method, we explored the virome in mammalian cell cultures and clinical samples and identified an extensive pool of mobile genetic elements in all of these ecosystems. Despite aseptic treatment, cell cultures harbored extensive and diverse phage populations with a high abundance of as yet unknown and uncharacterized viruses (viral dark matter). Unknown phages also predominated in the oropharynx and urine of healthy individuals and of patients infected with cytomegalovirus, despite demonstration of active cytomegalovirus replication. The novelty of viral sequences correlated primarily with the individual evaluated, whereas the relative abundance of encoded protein functions was associated with the ecologic niches probed. Together, these observations demonstrate the extensive presence of viral dark matter in human and artificial ecosystems.
Affiliation(s)
- Jakob Thannesberger
- Division of Infectious Diseases, Department of Medicine 1, Medical University of Vienna, Vienna, Austria
- Hans-Joerg Hellinger
- CUBE-Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
- Ingeborg Klymiuk
- Center for Medical Research, Core Facility Molecular Biology, Medical University of Graz, Graz, Austria
- Marie-Theres Kastner
- Division of Infectious Diseases, Department of Medicine 1, Medical University of Vienna, Vienna, Austria
- Franz J J Rieder
- Division of Infectious Diseases, Department of Medicine 1, Medical University of Vienna, Vienna, Austria
- Martina Schneider
- Division of Infectious Diseases, Department of Medicine 1, Medical University of Vienna, Vienna, Austria
- Susanne Fister
- Christian Doppler Laboratory for Monitoring of Microbial Contaminants, University of Veterinary Medicine, Vienna, Austria
- Thomas Lion
- Children's Cancer Research Institute, Vienna, Austria
- Karin Kosulin
- Children's Cancer Research Institute, Vienna, Austria
- Johannes Laengle
- Department of General Surgery, Medical University of Vienna, Vienna, Austria
- Michael Bergmann
- Department of General Surgery, Medical University of Vienna, Vienna, Austria
- Thomas Rattei
- CUBE-Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
- Christoph Steininger
- Division of Infectious Diseases, Department of Medicine 1, Medical University of Vienna, Vienna, Austria
24
aBayesQR: A Bayesian Method for Reconstruction of Viral Populations Characterized by Low Diversity. LECTURE NOTES IN COMPUTER SCIENCE 2017. [DOI: 10.1007/978-3-319-56970-3_22]
25
Artyomenko A, Wu NC, Mangul S, Eskin E, Sun R, Zelikovsky A. Long Single-Molecule Reads Can Resolve the Complexity of the Influenza Virus Composed of Rare, Closely Related Mutant Variants. J Comput Biol 2016; 24:558-570. [PMID: 27901586 DOI: 10.1089/cmb.2016.0146]
Abstract
As a result of high rates of mutation and recombination, an RNA virus exists as a heterogeneous "swarm" of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate limits the ability to reconstruct a heterogeneous viral population composed of rare, related mutant variants. In this article, we present two single-nucleotide variants (2SNV), a method that tolerates the high error rate of the single-molecule protocol and reconstructs mutant variants. 2SNV uses linkage between single-nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing titrated levels of known viral mutant variants. Our method accurately reconstructs a clone with a frequency of 0.2% and distinguishes clones that differ in only two distantly located nucleotides. 2SNV outperforms existing methods for full-length viral mutant reconstruction.
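The intuition behind using linkage is that alleles of a real variant co-occur on the same long reads far more often than independent sequencing errors would. The minimal sketch below compares the observed number of reads carrying both alleles with the number expected if the two alleles occurred independently. It is not 2SNV's statistical test, which this abstract does not specify; reads are assumed pre-aligned to equal length, and `linkage` is a hypothetical name.

```python
def linkage(reads, i, j, allele_i, allele_j):
    """Observed vs. expected number of reads carrying allele_i at position i
    and allele_j at position j.

    If the two alleles were independent (e.g. unrelated sequencing errors),
    the expected count would be n * f_i * f_j.
    """
    n = len(reads)
    if n == 0:
        raise ValueError("need at least one read")
    has_i = [r[i] == allele_i for r in reads]
    has_j = [r[j] == allele_j for r in reads]
    observed = sum(a and b for a, b in zip(has_i, has_j))
    expected = n * (sum(has_i) / n) * (sum(has_j) / n)
    return observed, expected


# 8 reads of the reference and 2 reads of a variant carrying T at position 1
# and G at position 4 on the same molecule.
reads = ["AAAAAA"] * 8 + ["ATAAGA"] * 2
observed, expected = linkage(reads, 1, 4, "T", "G")
print(observed, round(expected, 2))  # 2 0.4
```

Here the pair is seen on 2 reads against roughly 0.4 expected by chance; a real method would turn such a comparison into a significance test that accounts for the platform's error rate.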
Affiliation(s)
- Nicholas C Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California
- Serghei Mangul
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California; Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, California
- Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California
- Ren Sun
- Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, California
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, Georgia
26
Posada-Cespedes S, Seifert D, Beerenwinkel N. Recent advances in inferring viral diversity from high-throughput sequencing data. Virus Res 2016; 239:17-32. [PMID: 27693290 DOI: 10.1016/j.virusres.2016.09.016] [Received: 06/24/2016] [Revised: 09/23/2016] [Accepted: 09/24/2016]
Abstract
Rapidly evolving RNA viruses persist within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have made it possible to assess the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging because the reads are short and error-prone. To account for the uncertainties arising from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by read alignment, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue to be complemented by technological developments.
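One simple form of local diversity estimation is the per-site Shannon entropy of the base distribution in a read pileup. The sketch below is a generic illustration, not a method taken from the review: it assumes reads are already aligned to a common reference and padded to equal length with '-' where they do not cover a site, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def site_entropy(column: str) -> float:
    """Shannon entropy (natural log) of the base distribution at one site,
    ignoring '-' placeholders for reads that do not cover the site."""
    counts = Counter(c for c in column if c != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log(n / total) for n in counts.values())


def entropy_profile(aligned_reads):
    """Per-position entropy across a set of equal-length aligned reads."""
    length = len(aligned_reads[0])
    return [site_entropy("".join(r[pos] for r in aligned_reads))
            for pos in range(length)]


profile = entropy_profile(["ACGT", "ACGA", "ACGT", "AC-A"])
print(profile)
```

Entropy is zero at invariant sites and rises as bases become more evenly mixed; on real data, sequencing errors inflate it, which is why the error models discussed in the review matter.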
Affiliation(s)
- Susana Posada-Cespedes
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland
- David Seifert
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland.
27
Zukurov JP, do Nascimento-Brito S, Volpini AC, Oliveira GC, Janini LMR, Antoneli F. Estimation of genetic diversity in viral populations from next generation sequencing data with extremely deep coverage. Algorithms Mol Biol 2016; 11:2. [PMID: 26973707 PMCID: PMC4788855 DOI: 10.1186/s13015-016-0064-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/25/2015] [Accepted: 02/25/2016] [Indexed: 12/16/2022]
Abstract
Background: In this paper we propose a method and discuss its computational implementation as an integrated tool for the analysis of viral genetic diversity in data generated by high-throughput sequencing. The main motivation for this work is to better understand the genetic diversity of viruses with high rates of nucleotide substitution, such as HIV-1 and influenza. Most methods for viral diversity estimation proposed so far are designed to exploit the longer reads produced by some next-generation sequencing platforms in order to estimate a population of haplotypes that represents the diversity of the original population. The method proposed here is instead custom-made to take advantage of the very low error rate and extremely deep per-site coverage of some neglected technologies, which have received little attention because their short reads preclude haplotype estimation. This approach allowed us to avoid some hard problems related to haplotype reconstruction (the need for long reads, preliminary error filtering, and assembly).
Results: We propose to measure the genetic diversity of a viral population through a family of multinomial probability distributions indexed by the sites of the virus genome, each representing the distribution of nucleotide bases at that site. The implementation focuses on two main optimization strategies: a read mapping/alignment procedure that aims to recover the maximum possible number of short reads, and inference of the multinomial parameters in a Bayesian framework with smoothed Dirichlet estimation. The Bayesian approach provides conditional probability distributions for the multinomial parameters, allowing prior information from the control experiment to be taken into account. It also provides a natural way to separate signal from noise, since it automatically furnishes Bayesian confidence intervals and thus avoids the drawbacks of preliminary error filtering.
Conclusions: The methods described in this paper have been implemented as an integrated tool called Tanden (Tool for Analysis of Diversity in Viral Populations) and successfully tested on samples of HIV-1 strain NL4-3 (group M, subtype B) propagated in primary human cell cultures under many distinct conditions. Tanden is written in C# (Microsoft), runs on the Windows operating system, and can be downloaded from: http://tanden.url.ph/.
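The Bayesian step described above rests on Dirichlet-multinomial conjugacy: with a Dirichlet(α) prior on the four base proportions at a site and observed base counts n, the posterior is Dirichlet(α + n), and each base's marginal posterior is a Beta distribution. The following sketch assumes a symmetric prior (the hypothetical `prior` parameter) and approximates an equal-tailed credible interval by sampling that Beta marginal; it does not reproduce Tanden's smoothed estimator or its use of control-experiment priors.

```python
import random

BASES = "ACGT"


def posterior_summary(counts, prior=0.5, draws=20000, level=0.95, seed=1):
    """Posterior mean and equal-tailed credible interval for each base
    proportion at one site, under a symmetric Dirichlet(prior) prior."""
    rng = random.Random(seed)
    alpha = {b: prior + counts.get(b, 0) for b in BASES}  # posterior Dirichlet parameters
    total = sum(alpha.values())
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    summary = {}
    for b in BASES:
        # The marginal of one Dirichlet component is Beta(alpha_b, total - alpha_b).
        samples = sorted(rng.betavariate(alpha[b], total - alpha[b])
                         for _ in range(draws))
        summary[b] = (
            alpha[b] / total,                    # posterior mean
            samples[int(lo_q * (draws - 1))],    # lower interval bound
            samples[int(hi_q * (draws - 1))],    # upper interval bound
        )
    return summary


# Hypothetical site: 10,000 reads, 9,900 A and 100 G.
s = posterior_summary({"A": 9900, "G": 100})
print(s["G"])
```

Because the interval comes from the posterior rather than a hard frequency cut-off, a low-frequency base can be judged against its own uncertainty, which is the signal-versus-noise argument the abstract makes.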
28
Long Single-Molecule Reads Can Resolve the Complexity of the Influenza Virus Composed of Rare, Closely Related Mutant Variants. Lecture Notes in Computer Science 2016. [DOI: 10.1007/978-3-319-31957-5_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/17/2022]
29
Villar LM, Cruz HM, Barbosa JR, Bezerra CS, Portilho MM, Scalioni LDP. Update on hepatitis B and C virus diagnosis. World J Virol 2015; 4:323-42. [PMID: 26568915 PMCID: PMC4641225 DOI: 10.5501/wjv.v4.i4.323] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 06/03/2015] [Revised: 09/25/2015] [Accepted: 10/23/2015] [Indexed: 02/05/2023]
Abstract
Hepatitis B and C viruses (HBV and HCV) are responsible for most chronic liver disease worldwide and are transmitted parenterally, sexually, and vertically. One important measure to reduce the burden of these infections is the diagnosis of acute and chronic HBV and HCV cases. To provide effective diagnosis and monitoring of antiviral treatment, it is important to choose sensitive, rapid, inexpensive, and robust analytical methods. Primary diagnosis of HBV and HCV infection is made using serological tests that detect viral antigens and antibodies against these viruses. Qualitative and quantitative molecular tests are then used to confirm the primary diagnosis, quantify viral load, and determine genotypes and resistance mutants relevant to antiviral treatment. In this manuscript, we review the current serological and molecular methods for the diagnosis of hepatitis B and C.
30
Khalifa ME, Varsani A, Ganley ARD, Pearson MN. Comparison of Illumina de novo assembled and Sanger sequenced viral genomes: A case study for RNA viruses recovered from the plant pathogenic fungus Sclerotinia sclerotiorum. Virus Res 2015; 219:51-57. [PMID: 26581665 DOI: 10.1016/j.virusres.2015.11.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 08/24/2015] [Revised: 10/21/2015] [Accepted: 11/01/2015] [Indexed: 10/22/2022]
Abstract
The advent of 'next generation sequencing' (NGS) technologies has led to the discovery of many novel mycoviruses, most of which are sufficiently different from previously sequenced viruses that there is no appropriate reference sequence on which to base the sequence assembly. Although NGS generates many new genome sequences, confirmation by Sanger sequencing is still required for formal classification by the International Committee on Taxonomy of Viruses (ICTV), a requirement that is currently under review. To empirically test the validity of de novo assembled mycovirus genomes from dsRNA extracts, we compared the results of Illumina sequencing with those of random cloning plus targeted PCR coupled with Sanger sequencing for viruses from five Sclerotinia sclerotiorum isolates. Sanger sequencing detected nine viral genomes, whereas Illumina sequencing detected the same nine viruses plus one additional virus from the same samples. Critically, the Illumina-derived sequences share >99.3% identity with those obtained by cloning and Sanger sequencing. Although there is scope for errors in de novo assembled viral genomes, our results demonstrate that by maximising the proportion of viral sequence in the data and using sufficiently rigorous quality controls, it is possible to generate de novo genome sequences from Illumina data with accuracy comparable to those obtained by Sanger sequencing.
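The >99.3% figure is a pairwise nucleotide identity. As a minimal sketch, assuming the two sequences have already been aligned (gaps written as '-') and that gap columns are excluded from the denominator, which is only one of several conventions in use, identity can be computed as follows; the function name is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.
    Assumes seq_a and seq_b come from the same pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns (one common convention among several)
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# One gap column is skipped; 8 of the remaining 9 columns match.
print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))
```

Whether gaps are counted as mismatches changes the result for indel-rich alignments, so identity values from different studies are comparable only when the convention is stated.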
Affiliation(s)
- Mahmoud E Khalifa
- School of Biological Sciences, The University of Auckland, Private Bag 92019, Auckland, New Zealand; Faculty of Sciences, Damietta University, Damietta, Egypt
- Arvind Varsani
- Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand; Structural Biology Research Unit, Department of Clinical Laboratory Sciences, University of Cape Town, Rondebosch, 7701 Cape Town, South Africa; Department of Plant Pathology and Emerging Pathogens Institute, University of Florida, Gainesville, Florida, FL 32611, USA
- Austen R D Ganley
- Institute of Natural and Mathematical Sciences, Massey University, Auckland, New Zealand
- Michael N Pearson
- School of Biological Sciences, The University of Auckland, Private Bag 92019, Auckland, New Zealand.
31
High-resolution genetic profile of viral genomes: why it matters. Curr Opin Virol 2015; 14:62-70. [DOI: 10.1016/j.coviro.2015.08.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 05/19/2015] [Revised: 08/07/2015] [Accepted: 08/07/2015] [Indexed: 12/12/2022]
32
Orton RJ, Wright CF, Morelli MJ, King DJ, Paton DJ, King DP, Haydon DT. Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data. BMC Genomics 2015; 16:229. [PMID: 25886445 PMCID: PMC4425905 DOI: 10.1186/s12864-015-1456-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 08/06/2014] [Accepted: 03/09/2015] [Indexed: 12/19/2022]
Abstract
Background: RNA viruses have high mutation rates and exist within their hosts as large, complex and heterogeneous populations, comprising a spectrum of related but non-identical genome sequences. Next generation sequencing is revolutionising the study of viral populations by enabling ultra-deep sequencing of their genomes and the subsequent identification of the full spectrum of variants within the population. Identification of low frequency variants is important for our understanding of mutational dynamics, disease progression and immune pressure, and for the detection of drug-resistant or pathogenic mutations. However, the current challenge is to accurately model the errors in the sequence data and to distinguish real viral variants, particularly those at low frequency, from errors introduced during sequencing and sample processing, both of which can be substantial.
Results: We have created a novel set of laboratory control samples derived from a plasmid containing a full-length viral genome with extremely limited diversity in the starting population. One sample was sequenced without PCR amplification, whilst the other samples were subjected to increasing amounts of RT and PCR amplification prior to ultra-deep sequencing. This enabled the level of error introduced by the RT and PCR processes to be assessed and minimum frequency thresholds to be set for true viral variant identification. We developed a genome-scale computational model of the sample processing and NGS calling process to gain a detailed understanding of the errors at each step; the model predicted that RT and PCR errors are more likely to occur at some genomic sites than others. The model can also be used to investigate whether the number of mutations observed at a given site of interest in any NGS data set is greater than would be expected from processing errors alone. Given basic sample processing information and the site's coverage and quality scores, the model uses the fitted RT-PCR error distributions to simulate the number of mutations that would be observed from processing errors alone.
Conclusions: These data sets and models provide an effective means of separating true viral mutations from those erroneously introduced during sample processing and sequencing.
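The model's final step, asking whether the mutations observed at a site exceed what processing errors alone would produce, can be illustrated with a simple Monte Carlo test. This sketch assumes a single, site-independent per-base error rate (the hypothetical `error_rate` parameter) and hypothetical function names; the authors' model instead fits RT-PCR error distributions that vary across sites, so treat this as a deliberate simplification.

```python
import random


def simulate_error_counts(coverage, error_rate, n_sims, rng):
    """Erroneous-base counts at one site, one per simulated error-only replicate."""
    return [sum(rng.random() < error_rate for _ in range(coverage))
            for _ in range(n_sims)]


def empirical_pvalue(observed, coverage, error_rate, n_sims=1000, seed=0):
    """Share of error-only simulations with at least `observed` mutations.
    The +1 terms keep the estimate above zero when no simulation reaches it."""
    rng = random.Random(seed)
    sims = simulate_error_counts(coverage, error_rate, n_sims, rng)
    return (1 + sum(s >= observed for s in sims)) / (n_sims + 1)


# Hypothetical site: 1,000x coverage, 0.1% per-base processing error rate.
print(empirical_pvalue(observed=15, coverage=1000, error_rate=0.001))
```

Simulation is used here rather than a closed-form tail so that the error-generating step could later be swapped for a more realistic, site-specific process without changing the test itself.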
Affiliation(s)
- Richard J Orton
- Boyd Orr Centre for Population and Ecosystem Health, Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, United Kingdom.
- Medical Research Council-University of Glasgow Centre for Virus Research, Institute of Infection, Inflammation and Immunity, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, United Kingdom.
- Marco J Morelli
- Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia at the IFOM-IEO Campus, Via Adamello 16, Milano, 20139, Italy.
- David J King
- Pirbright Institute, Ash Road, Pirbright, GU24 0NF, UK.
- David J Paton
- Pirbright Institute, Ash Road, Pirbright, GU24 0NF, UK.
- Donald P King
- Pirbright Institute, Ash Road, Pirbright, GU24 0NF, UK.
- Daniel T Haydon
- Boyd Orr Centre for Population and Ecosystem Health, Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, United Kingdom.
33
Isakov O, Bordería AV, Golan D, Hamenahem A, Celniker G, Yoffe L, Blanc H, Vignuzzi M, Shomron N. Deep sequencing analysis of viral infection and evolution allows rapid and detailed characterization of viral mutant spectrum. Bioinformatics 2015; 31:2141-50. [PMID: 25701575 PMCID: PMC4481840 DOI: 10.1093/bioinformatics/btv101] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 06/11/2014] [Accepted: 02/11/2015] [Indexed: 12/22/2022]
Abstract
Motivation: The study of RNA virus populations is a challenging task. Each population of RNA virus is composed of a collection of different, yet related, genomes often referred to as mutant spectra or quasispecies. Virologists using deep sequencing technologies face major obstacles when studying virus population dynamics, both experimentally and in natural settings, due to the relatively high error rates of these technologies and the lack of high-performance pipelines. To overcome these hurdles, we developed a computational pipeline termed ViVan (Viral Variance Analysis). ViVan is a complete pipeline facilitating the identification, characterization and comparison of sequence variance in deep-sequenced virus populations. Results: Applying ViVan to deep-sequenced data obtained from samples that were previously characterized by more classical approaches, we uncovered novel and potentially crucial aspects of virus populations. With our experimental work, we illustrate how ViVan can be used for studies ranging from the more practical (detection of resistance mutations and effects of antiviral treatments) to the more theoretical (temporal characterization of the population in evolutionary studies). Availability and implementation: Freely available on the web at http://www.vivanbioinfo.org Contact: nshomron@post.tau.ac.il Supplementary information: Supplementary data are available at Bioinformatics online.
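A common building block in variant-identification pipelines of this kind is to convert Phred base qualities into error probabilities and ask whether a minor allele appears more often than sequencing error would explain. The sketch below is a generic illustration under that assumption, not ViVan's actual model: the function names are hypothetical, each base's error probability is replaced by the site mean, and an error is assumed to be equally likely to produce any of the three other nucleotides.

```python
from math import exp, lgamma, log, log1p


def phred_to_error(q: float) -> float:
    """A Phred quality Q corresponds to an error probability of 10**(-Q/10)."""
    return 10 ** (-q / 10)


def minor_allele_pvalue(qualities, alt_count):
    """One-sided P(X >= alt_count) if the alternative-allele reads at a site
    arose only from sequencing errors (simplified, illustrative model)."""
    n = len(qualities)
    if n == 0:
        raise ValueError("no bases at this site")
    # Mean per-base error probability, split evenly across the 3 other bases.
    p = sum(phred_to_error(q) for q in qualities) / n / 3
    if alt_count <= 0:
        return 1.0
    # Binomial upper tail summed in log space so large n does not overflow.
    return sum(
        exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log1p(-p))
        for k in range(alt_count, n + 1)
    )


# Hypothetical site: 1,000 reads at Q30, of which 20 show the same alternative base.
print(minor_allele_pvalue([30] * 1000, alt_count=20))
```

Averaging qualities keeps the example short; a more faithful test would treat each base's error probability separately (a Poisson-binomial model) and correct for testing every site in the genome.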
Affiliation(s)
- Ofer Isakov, Antonio V Bordería, David Golan, Amir Hamenahem, Gershon Celniker, Liron Yoffe, Hervé Blanc, Marco Vignuzzi, Noam Shomron
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel; Institut Pasteur, Viral Populations and Pathogenesis, CNRS URA 3015, Paris, France; Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv 69978, Israel