1
Kukkar D, Sharma PK, Kim KH. Recent advances in metagenomic analysis of different ecological niches for enhanced biodegradation of recalcitrant lignocellulosic biomass. Environmental Research 2022; 215:114369. [PMID: 36165858 DOI: 10.1016/j.envres.2022.114369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 06/16/2023]
Abstract
Lignocellulose wastes stemming from agricultural residues offer an excellent opportunity as alternative energy sources alongside fossil fuels. Moreover, the unrestrained burning of agricultural residues can destroy the soil microflora and sterilize the soil. However, the difficulties associated with the biodegradation of lignocellulose biomasses remain a formidable challenge for their sustainable management. In this respect, metagenomics can be an effective option to resolve this dilemma because of its potential, through next-generation sequencing technology and bioinformatics tools, to harness novel microbial consortia from diverse environments (e.g., soil, alpine forests, and hypersaline/acidic/hot sulfur springs). In light of the challenges associated with the bulk-scale biodegradation of lignocellulose-rich agricultural residues, this review is organized to delineate the fundamental aspects of metagenomics for the assessment of microbial consortia and novel molecules (such as biocatalysts) that are otherwise unidentifiable by conventional laboratory culturing techniques. The discussion is extended to highlight recent advancements (from 2011 to 2022) in metagenomic approaches for the isolation and purification of lignocellulolytic microbes from different ecosystems, along with the technical challenges and prospects associated with their wide implementation and scale-up. This review should thus be one of the first comprehensive reports on the metagenomics-based analysis of different environmental samples for the isolation and purification of lignocellulose-degrading enzymes.
Affiliation(s)
- Deepak Kukkar
  - Department of Biotechnology, Chandigarh University, Gharuan, Mohali - 140413, Punjab, India; University Centre for Research and Development, Chandigarh University, Gharuan, Mohali - 140413, Punjab, India.
- Ki-Hyun Kim
  - Department of Civil and Environmental Engineering, Hanyang University, Seongdong-gu, Wangsimni-ro, Seoul - 04763, South Korea.
2
Molina-Mora JA, Cordero-Laurent E, Calderón-Osorno M, Chacón-Ramírez E, Duarte-Martínez F. Metagenomic pipeline for identifying co-infections among distinct SARS-CoV-2 variants of concern: study cases from Alpha to Omicron. Sci Rep 2022; 12:9377. [PMID: 35672431 PMCID: PMC9172093 DOI: 10.1038/s41598-022-13113-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/23/2022] [Accepted: 05/03/2022] [Indexed: 01/04/2023]
Abstract
Concomitant infection or co-infection with distinct SARS-CoV-2 genotypes has been reported as part of the epidemiological surveillance of the COVID-19 pandemic. In the context of the spread of more transmissible variants during 2021, co-infections are important not only because of possible changes in the clinical outcome, but also because of the chance to generate new genotypes by recombination. However, few approaches have developed bioinformatic pipelines to identify co-infections. Here we present a metagenomic pipeline based on the inference of multiple fragments similar to amplicon sequence variants (ASV-like) from sequencing data and a custom SARS-CoV-2 database to identify the concomitant presence of divergent SARS-CoV-2 genomes, i.e., variants of concern (VOCs). This approach was compared to another strategy based on whole-genome (metagenome) assembly. Using single or paired sequencing datasets of COVID-19 cases with distinct SARS-CoV-2 VOCs, each approach was used to predict the VOC classes (Alpha, Beta, Gamma, Delta, Omicron or non-VOC and their combinations). The performance of each pipeline was assessed using the ground-truth or expected VOC classes. Subsequently, the ASV-like pipeline was used to analyze 1021 cases of COVID-19 from Costa Rica to investigate the possible occurrence of co-infections. After the implementation of the two approaches, an accuracy of 96.2% was revealed for the ASV-like inference approach, which contrasts with the misclassification found (accuracy 46.2%) for the whole-genome assembly strategy. The custom SARS-CoV-2 database used for the ASV-like analysis can be updated as new VOCs appear to track co-infections with eventual new genotypes. In addition, the application of the ASV-like approach to all the 1021 sequenced samples from Costa Rica in the period October 12th-December 21st 2021 found that none corresponded to co-infections with VOCs.
In conclusion, we developed a metagenomic pipeline based on ASV-like inference for the identification of co-infection with distinct SARS-CoV-2 VOCs, in which an outstanding accuracy was achieved. Due to the epidemiological, clinical, and molecular relevance of the concomitant infection with distinct genotypes, this work represents another piece in the process of the surveillance of the COVID-19 pandemic in Costa Rica and worldwide.
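The classification idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the exact-match lookup, the `marker_db` dictionary, the `min_support` threshold, and both function names are assumptions made only for illustration. Inferred fragments are looked up in a custom database of VOC-specific sequences, a sample is flagged as a possible co-infection when more than one VOC is supported, and accuracy is the fraction of samples whose predicted VOC set matches the expected one.

```python
from collections import Counter

def call_vocs(fragments, marker_db, min_support=2):
    """Return sorted VOC labels supported by at least min_support fragments.

    marker_db maps a fragment sequence to a VOC label (hypothetical database).
    Exact matching is a simplification; a real pipeline would align sequences.
    """
    support = Counter()
    for frag in fragments:
        label = marker_db.get(frag)
        if label is not None:
            support[label] += 1
    return sorted(voc for voc, n in support.items() if n >= min_support)

def accuracy(predicted, expected):
    """Fraction of samples whose predicted VOC set equals the expected set."""
    hits = sum(1 for p, e in zip(predicted, expected) if set(p) == set(e))
    return hits / len(expected)

# Toy database and sample (made-up sequences, for illustration only).
marker_db = {"ACGT": "Alpha", "ACGA": "Alpha", "TTGC": "Delta", "TTGA": "Delta"}
sample = ["ACGT", "ACGA", "TTGC", "TTGA", "GGGG"]
calls = call_vocs(sample, marker_db)
print(calls, "co-infection" if len(calls) > 1 else "single genotype")
```

Requiring more than one supporting fragment per VOC is a crude guard against a single spurious fragment triggering a co-infection call; the appropriate threshold would depend on the data.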
Affiliation(s)
- Jose Arturo Molina-Mora
  - Centro de Investigación en Enfermedades Tropicales (CIET) and Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica.
- Estela Cordero-Laurent
  - Instituto Costarricense de Investigación y Enseñanza en Nutrición y Salud (INCIENSA), Tres Ríos, Cartago, Costa Rica
- Melany Calderón-Osorno
  - Instituto Costarricense de Investigación y Enseñanza en Nutrición y Salud (INCIENSA), Tres Ríos, Cartago, Costa Rica
- Edgar Chacón-Ramírez
  - Centro de Investigación en Enfermedades Tropicales (CIET) and Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
- Francisco Duarte-Martínez
  - Instituto Costarricense de Investigación y Enseñanza en Nutrición y Salud (INCIENSA), Tres Ríos, Cartago, Costa Rica
3
Knyazev S, Tsyvina V, Shankar A, Melnyk A, Artyomenko A, Malygina T, Porozov YB, Campbell EM, Switzer WM, Skums P, Mangul S, Zelikovsky A. Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction. Nucleic Acids Res 2021; 49:e102. [PMID: 34214168 PMCID: PMC8464054 DOI: 10.1093/nar/gkab576] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 10/09/2020] [Revised: 05/25/2021] [Accepted: 06/18/2021] [Indexed: 12/21/2022]
Abstract
Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient’s treatment plan preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that still remains open. Here we propose CliqueSNV based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identifying minority haplotypes with the frequency below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short and long bases sequencing platforms.
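The notion of linked mutation pairs can be sketched as follows. This is a toy illustration rather than CliqueSNV itself: it assumes gap-free reads that are already aligned to the same coordinates, and it substitutes a plain co-occurrence count (`min_cooccurrence`, a hypothetical parameter) for the statistical linkage test the abstract refers to. The intuition is that independent sequencing errors rarely land on the same read together, so a pair of minor alleles seen together repeatedly is more likely to come from a real haplotype.

```python
from collections import Counter
from itertools import combinations

def consensus(reads):
    """Most common base per column of equal-length, pre-aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

def linked_pairs(reads, min_cooccurrence=2):
    """Count pairs of minor alleles that occur together on the same read.

    Returns {((pos1, base1), (pos2, base2)): count} for pairs seen on at
    least min_cooccurrence reads. The fixed threshold is a stand-in for a
    proper statistical test of linkage.
    """
    cons = consensus(reads)
    pair_counts = Counter()
    for read in reads:
        minors = [(i, b) for i, (b, c) in enumerate(zip(read, cons)) if b != c]
        for pair in combinations(minors, 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_cooccurrence}

# Toy reads: a minority haplotype carrying T at position 1 and G at position 4,
# plus two reads with isolated (unlinked) differences.
reads = ["AAAAAA"] * 6 + ["ATAAGA"] * 2 + ["AAACAA", "ATAAAA"]
print(linked_pairs(reads))
```

In this toy input only the T/G pair survives, while the isolated C at position 3 is never paired with anything and is ignored.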
Affiliation(s)
- Sergey Knyazev
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA; Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA; Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830, USA
- Viachaslau Tsyvina
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Anupama Shankar
  - Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Andrew Melnyk
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Tatiana Malygina
  - International Scientific and Research Institute of Bioengineering, ITMO University, St. Petersburg 197101, Russia
- Yuri B Porozov
  - World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia; Department of Computational Biology, Sirius University of Science and Technology, Sochi 354340, Russia
- Ellsworth M Campbell
  - Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- William M Switzer
  - Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Pavel Skums
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Serghei Mangul
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA; World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
4
Freire B, Ladra S, Paramá JR, Salmela L. Inference of viral quasispecies with a paired de Bruijn graph. Bioinformatics 2021; 37:473-481. [PMID: 32926162 DOI: 10.1093/bioinformatics/btaa782] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/13/2019] [Revised: 03/11/2020] [Accepted: 09/02/2020] [Indexed: 12/28/2022]
Abstract
MOTIVATION: RNA viruses exhibit a high mutation rate and thus they exist in infected cells as a population of closely related strains called viral quasispecies. The viral quasispecies assembly problem asks to characterize the quasispecies present in a sample from high-throughput sequencing data. We study the de novo version of the problem, where reference sequences of the quasispecies are not available. Current methods for assembling viral quasispecies are either based on overlap graphs or on de Bruijn graphs. Overlap graph-based methods tend to be accurate but slow, whereas de Bruijn graph-based methods are fast but less accurate. RESULTS: We present viaDBG, which is a fast and accurate de Bruijn graph-based tool for de novo assembly of viral quasispecies. We first iteratively correct sequencing errors in the reads, which allows us to use large k-mers in the de Bruijn graph. To incorporate the paired-end information in the graph, we also adapt the paired de Bruijn graph for viral quasispecies assembly. These features enable the use of long-range information in contig construction without compromising the speed of de Bruijn graph-based approaches. Our experimental results show that viaDBG is both accurate and fast, whereas previous methods are either fast or accurate but not both. In particular, viaDBG has comparable or better accuracy than SAVAGE, while being at least nine times faster. Furthermore, the speed of viaDBG is comparable to PEHaplo but viaDBG is able to retrieve also low abundance quasispecies, which are often missed by PEHaplo. AVAILABILITY AND IMPLEMENTATION: viaDBG is implemented in C++ and it is publicly available at https://bitbucket.org/bfreirec1/viadbg. All datasets used in this article are publicly available at https://bitbucket.org/bfreirec1/data-viadbg/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
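For readers unfamiliar with the data structure, a plain (unpaired) de Bruijn graph can be sketched in a few lines. This is not viaDBG: the tool is written in C++, corrects reads iteratively, and uses a paired de Bruijn graph, none of which is reproduced here. The function names and the `min_count` filter (a crude stand-in for error correction) are assumptions for illustration. Nodes are (k-1)-mers, and each retained k-mer contributes an edge from its prefix to its suffix; positions where closely related strains diverge appear as branches.

```python
from collections import Counter, defaultdict

def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def de_bruijn(reads, k, min_count=1):
    """Build a de Bruijn graph as {prefix (k-1)-mer: set of suffix (k-1)-mers}.

    k-mers seen fewer than min_count times are dropped, a simple threshold
    used here in place of real sequencing-error correction.
    """
    graph = defaultdict(set)
    for kmer, n in kmer_counts(reads, k).items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

# Two toy "strains" that differ at a single position.
reads = ["ACGTAC", "ACGTTC"]
for node, successors in sorted(de_bruijn(reads, k=3).items()):
    print(node, "->", sorted(successors))
```

The toy example also shows why thresholding alone is risky for quasispecies: raising `min_count` removes the branch that distinguishes the two strains, which is the kind of low-abundance signal the abstract says viaDBG aims to retain.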
Affiliation(s)
- Borja Freire
  - Department of Computer Science and Information Technologies, Facultade de Informática, Universidade da Coruña, Centro de investigación CITIC, A Coruña, Spain
- Susana Ladra
  - Department of Computer Science and Information Technologies, Facultade de Informática, Universidade da Coruña, Centro de investigación CITIC, A Coruña, Spain
- Jose R Paramá
  - Department of Computer Science and Information Technologies, Facultade de Informática, Universidade da Coruña, Centro de investigación CITIC, A Coruña, Spain
- Leena Salmela
  - Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland
5
Detecting and phasing minor single-nucleotide variants from long-read sequencing data. Nat Commun 2021; 12:3032. [PMID: 34031367 PMCID: PMC8144375 DOI: 10.1038/s41467-021-23289-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/24/2020] [Accepted: 04/15/2021] [Indexed: 02/04/2023]
Abstract
Cellular genetic heterogeneity is common in many biological conditions including cancer, microbiome, and co-infection of multiple pathogens. Detecting and phasing minor variants play an instrumental role in deciphering cellular genetic heterogeneity, but they are still difficult tasks because of technological limitations. Recently, long-read sequencing technologies, including those by Pacific Biosciences and Oxford Nanopore, provide an opportunity to tackle these challenges. However, high error rates make it difficult to take full advantage of these technologies. To fill this gap, we introduce iGDA, an open-source tool that can accurately detect and phase minor single-nucleotide variants (SNVs), whose frequencies are as low as 0.2%, from raw long-read sequencing data. We also demonstrate that iGDA can accurately reconstruct haplotypes in closely related strains of the same species (divergence ≥0.011%) from long-read metagenomic data.
6
Alipanahi B, Muggli MD, Jundi M, Noyes NR, Boucher C. Metagenome SNP calling via read-colored de Bruijn graphs. Bioinformatics 2021; 36:5275-5281. [PMID: 32049324 DOI: 10.1093/bioinformatics/btaa081] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/16/2018] [Revised: 01/08/2020] [Accepted: 02/03/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Metagenomics refers to the study of complex samples containing the genetic content of multiple individual organisms and, thus, has been used to elucidate the microbiome and resistome of a complex sample. The microbiome refers to all microbial organisms in a sample, and the resistome refers to all of the antimicrobial resistance (AMR) genes in pathogenic and non-pathogenic bacteria. Single-nucleotide polymorphisms (SNPs) can be effectively used to 'fingerprint' specific organisms and genes within the microbiome and resistome and trace their movement across various samples. However, to effectively use these SNPs for this traceability, a scalable and accurate metagenomics SNP caller is needed. Moreover, such an SNP caller should not be reliant on reference genomes since 95% of microbial species are unculturable, making the determination of a reference genome extremely challenging. In this article, we address this need. RESULTS: We present LueVari, a reference-free SNP caller based on the read-colored de Bruijn graph, an extension of the traditional de Bruijn graph that allows repeated regions longer than the k-mer length and shorter than the read length to be identified unambiguously. LueVari is able to identify SNPs in both AMR genes and chromosomal DNA from shotgun metagenomics data with reliable sensitivity (between 91% and 99%) and precision (between 71% and 99%), whereas the performance of competing methods varies widely. Furthermore, we show that LueVari constructs sequences containing the variation, which span up to 97.8% of genes in datasets, which can be helpful in detecting distinct AMR genes in large metagenomic datasets. AVAILABILITY AND IMPLEMENTATION: Code and datasets are publicly available at https://github.com/baharpan/cosmo/tree/LueVari. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
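The abstract reports sensitivity and precision without defining them. Assuming the usual definitions in terms of true-positive (TP), false-negative (FN), and false-positive (FP) SNP calls, which the abstract itself does not confirm, the two quantities would read:

```latex
\[
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{precision} = \frac{TP}{TP + FP}
\]
```

Under these definitions, sensitivity measures how many of the true SNPs are recovered, while precision measures how many of the reported SNPs are real.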
Affiliation(s)
- Bahar Alipanahi
  - Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA
- Martin D Muggli
  - Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA
- Musa Jundi
  - Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA
- Noelle R Noyes
  - Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA
- Christina Boucher
  - Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA
7
Ramazzotti D, Angaroni F, Maspero D, Gambacorti-Passerini C, Antoniotti M, Graudenzi A, Piazza R. VERSO: A comprehensive framework for the inference of robust phylogenies and the quantification of intra-host genomic diversity of viral samples. Patterns (New York, N.Y.) 2021; 2:100212. [PMID: 33728416 PMCID: PMC7953447 DOI: 10.1016/j.patter.2021.100212] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 06/12/2020] [Revised: 11/30/2020] [Accepted: 01/22/2021] [Indexed: 12/22/2022]
Abstract
We introduce VERSO, a two-step framework for the characterization of viral evolution from sequencing data of viral genomes, which is an improvement on phylogenomic approaches for consensus sequences. VERSO exploits an efficient algorithmic strategy to return robust phylogenies from clonal variant profiles, also in conditions of sampling limitations. It then leverages variant frequency patterns to characterize the intra-host genomic diversity of samples, revealing undetected infection chains and pinpointing variants likely involved in homoplasies. On simulations, VERSO outperforms state-of-the-art tools for phylogenetic inference. Notably, the application to 6,726 amplicon and RNA sequencing samples refines the estimation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution, while co-occurrence patterns of minor variants unveil undetected infection paths, which are validated with contact tracing data. Finally, the analysis of SARS-CoV-2 mutational landscape uncovers a temporal increase of overall genomic diversity and highlights variants transiting from minor to clonal state and homoplastic variants, some of which fall on the spike gene. Available at: https://github.com/BIMIB-DISCo/VERSO.
Affiliation(s)
- Daniele Ramazzotti
  - Department of Medicine and Surgery, Università degli Studi di Milano-Bicocca, Monza, Italy
- Fabrizio Angaroni
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- Davide Maspero
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
  - Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
- Marco Antoniotti
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
- Alex Graudenzi
  - Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
- Rocco Piazza
  - Department of Medicine and Surgery, Università degli Studi di Milano-Bicocca, Monza, Italy
8
Riaz N, Leung P, Barton K, Smith MA, Carswell S, Bull R, Lloyd AR, Rodrigo C. Adaptation of Oxford Nanopore technology for hepatitis C whole genome sequencing and identification of within-host viral variants. BMC Genomics 2021; 22:148. [PMID: 33653280 PMCID: PMC7923462 DOI: 10.1186/s12864-021-07460-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/23/2020] [Accepted: 02/19/2021] [Indexed: 01/23/2023]
Abstract
Background: Hepatitis C (HCV) and many other RNA viruses exist as rapidly mutating quasi-species populations in a single infected host. High throughput characterization of full genome, within-host variants is still not possible despite advances in next generation sequencing. This limitation constrains viral genomic studies that depend on accurate identification of hemi-genome or whole genome, within-host variants, especially those occurring at low frequencies. With the advent of third generation long read sequencing technologies, including Oxford Nanopore Technology (ONT) and PacBio platforms, this problem is potentially surmountable. ONT is particularly attractive in this regard due to the portable nature of the MinION sequencer, which makes real-time sequencing in remote and resource-limited locations possible. However, this technology (termed here ‘nanopore sequencing’) has a comparatively high technical error rate. The present study aimed to assess the utility, accuracy and cost-effectiveness of nanopore sequencing for HCV genomes. We also introduce a new bioinformatics tool (Nano-Q) to differentiate within-host variants from nanopore sequencing.
Results: The Nanopore platform, when the coverage exceeded 300 reads, generated comparable consensus sequences to Illumina sequencing. Using HCV Envelope plasmids (~1800 nt) mixed in known proportions, the capacity of nanopore sequencing to reliably identify variants with an abundance as low as 0.1% was demonstrated, provided the autologous reference sequence was available to identify the matching reads. Successful pooling and nanopore sequencing of 52 samples from patients with HCV infection demonstrated its cost effectiveness (AUD$ 43 per sample with nanopore sequencing versus $100 with paired-end short read technology). The Nano-Q tool successfully separated between-host sequences, including those from the same subtype, by bulk sorting and phylogenetic clustering without an autologous reference sequence (using only a subtype-specific generic reference). The pipeline also identified within-host viral variants and their abundance when the parameters were appropriately adjusted.
Conclusion: Cost effective HCV whole genome sequencing and within-host variant identification without haplotype reconstruction are potential advantages of nanopore sequencing.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07460-1.
Affiliation(s)
- Nasir Riaz
  - Kirby Institute, UNSW Sydney, Sydney, NSW, 2052, Australia; Department of Microbiology, Hazara University, KPK, Maneshra, 21120, Pakistan
- Preston Leung
  - Kirby Institute, UNSW Sydney, Sydney, NSW, 2052, Australia
- Kirston Barton
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, Australia
- Martin A Smith
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, Australia
- Shaun Carswell
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, Australia
- Rowena Bull
  - Kirby Institute, UNSW Sydney, Sydney, NSW, 2052, Australia; Department of Pathology, School of Medical Sciences, UNSW Sydney, Sydney, NSW, 2052, Australia
- Andrew R Lloyd
  - Kirby Institute, UNSW Sydney, Sydney, NSW, 2052, Australia
- Chaturaka Rodrigo
  - Kirby Institute, UNSW Sydney, Sydney, NSW, 2052, Australia; Department of Pathology, School of Medical Sciences, UNSW Sydney, Sydney, NSW, 2052, Australia
9
Graudenzi A, Maspero D, Angaroni F, Piazza R, Ramazzotti D. Mutational signatures and heterogeneous host response revealed via large-scale characterization of SARS-CoV-2 genomic diversity. iScience 2021; 24:102116. [PMID: 33532709 PMCID: PMC7842190 DOI: 10.1016/j.isci.2021.102116] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Received: 09/08/2020] [Revised: 11/09/2020] [Accepted: 01/22/2021] [Indexed: 01/03/2023]
Abstract
To dissect the mechanisms underlying the inflation of variants in the Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2) genome, we present a large-scale analysis of intra-host genomic diversity, which reveals that most samples exhibit heterogeneous genomic architectures, due to the interplay between host-related mutational processes and transmission dynamics. The decomposition of minor variants profiles unveils three non-overlapping mutational signatures related to nucleotide substitutions and likely ruled by APOlipoprotein B Editing Complex (APOBEC), Reactive Oxygen Species (ROS), and Adenosine Deaminase Acting on RNA (ADAR), highlighting heterogeneous host responses to SARS-CoV-2 infections. A corrected-for-signatures dN/dS analysis demonstrates that such mutational processes are affected by purifying selection, with important exceptions. In fact, several mutations appear to transit toward clonality, defining new clonal genotypes that increase the overall genomic diversity. Furthermore, the phylogenomic analysis shows the presence of homoplasies and supports the hypothesis of transmission of minor variants. This study paves the way for the integrated analysis of intra-host genomic diversity and clinical outcomes of SARS-CoV-2 infections.
Affiliation(s)
- Alex Graudenzi
  - Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
- Davide Maspero
  - Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
  - Department of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
- Fabrizio Angaroni
  - Department of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
- Rocco Piazza
  - Department of Medicine and Surgery, Univ. of Milan-Bicocca, Monza, Italy
- Daniele Ramazzotti
  - Department of Medicine and Surgery, Univ. of Milan-Bicocca, Monza, Italy
10
Cao C, He J, Mak L, Perera D, Kwok D, Wang J, Li M, Mourier T, Gavriliuc S, Greenberg M, Morrissy AS, Sycuro LK, Yang G, Jeffares DC, Long Q. Reconstruction of Microbial Haplotypes by Integration of Statistical and Physical Linkage in Scaffolding. Mol Biol Evol 2021; 38:2660-2672. [PMID: 33547786 PMCID: PMC8136496 DOI: 10.1093/molbev/msab037] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/30/2022]
Abstract
DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or "haplotypes." However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.
Collapse
Affiliation(s)
- Chen Cao
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Jingni He
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Cardiology, Xiangya Hospital, Central South University, Changsha, China
| | - Lauren Mak
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Present address: Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, NY, USA
| | - Deshan Perera
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Devin Kwok
- Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada
| | - Jia Wang
- Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA
| | - Minghao Li
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Tobias Mourier
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Stefan Gavriliuc
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Matthew Greenberg
- Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada
| | - A Sorana Morrissy
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Laura K Sycuro
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Microbiology, Immunology, and Infectious Diseases, Snyder Institute for Chronic Diseases, University of Calgary, Calgary, AB, Canada
| | - Guang Yang
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Medical Genetics, University of Calgary, Calgary, AB, Canada
| | - Daniel C Jeffares
- Department of Biology, York Biomedical Research Institute, University of York, York, United Kingdom
| | - Quan Long
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada,Department of Medical Genetics, University of Calgary, Calgary, AB, Canada,Hotchkiss Brain Institute, O’Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada,Corresponding author: E-mail:
|
11
|
Knyazev S, Hughes L, Skums P, Zelikovsky A. Epidemiological data analysis of viral quasispecies in the next-generation sequencing era. Brief Bioinform 2021; 22:96-108. [PMID: 32568371 PMCID: PMC8485218 DOI: 10.1093/bib/bbaa101] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 12/23/2019] [Revised: 04/24/2020] [Accepted: 05/04/2020] [Indexed: 01/04/2023]
Abstract
The unprecedented coverage offered by next-generation sequencing (NGS) technology has facilitated the assessment of the population complexity of intra-host RNA viral populations at an unprecedented level of detail. Consequently, analysis of NGS datasets could be used to extract and infer crucial epidemiological and biomedical information on the levels of both infected individuals and susceptible populations, thus enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and structures of transmission networks. However, NGS data require sophisticated analysis dealing with millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology; now, they must be redesigned to handle the large-scale NGS datasets and properly model the evolution of heterogeneous rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools analyzing NGS data for (i) characterization of intra-host viral population complexity including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response and control of outbreaks.
|
12
|
Cevallos C, Culasso AC, Modenutti C, Gun A, Sued O, Avila MM, Flichman D, Delpino MV, Quarleri J. Longitudinal characterization of HIV-1 pol-gene in treatment-naïve men-who-have-sex-with-men from acute to chronic infection stages. Heliyon 2020; 6:e05679. [PMID: 33319116 PMCID: PMC7723807 DOI: 10.1016/j.heliyon.2020.e05679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2020] [Revised: 11/09/2020] [Accepted: 12/03/2020] [Indexed: 01/18/2023]
Abstract
HIV-1 is characterized by its ability to mutate and recombine, even in the polymerase (pol) gene. However, pol-gene diversity is limited by functional constraints. The aim of this study was to characterize longitudinally, by next-generation sequencing (NGS), HIV-1 variants based on pol-gene sequences, at the intra- and inter-host levels, from acute/early to chronic stages of infection, in the absence of antiretroviral therapy. Ten men who have sex with men (MSM) were recruited during primary infection and followed yearly for five years. Even after a maximum follow-up of five years, the phylogenetic analysis of HIV-1 pol-gene sequences showed a host-defined structured pattern, with a predominance of purifying selection during follow-up. The MSM had been acutely infected by different HIV-1 variants, mainly ascribed to pure subtype B or BF recombinant variants, and showed different genetic mosaicism patterns that persisted until the chronic stage, representing a major challenge for prevention strategies.
Affiliation(s)
- Cintia Cevallos
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires-CONICET, Buenos Aires, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Andrés C.A. Culasso
- Instituto de Bacteriología y Virología Molecular (IBaViM), Facultad de Farmacia y Bioquímica, Universidad de Buenos Aires-CONICET, Buenos Aires, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Carlos Modenutti
- Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Buenos Aires, Argentina
- Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), CONICET, Buenos Aires, Argentina
| | - Ana Gun
- Fundación Huésped, Pasaje Angel Peluffo 3932, C1202ABB, Ciudad Autónoma de Buenos Aires, Argentina
| | - Omar Sued
- Fundación Huésped, Pasaje Angel Peluffo 3932, C1202ABB, Ciudad Autónoma de Buenos Aires, Argentina
| | - María M. Avila
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires-CONICET, Buenos Aires, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Diego Flichman
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires-CONICET, Buenos Aires, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - M. Victoria Delpino
- Instituto de Inmunología, Genética y Metabolismo (INIGEM), Universidad de Buenos Aires-CONICET, Buenos Aires, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Jorge Quarleri
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires-CONICET, Buenos Aires, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
|
13
|
Correa-Fiz F, Franzo G, Llorens A, Huerta E, Sibila M, Kekarainen T, Segalés J. Porcine circovirus 2 (PCV2) population study in experimentally infected pigs developing PCV2-systemic disease or a subclinical infection. Sci Rep 2020; 10:17747. [PMID: 33082419 PMCID: PMC7576782 DOI: 10.1038/s41598-020-74627-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/05/2020] [Accepted: 10/01/2020] [Indexed: 02/08/2023]
Abstract
Porcine circovirus 2 (PCV2) is a single-stranded DNA virus with one of the highest mutation rates among DNA viruses. This ability allows it to generate a cloud of mutants, constantly providing new opportunities to adapt and evade the immune system. This pig pathogen is associated with many diseases, collectively called porcine circovirus diseases (PCVD), and has been a threat to the pig industry since its discovery in the early 1990s. Although 11 ORFs have been predicted from its genome, only two main proteins have been characterized in depth, i.e., Rep and Cap. The structural Cap protein possesses the majority of the epitopic determinants of this non-enveloped virus. The evolution of PCV2 is affected by both natural and vaccine-induced immune responses, which enhance genetic variability, especially in the most immunogenic Cap region. Intra-host variability has also been demonstrated in infected animals, in which long-lasting infections can take place. However, the association between this intra-host variability and pathogenesis has never been studied for this virus. Here, the within-host PCV2 variability was monitored over time by next-generation sequencing during an experimental infection, demonstrating the presence of large heterogeneity. Remarkably, the level of quasispecies diversity, affecting particularly the Cap coding region, was statistically different depending on viremia levels and clinical signs detected after infection. Moreover, we proved the existence of hyper mutant subjects harboring a remarkably higher number of genetic variants. Altogether, these results suggest an interaction between genetic diversity, host immune system and disease severity.
Affiliation(s)
- Florencia Correa-Fiz
- Centre de Recerca en Sanitat Animal (CReSA, IRTA-UAB), IRTA, Bellaterra, Spain; OIE Collaborating Centre for the Research and Control of Emerging and Re-Emerging Swine Diseases in Europe (IRTA-CReSA), Bellaterra, Barcelona, Spain
| | - Giovanni Franzo
- Department of Animal Medicine, Production and Health (MAPS), University of Padua, Legnaro, PD, Italy
| | - Anna Llorens
- Centre de Recerca en Sanitat Animal (CReSA, IRTA-UAB), IRTA, Bellaterra, Spain; OIE Collaborating Centre for the Research and Control of Emerging and Re-Emerging Swine Diseases in Europe (IRTA-CReSA), Bellaterra, Barcelona, Spain
| | - Eva Huerta
- Centre de Recerca en Sanitat Animal (CReSA, IRTA-UAB), IRTA, Bellaterra, Spain; OIE Collaborating Centre for the Research and Control of Emerging and Re-Emerging Swine Diseases in Europe (IRTA-CReSA), Bellaterra, Barcelona, Spain
| | - Marina Sibila
- Centre de Recerca en Sanitat Animal (CReSA, IRTA-UAB), IRTA, Bellaterra, Spain; OIE Collaborating Centre for the Research and Control of Emerging and Re-Emerging Swine Diseases in Europe (IRTA-CReSA), Bellaterra, Barcelona, Spain
| | - Tuija Kekarainen
- Centre de Recerca en Sanitat Animal (CReSA, IRTA-UAB), IRTA, Bellaterra, Spain; Kuopio Center for Gene and Cell Therapy, Microkatu 1, Kuopio, Finland
| | - Joaquim Segalés
- Centre de Recerca en Sanitat Animal (CReSA, IRTA-UAB), IRTA, Bellaterra, Spain; OIE Collaborating Centre for the Research and Control of Emerging and Re-Emerging Swine Diseases in Europe (IRTA-CReSA), Bellaterra, Barcelona, Spain; Departament de Sanitat i Anatomia Animals, Facultat de Veterinària, UAB, Bellaterra, Spain
|
14
|
Streamlined Subpopulation, Subtype, and Recombination Analysis of HIV-1 Half-Genome Sequences Generated by High-Throughput Sequencing. mSphere 2020; 5:5/5/e00551-20. [PMID: 33055255 PMCID: PMC7565892 DOI: 10.1128/msphere.00551-20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
The highly recombinogenic nature of human immunodeficiency virus type 1 (HIV-1) leads to recombination and emergence of quasispecies. It is important to reliably identify subpopulations to understand the complexity of a viral population for drug resistance surveillance and vaccine development. High-throughput sequencing (HTS) provides improved resolution over Sanger sequencing for the analysis of heterogeneous viral subpopulations. However, current methods of analysis of HTS reads are unable to fully address accurate population reconstruction. Hence, there is a dire need for a more sensitive, accurate, user-friendly, and cost-effective method to analyze viral quasispecies. For this purpose, we have improved the HIVE-hexahedron algorithm that we previously developed with in silico short sequences to analyze raw HTS short reads. The significance of this study is that our standalone algorithm enables a streamlined analysis of quasispecies, subtype, and recombination patterns from long HIV-1 genome regions without the need of additional sequence analysis tools. Distinct viral populations and recombination patterns identified by HIVE-hexahedron are further validated by comparison with sequences obtained by single genome sequencing (SGS). High-throughput sequencing (HTS) has been widely used to characterize HIV-1 genome sequences. There are no algorithms currently that can directly determine genotype and quasispecies population using short HTS reads generated from long genome sequences without additional software. To establish a robust subpopulation, subtype, and recombination analysis workflow, we amplified the HIV-1 3′-half genome from plasma samples of 65 HIV-1-infected individuals and sequenced the entire amplicon (∼4,500 bp) by HTS. With direct analysis of raw reads using HIVE-hexahedron, we showed that 48% of samples harbored 2 to 13 subpopulations. 
We identified various subtypes (17 A1s, 4 Bs, 27 Cs, 6 CRF02_AGs, and 11 unique recombinant forms) and defined recombinant breakpoints of 10 recombinants. These results were validated with viral genome sequences generated by single genome sequencing (SGS) or the analysis of consensus sequence of the HTS reads. The HIVE-hexahedron workflow is more sensitive and accurate than just evaluating the consensus sequence and also more cost-effective than SGS.
|
15
|
Cacciabue M, Currá A, Carrillo E, König G, Gismondi MI. A beginner's guide for FMDV quasispecies analysis: sub-consensus variant detection and haplotype reconstruction using next-generation sequencing. Brief Bioinform 2020; 21:1766-1775. [PMID: 31697321 PMCID: PMC7110011 DOI: 10.1093/bib/bbz086] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/18/2019] [Revised: 06/18/2019] [Accepted: 06/19/2019] [Indexed: 12/18/2022]
Abstract
Deep sequencing of viral genomes is a powerful tool to study RNA virus complexity. However, the analysis of next-generation sequencing data might be challenging for researchers who have never approached the study of viral quasispecies by this methodology. In this work we present a suitable and affordable guide to explore the sub-consensus variability and to reconstruct viral quasispecies from Illumina sequencing data. The guide includes a complete analysis pipeline along with user-friendly descriptions of software and file formats. In addition, we assessed the feasibility of the workflow proposed by analyzing a set of foot-and-mouth disease viruses (FMDV) with different degrees of variability. This guide introduces the analysis of quasispecies of FMDV and other viruses through this kind of approach.
Affiliation(s)
- Marco Cacciabue
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
- Departamento de Ciencias Básicas, Universidad Nacional de Luján, Luján, Argentina
| | - Anabella Currá
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
- Departamento de Ciencias Básicas, Universidad Nacional de Luján, Luján, Argentina
| | - Elisa Carrillo
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
| | - Guido König
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
| | - María Inés Gismondi
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
- Departamento de Ciencias Básicas, Universidad Nacional de Luján, Luján, Argentina
|
16
|
Eliseev A, Gibson KM, Avdeyev P, Novik D, Bendall ML, Pérez-Losada M, Alexeev N, Crandall KA. Evaluation of haplotype callers for next-generation sequencing of viruses. Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases 2020; 82:104277. [PMID: 32151775 PMCID: PMC7293574 DOI: 10.1016/j.meegid.2020.104277] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/04/2019] [Revised: 03/04/2020] [Accepted: 03/06/2020] [Indexed: 01/30/2023]
Abstract
Currently, the standard practice for assembling next-generation sequencing (NGS) reads of viral genomes is to summarize thousands of individual short reads into a single consensus sequence, thus confounding useful intra-host diversity information for molecular phylodynamic inference. It is hypothesized that a few viral strains may dominate the intra-host genetic diversity with a variety of lower frequency strains comprising the rest of the population. Several software tools currently exist to convert NGS sequence variants into haplotypes. Previous benchmarks of viral haplotype reconstruction programs used simulation scenarios that are useful from a mathematical perspective but do not reflect viral evolution and epidemiology. Here, we tested twelve NGS haplotype reconstruction methods using viral populations simulated under realistic evolutionary dynamics. We simulated coalescent-based populations that spanned known levels of viral genetic diversity, including mutation rates, sample size and effective population size, to test the limits of the haplotype reconstruction methods and to ensure coverage of predicted intra-host viral diversity levels (especially HIV-1). All twelve investigated haplotype callers showed variable performance and produced drastically different results that were mainly driven by differences in mutation rate and, to a lesser extent, in effective population size. Most methods were able to accurately reconstruct haplotypes when genetic diversity was low. However, under higher levels of diversity (e.g., those seen in intra-host HIV-1 infections), haplotype reconstruction quality was highly variable and, on average, poor. All haplotype reconstruction tools, except QuasiRecomb and ShoRAH, greatly underestimated intra-host diversity and the true number of haplotypes. 
PredictHaplo outperformed, in regard to highest precision, recall, and lowest UniFrac distance values, the other haplotype reconstruction tools followed by CliqueSNV, which, given more computational time, may have outperformed PredictHaplo. Here, we present an extensive comparison of available viral haplotype reconstruction tools and provide insights for future improvements in haplotype reconstruction tools using both short-read and long-read technologies.
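The abstract's point that a single consensus sequence hides intra-host diversity can be illustrated with a minimal Python sketch. It assumes a toy set of equal-length, already aligned, error-free reads that each span the whole region; the function names `consensus` and `haplotype_frequencies` are hypothetical and are not taken from any of the benchmarked tools, which must additionally handle sequencing errors and reads shorter than the region.

```python
from collections import Counter


def consensus(reads):
    """Majority base at each column of equal-length, pre-aligned reads (toy assumption)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def haplotype_frequencies(reads):
    """Relative frequency of each distinct full-length read, treated here as a haplotype."""
    counts = Counter(reads)
    total = len(reads)
    return {haplotype: n / total for haplotype, n in counts.items()}


# Hypothetical population: one dominant strain and two minority variants.
reads = ["ACGT"] * 7 + ["ACTT"] * 2 + ["GCGT"] * 1

print(consensus(reads))              # the consensus matches only the dominant strain
print(haplotype_frequencies(reads))  # the minority variants remain visible here
```

In this toy population the consensus equals the dominant strain, so the two minority variants (20% and 10% of reads) disappear from the summary, while the per-haplotype frequencies retain them.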
Affiliation(s)
- Anton Eliseev
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
| | - Keylie M Gibson
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA.
| | - Pavel Avdeyev
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Mathematics, George Washington University, Washington, DC, USA
| | - Dmitry Novik
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
| | - Matthew L Bendall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
| | - Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
| | - Nikita Alexeev
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
| | - Keith A Crandall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
|
17
|
Li X, Saadat S, Hu H, Li X. BHap: a novel approach for bacterial haplotype reconstruction. Bioinformatics 2020; 35:4624-4631. [PMID: 31004480 DOI: 10.1093/bioinformatics/btz280] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/26/2018] [Revised: 03/07/2019] [Accepted: 04/13/2019] [Indexed: 12/13/2022]
Abstract
MOTIVATION Bacterial haplotype reconstruction is critical for selecting proper treatments for diseases caused by unknown haplotypes. Existing methods and tools do not work well on this task, because they are usually developed for viral rather than bacterial populations. RESULTS In this study, we developed BHap, a novel algorithm based on fuzzy flow networks, for reconstructing bacterial haplotypes from next-generation sequencing data. Tested on simulated and experimental datasets, BHap was capable of reconstructing haplotypes of bacterial populations with an average F1 score of 0.87, an average precision of 0.87 and an average recall of 0.88. We also demonstrated that BHap had a low susceptibility to sequencing errors, was capable of reconstructing haplotypes with low coverage and could handle a wide range of mutation rates. Compared with existing approaches, BHap achieved higher F1 scores, better precision, better recall and more accurate estimation of the number of haplotypes. AVAILABILITY AND IMPLEMENTATION The BHap tool is available at http://www.cs.ucf.edu/~xiaoman/BHap/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xin Li
- Department of Computer Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
| | - Samaneh Saadat
- Department of Computer Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
| | - Haiyan Hu
- Department of Computer Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
| | - Xiaoman Li
- Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
|
18
|
Inferring Transmission Bottleneck Size from Viral Sequence Data Using a Novel Haplotype Reconstruction Method. J Virol 2020; 94:JVI.00014-20. [PMID: 32295920 PMCID: PMC7307158 DOI: 10.1128/jvi.00014-20] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/03/2020] [Accepted: 04/08/2020] [Indexed: 12/12/2022]
Abstract
Viral populations undergo a repeated cycle of within-host growth followed by transmission. Viral evolution is affected by each stage of this cycle. The number of viral particles transmitted from one host to another, known as the transmission bottleneck, is an important factor in determining how the evolutionary dynamics of the population play out, restricting the extent to which the evolved diversity of the population can be passed from one host to another. Previous study of viral sequence data has suggested that the transmission bottleneck size for influenza A transmission between human hosts is small. Reevaluating these data using a novel and improved method, we largely confirm this result, albeit that we infer a slightly higher bottleneck size in some cases, of between 1 and 13 virions. While a tight bottleneck operates in human influenza transmission, it is not extreme in nature; some diversity can be meaningfully retained between hosts. The transmission bottleneck is defined as the number of viral particles that transmit from one host to establish an infection in another. Genome sequence data have been used to evaluate the size of the transmission bottleneck between humans infected with the influenza virus; however, the methods used to make these estimates have some limitations. Specifically, viral allele frequencies, which form the basis of many calculations, may not fully capture a process which involves the transmission of entire viral genomes. Here, we set out a novel approach for inferring viral transmission bottlenecks; our method combines an algorithm for haplotype reconstruction with maximum likelihood methods for bottleneck inference. This approach allows for rapid calculation and performs well when applied to data from simulated transmission events; errors in the haplotype reconstruction step did not adversely affect inferences of the population bottleneck. 
Applied to data from a previous household transmission study of influenza A infection, we confirm the result that the majority of transmission events involve a small number of viruses, albeit with slightly looser bottlenecks being inferred, with between 1 and 13 particles transmitted in the majority of cases. While influenza A transmission involves a tight population bottleneck, the bottleneck is not so tight as to universally prevent the transmission of within-host viral diversity.
|
19
|
Wang M, Li J, Zhang X, Han Y, Yu D, Zhang D, Yuan Z, Yang Z, Huang J, Zhang X. An integrated software for virus community sequencing data analysis. BMC Genomics 2020; 21:363. [PMID: 32414327 PMCID: PMC7227348 DOI: 10.1186/s12864-020-6744-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/12/2020] [Accepted: 04/21/2020] [Indexed: 02/06/2023]
Abstract
BACKGROUND A virus community is the spectrum of viral strains populating an infected host, which plays a key role in pathogenesis and therapy response in viral infectious diseases. However, an automatic and dedicated pipeline for interpreting virus community sequencing data has not yet been developed. RESULTS We developed the Quasispecies Analysis Package (QAP), an integrated software platform to address the problems associated with making biological interpretations from massive viral population sequencing data. QAP provides quantitative insight into virus ecology by first introducing the definition "virus OTU" and supports a wide range of viral community analyses and result visualizations. Various forms of QAP were developed to serve a broad range of users, including a command line, a graphical user interface and a web server. The utilities of QAP were thoroughly evaluated with high-throughput sequencing data from hepatitis B virus, hepatitis C virus, influenza virus and human immunodeficiency virus, and the results showed highly accurate viral quasispecies characteristics related to biological phenotypes. CONCLUSIONS QAP provides a complete solution for virus community high-throughput sequencing data analysis, and it would facilitate the easy analysis of virus quasispecies in clinical applications.
Affiliation(s)
- Mingjie Wang
- Research Laboratory of Clinical Virology, Ruijin Hospital, Shanghai Jiaotong University, School of Medicine, Shanghai, 200025, China
| | - Jianfeng Li
- State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, 200025, China
| | - Xiaonan Zhang
- Key Lab of Medicine Molecular Virology of MOE/MOH, Shanghai Medical School, Fudan University, Shanghai, 200032, China
| | - Yue Han
- Research Laboratory of Clinical Virology, Ruijin Hospital, Shanghai Jiaotong University, School of Medicine, Shanghai, 200025, China
| | - Demin Yu
- Research Laboratory of Clinical Virology, Ruijin Hospital, Shanghai Jiaotong University, School of Medicine, Shanghai, 200025, China
| | - Donghua Zhang
- Research Laboratory of Clinical Virology, Ruijin Hospital, Shanghai Jiaotong University, School of Medicine, Shanghai, 200025, China
| | - Zhenghong Yuan
- Key Lab of Medicine Molecular Virology of MOE/MOH, Shanghai Medical School, Fudan University, Shanghai, 200032, China
| | - Zhitao Yang
- Emergency Department, Ruijin Hospital, Shanghai Jiaotong University, School of Medicine, Shanghai, 200025, China
| | - Jinyan Huang
- State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, 200025, China
| | - Xinxin Zhang
- Research Laboratory of Clinical Virology, Ruijin Hospital, Shanghai Jiaotong University, School of Medicine, Shanghai, 200025, China; Clinical Research Center, Ruijin Hospital North, Shanghai Jiaotong University, School of Medicine, Shanghai, 201821, China
|
20
|
Yoest JM, Shirai CL, Duncavage EJ. Sequencing-Based Measurable Residual Disease Testing in Acute Myeloid Leukemia. Front Cell Dev Biol 2020; 8:249. [PMID: 32457898 PMCID: PMC7225302 DOI: 10.3389/fcell.2020.00249] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/09/2019] [Accepted: 03/24/2020] [Indexed: 12/31/2022]
Abstract
Next generation sequencing (NGS) methods have allowed for unprecedented genomic characterization of acute myeloid leukemia (AML) over the last several years. Further advances in NGS-based methods including error correction using unique molecular identifiers (UMIs) have more recently enabled the use of NGS-based measurable residual disease (MRD) detection. This review focuses on the use of NGS-based MRD detection in AML, including basic methodologies and clinical applications.
Affiliation(s)
- Jennifer M Yoest
- Department of Pathology, Case Western Reserve University, Cleveland, OH, United States
| | - Cara Lunn Shirai
- Department of Pathology and Immunology, Washington University in St. Louis, St. Louis, MO, United States
| | - Eric J Duncavage
- Department of Pathology and Immunology, Washington University in St. Louis, St. Louis, MO, United States
|
21
|
Katsiani A, Stainton D, Lamour K, Tzanetakis IE. The population structure of Rose rosette virus in the USA. J Gen Virol 2020; 101:676-684. [PMID: 32375952 DOI: 10.1099/jgv.0.001418] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/18/2022]
Abstract
Rose rosette virus (RRV) (genus Emaravirus) is the causal agent of the homonymous disease, the most destructive malady of roses in the USA. Although the importance of the disease is recognized, little sequence information and no full genomes are available for RRV, a multi-segmented RNA virus. To better understand the population structure of the virus we implemented a Hi-Plex PCR amplicon high-throughput sequencing approach to sequence all 7 segments and to quantify polymorphisms in 91 RRV isolates collected from 16 states in the USA. Analysis revealed insertion/deletion (indel) polymorphisms primarily in the 5' and 3' non-coding regions, but also within coding regions, including some resulting in changes of protein length. Phylogenetic analysis showed little geographical structuring, suggesting that topography does not have a strong influence on virus evolution. Overall, the virus populations were homogeneous, possibly because of regular movement of plants, the recent emergence of RRV and/or because the virus is under strong purifying selection to preserve its integrity and biological functions.
Affiliation(s)
- Asimina Katsiani
- Department of Entomology and Plant Pathology, Division of Agriculture, University of Arkansas System, Fayetteville AR 72701, USA
| | - Daisy Stainton
- Department of Entomology and Plant Pathology, Division of Agriculture, University of Arkansas System, Fayetteville AR 72701, USA
| | - Kurt Lamour
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
| | - Ioannis E Tzanetakis
- Department of Entomology and Plant Pathology, Division of Agriculture, University of Arkansas System, Fayetteville AR 72701, USA
|
22
|
Pérez-Losada M, Arenas M, Galán JC, Bracho MA, Hillung J, García-González N, González-Candelas F. High-throughput sequencing (HTS) for the analysis of viral populations. INFECTION GENETICS AND EVOLUTION 2020; 80:104208. [PMID: 32001386 DOI: 10.1016/j.meegid.2020.104208] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/30/2019] [Revised: 01/21/2020] [Accepted: 01/24/2020] [Indexed: 12/12/2022]
Abstract
The development of High-Throughput Sequencing (HTS) technologies is having a major impact on the genomic analysis of viral populations. Current HTS platforms can capture nucleic acid variation across millions of genes for both selected amplicons and full viral genomes. HTS has already facilitated the discovery of new viruses, hinted at new taxonomic classifications and provided a deeper and broader understanding of their diversity, population and genetic structure. Hence, HTS has already replaced standard Sanger sequencing in basic and applied research fields, but the next step is its implementation as a routine technology for the analysis of viruses in clinical settings. The most likely application of this implementation will be the analysis of viral genomics, because the huge population sizes, high mutation rates and very fast replacement of viral populations have demonstrated the limited information obtained with Sanger technology. In this review, we describe new technologies and provide guidelines for the high-throughput sequencing and genetic and evolutionary analyses of viral populations and metaviromes, including software applications. With the development of new HTS technologies, new and refurbished molecular and bioinformatic tools are also constantly being developed to process and integrate HTS data. These allow assembling viral genomes and inferring viral population diversity and dynamics. Finally, we also present several applications of these approaches to the analysis of viral clinical samples including transmission clusters and outbreak characterization.
Affiliation(s)
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain.
| | - Juan Carlos Galán
- Microbiology Service, Hospital Ramón y Cajal, Madrid, Spain; CIBER in Epidemiology and Public Health, Spain.
| | - Mª Alma Bracho
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain.
| | - Julia Hillung
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
| | - Neris García-González
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
| | - Fernando González-Candelas
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
|
23
|
Guermouche H, Burrel S, Mercier-Darty M, Kofman T, Rogier O, Pawlotsky JM, Boutolleau D, Rodriguez C. Characterization of the dynamics of human cytomegalovirus resistance to antiviral drugs by ultra-deep sequencing. Antiviral Res 2019; 173:104647. [PMID: 31706899 DOI: 10.1016/j.antiviral.2019.104647] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/12/2019] [Revised: 09/30/2019] [Accepted: 11/04/2019] [Indexed: 12/12/2022]
Abstract
Prophylactic or preemptive treatment strategies are required to prevent human cytomegalovirus (CMV) infections in transplant recipients. However, treatment failure occurs when CMV resistance-associated variants (RAVs) are selected. Although the diversity of CMV is lower than that of RNA viruses, CMV appears to show some genetic instability, with possible minor emerging resistance that may be undetectable by Sanger sequencing. We aimed to examine CMV-resistance mutations over time by ultra-deep sequencing (UDS) and Sanger sequencing in a kidney transplant recipient experiencing CMV infection. This patient showed a transient response to three different antiviral drugs (valganciclovir, foscarnet, and maribavir) and four episodes of CMV resistance over two years. The full-length UL97 (2.3 kbp) and partial UL54 (2.4 kbp) CMV genes were studied by UDS and Sanger sequencing, and linkage mutations were calculated to determine RAVs. We detected four major and five minor resistance mutations. Minor resistant variants (2-20%) were detected by UDS, whereas major resistance substitutions (>20%) were identified by both UDS and the Sanger method. We detected cross-resistance to three drugs, despite high CMV loads, suggesting that the fitness of the viral mutants was not impaired. In conclusion, CMV showed complex dynamics of resistance under antiviral drug pressure, as described for highly variable viruses. The emergence of successive RAVs constitutes a clinically challenging complication and contributes to the difficulty of therapeutic management of patients.
Affiliation(s)
- Hélène Guermouche
- Laboratoire de Virologie, CHU Henri Mondor (AP-HP), INSERM U955 Eq18, Plateforme « Génomiques », IMRB, UPEC, Créteil, France
| | - Sonia Burrel
- Centre National de Référence Herpèsvirus (laboratoire associé), Laboratoire de Virologie, Hôpital Universitaire La Pitié-Salpêtrière, GHU AP-PH. Sorbonne Université (AP-HP), INSERM U1136, iPLESP, Sorbonne Université, Paris, France
| | - Mélanie Mercier-Darty
- Laboratoire de Virologie, CHU Henri Mondor (AP-HP), INSERM U955 Eq18, Plateforme « Génomiques », IMRB, UPEC, Créteil, France
| | - Thomas Kofman
- Service de Néphrologie, Hôpital Universitaire Henri Mondor (AP-HP), Créteil, France
| | - Olivier Rogier
- Laboratoire de Virologie, CHU Henri Mondor (AP-HP), INSERM U955 Eq18, Plateforme « Génomiques », IMRB, UPEC, Créteil, France
| | - Jean-Michel Pawlotsky
- Laboratoire de Virologie, CHU Henri Mondor (AP-HP), INSERM U955 Eq18, Plateforme « Génomiques », IMRB, UPEC, Créteil, France
| | - David Boutolleau
- Centre National de Référence Herpèsvirus (laboratoire associé), Laboratoire de Virologie, Hôpital Universitaire La Pitié-Salpêtrière, GHU AP-PH. Sorbonne Université (AP-HP), INSERM U1136, iPLESP, Sorbonne Université, Paris, France
| | - Christophe Rodriguez
- Laboratoire de Virologie, CHU Henri Mondor (AP-HP), INSERM U955 Eq18, Plateforme « Génomiques », IMRB, UPEC, Créteil, France.
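The frequency bands reported in the Guermouche et al. abstract above (minor variants at 2-20%, major substitutions above 20%) can be sketched as a small classifier. This is a minimal illustration and not the authors' pipeline: the function name, the variant labels, and the example frequencies are hypothetical, and labelling values under 2% as outside the reported range is an assumption.

```python
def classify_variant(frequency_percent, minor_floor=2.0, major_floor=20.0):
    """Label a variant by its within-sample frequency, given in percent.

    Thresholds follow the bands quoted in the abstract: 2-20% is "minor",
    above 20% is "major". Anything under 2% is outside the reported range
    (assumed here to be indistinguishable from sequencing noise).
    """
    if frequency_percent > major_floor:
        return "major"
    if frequency_percent >= minor_floor:
        return "minor"
    return "below reported range"


# Hypothetical variant names and frequencies, for illustration only.
observed = {"variant_A": 64.0, "variant_B": 7.5, "variant_C": 0.8}
labels = {name: classify_variant(freq) for name, freq in observed.items()}
print(labels)
```

Under these thresholds a variant at exactly 20% is still counted as minor, which matches the ">20%" wording used for major substitutions.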
|
24
|
Intra-host dynamics and co-receptor usage of HIV-1 quasi-species in vertically infected patients with phenotypic switch. INFECTION GENETICS AND EVOLUTION 2019; 78:104066. [PMID: 31698113 DOI: 10.1016/j.meegid.2019.104066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2019] [Revised: 10/05/2019] [Accepted: 10/09/2019] [Indexed: 11/20/2022]
Abstract
HIV-1 infection through vertical transmission provides a good model to evaluate intra-host viral evolution and to gain insight into the dynamics of viral populations. Our aim was to assess the diversity and dynamics of X4- and R5-using HIV-1 variants in vertically infected children who presented a switch in SI/NSI phenotype in MT-2 cell assays during chronic infection. Through molecular cloning and next generation sequencing of the C2-V5 env fragment, we investigated HIV-1 evolution and co-receptor usage based on V3 loop prediction bioinformatic tools of longitudinal samples obtained from 4 children. In all cases, the phylogenetic relationships were assessed by Maximum-Likelihood trees constructed with MEGA 6.0. In two cases, V3 loop sequences predicted exclusively R5-using and/or X4-using strains, while in another two a higher degree of concordance was observed between the phenotypic and genotypic characteristics. In 3 of the 4 cases, C2-V5 env sequences from different time points were intermingled in phylogenetic trees, with no segregation by either time or tropism. In only one case monophyletic clustering defined groups of sequences with different co-receptor usage. Comparison of amino acid frequency between isolates with SI and NSI phenotype allowed the identification of 9 possible genetic determinants in the subtype F C2-V5 region of env associated with SI/NSI phenotype in these patients, one of which had previously been reported for subtype B. Overall, we found a low degree of correlation between phenotypic and genotypic properties of HIV-1 quasispecies in patients under chronic infection. Whether HIV-1 subtype or other factors influence the evolution of HIV-1 in vivo will require further research.
|
25
|
Chen J, Zhao Y, Sun Y. De novo haplotype reconstruction in viral quasispecies using paired-end read guided path finding. Bioinformatics 2019; 34:2927-2935. [PMID: 29617936 DOI: 10.1093/bioinformatics/bty202] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/28/2017] [Accepted: 04/02/2018] [Indexed: 12/29/2022] Open
Abstract
Motivation
RNA virus populations contain different but genetically related strains, all infecting an individual host. Reconstruction of the viral haplotypes is a fundamental step to characterize the virus population, predict their viral phenotypes and finally provide important information for clinical treatment and prevention. Advances in next-generation sequencing technologies open up new opportunities to assemble full-length haplotypes. However, error-prone short reads, high similarities between related strains, and an unknown number of haplotypes pose computational challenges for reference-free haplotype reconstruction. There is still much room to improve the performance of existing haplotype assembly tools.
Results
In this work, we developed a de novo haplotype reconstruction tool named PEHaplo, which employs paired-end reads to distinguish highly similar strains for viral quasispecies data. It was applied to both simulated and real quasispecies data, and the results were benchmarked against several recently published de novo haplotype reconstruction tools. The comparison shows that PEHaplo outperforms the benchmarked tools in a comprehensive set of metrics.
Availability and implementation
The source code and the documentation of PEHaplo are available at https://github.com/chjiao/PEHaplo.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiao Chen
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
| | - Yingchao Zhao
- School of Computing and Information Sciences, Caritas Institute of Higher Education, Hong Kong, China
| | - Yanni Sun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
|
26
|
González R, Wu B, Li X, Martínez F, Elena SF. Mutagenesis Scanning Uncovers Evolutionary Constraints on Tobacco Etch Potyvirus Membrane-Associated 6K2 Protein. Genome Biol Evol 2019; 11:1207-1222. [PMID: 30918938 PMCID: PMC6482416 DOI: 10.1093/gbe/evz069] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 03/26/2019] [Indexed: 12/30/2022] Open
Abstract
The high mutation rate of RNA viruses is a double-edged sword. On the one hand, most mutations jeopardize protein function; on the other hand, mutations are needed to fuel adaptation. The relevant question then is the ratio between beneficial and deleterious mutations. To evaluate this ratio, we created a mutant library of the 6K2 gene of tobacco etch potyvirus that contains every possible single-nucleotide substitution. 6K2 protein anchors the virus replication complex to the network of endoplasmic reticulum membranes. The library was inoculated into the natural host Nicotiana tabacum, allowing competition among all these mutants and selection of those that are potentially viable. We identified 11 nonsynonymous mutations that remain in the viral population at measurable frequencies and evaluated their fitness. Some had fitness values higher than the wild-type and some were deleterious. The effect of these mutations on the structure, transmembrane properties, and function of 6K2 was evaluated in silico. In parallel, the effect of these mutations on infectivity, virus accumulation, symptom development, and subcellular localization was evaluated in the natural host. The α-helix H1 in the N-terminal part of 6K2 turned out to be under purifying selection, while most observed mutations affect the link between transmembrane α-helices H2 and H3, fusing them into a longer helix and increasing its rigidity. In general, these changes are associated with higher within-host fitness and development of milder or no symptoms. This finding suggests that in nature selection upon 6K2 may result from a tradeoff between within-host accumulation and severity of symptoms.
Affiliation(s)
- Rubén González
- Instituto de Biología Integrativa de Sistemas (I2SysBio), CSIC-Universitat de València, València, Spain
| | - Beilei Wu
- Instituto de Biología Molecular y Celular de Plantas (IBMCP), CSIC-Universitat Politècnica de València, València, Spain; Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Xianghua Li
- Systems Biology Program, Centre de Regulació Genòmica (CRG), The Barcelona Institute of Science and Technology, PRBB, Barcelona, Spain
| | - Fernando Martínez
- Instituto de Biología Molecular y Celular de Plantas (IBMCP), CSIC-Universitat Politècnica de València, València, Spain
| | - Santiago F Elena
- Instituto de Biología Integrativa de Sistemas (I2SysBio), CSIC-Universitat de València, València, Spain; The Santa Fe Institute, Santa Fe, New Mexico
|
27
|
Henningsson R, Moratorio G, Bordería AV, Vignuzzi M, Fontes M. DISSEQT-DIStribution-based modeling of SEQuence space Time dynamics. Virus Evol 2019; 5:vez028. [PMID: 31392032 PMCID: PMC6680062 DOI: 10.1093/ve/vez028] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022] Open
Abstract
Rapidly evolving microbes are a challenge to model because of the volatile, complex, and dynamic nature of their populations. We developed the DISSEQT pipeline (DIStribution-based SEQuence space Time dynamics) for analyzing, visualizing, and predicting the evolution of heterogeneous biological populations in multidimensional genetic space, suited for population-based modeling of deep sequencing and high-throughput data. The pipeline is openly available on GitHub (https://github.com/rasmushenningsson/DISSEQT.jl, accessed 23 June 2019) and Synapse (https://www.synapse.org/#!Synapse:syn11425758, accessed 23 June 2019), covering the entire workflow from read alignment to visualization of results. Our pipeline is centered around robust dimension and model reduction algorithms for analysis of genotypic data with additional capabilities for including phenotypic features to explore dynamic genotype-phenotype maps. We illustrate its utility and capacity with examples from evolving RNA virus populations, which present one of the highest degrees of genetic heterogeneity within a given population found in nature. Using our pipeline, we empirically reconstruct the evolutionary trajectories of evolving populations in sequence space and genotype-phenotype fitness landscapes. We show that while sequence space is vastly multidimensional, the relevant genetic space of evolving microbial populations is of intrinsically low dimension. In addition, evolutionary trajectories of these populations can be faithfully monitored to identify the key minority genotypes contributing most to evolution. Finally, we show that empirical fitness landscapes, when reconstructed to include minority variants, can predict phenotype from genotype with high accuracy.
Affiliation(s)
- R Henningsson
- The Centre for Mathematical Sciences, Lund University, Sweden
- Viral Populations and Pathogenesis Unit, Institut Pasteur, Paris, France
- The International Group for Data Analysis, Institut Pasteur, Paris, France
- Division of Clinical Genetics, Lund University, Sweden
| | - G Moratorio
- Viral Populations and Pathogenesis Unit, Institut Pasteur, Paris, France
- Laboratorio de Virología Molecular, Universidad de la República, Montevideo, Uruguay
| | - A V Bordería
- The International Group for Data Analysis, Institut Pasteur, Paris, France
| | - M Vignuzzi
- Viral Populations and Pathogenesis Unit, Institut Pasteur, Paris, France
| | - M Fontes
- The International Group for Data Analysis, Institut Pasteur, Paris, France
- Department of Cancer Immunology, Genentech, South San Francisco, CA, USA
- The Center for Genomic Medicine, Rigshospitalet, Copenhagen, Denmark
- Persimune, The Centre of Excellence for Personalized Medicine, Copenhagen, Denmark
|
28
|
Baaijens JA, Van der Roest B, Köster J, Stougie L, Schönhuth A. Full-length de novo viral quasispecies assembly through variation graph construction. Bioinformatics 2019; 35:5086-5094. [DOI: 10.1093/bioinformatics/btz443] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 08/11/2018] [Revised: 04/17/2019] [Accepted: 05/27/2019] [Indexed: 11/14/2022] Open
Abstract
Motivation
Viruses populate their hosts as a viral quasispecies: a collection of genetically related mutant strains. Viral quasispecies assembly is the reconstruction of strain-specific haplotypes from read data, and predicting their relative abundances within the mix of strains is an important step for various treatment-related reasons. Reference genome independent (‘de novo’) approaches have yielded benefits over reference-guided approaches, because reference-induced biases can become overwhelming when dealing with divergent strains. While being very accurate, extant de novo methods only yield rather short contigs. The remaining challenge is to reconstruct full-length haplotypes together with their abundances from such contigs.
Results
We present Virus-VG as a de novo approach to viral haplotype reconstruction from preassembled contigs. Our method constructs a variation graph from the short input contigs without making use of a reference genome. Then, to obtain paths through the variation graph that reflect the original haplotypes, we solve a minimization problem that yields a selection of maximal-length paths that is optimal in terms of being compatible with the read coverages computed for the nodes of the variation graph. We output the resulting selection of maximal-length paths as the haplotypes, together with their abundances. Benchmarking experiments on challenging simulated and real datasets show significant improvements in assembly contiguity compared to the input contigs, while preserving low error rates compared to the state-of-the-art viral quasispecies assemblers.
Availability and implementation
Virus-VG is freely available at https://bitbucket.org/jbaaijens/virus-vg.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jasmijn A Baaijens
- Life Sciences and Health Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands
| | | | - Johannes Köster
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Medical Oncology, Dana Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
| | - Leen Stougie
- Life Sciences and Health Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands
- Department of Econometrics and Operations Research, Vrije Universiteit, Amsterdam, Netherlands
- INRIA-Erable, Grenoble, France
| | - Alexander Schönhuth
- Life Sciences and Health Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands
- INRIA-Erable, Grenoble, France
- Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands
|
29
|
Reconstitution of HIV-1 reservoir following high-dose chemotherapy/autologous stem cell transplantation for lymphoma. AIDS 2019; 33:247-257. [PMID: 30325771 DOI: 10.1097/qad.0000000000002051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
OBJECTIVES
Autologous stem cell transplantation following high-dose chemotherapy (HDC/ASCT) is the prime model to study the impact of HDC in HIV-1-infected participants. We analyzed the impact of HDC/ASCT on the resurgent reservoir composition and origin.
DESIGN
We retrospectively included a homogeneous group of HIV-1-infected patients treated for high-risk lymphoma in a reference center with similar chemotherapy regimens.
METHODS
Thirteen participants treated with HDC/ASCT from 2012 to 2015 were included. A median of seven longitudinal blood samples per participant were available. Total HIV-1 DNA levels in peripheral blood mononuclear cells (PBMCs) were quantified by quantitative PCR. In six participants with sustained viral suppression, the highly variable C2V3 viral region was subjected to next-generation sequencing. Maximum-likelihood phylogeny trees were generated from the reconstructed viral haplotypes. Lymphocyte subsets were studied by flow cytometry.
RESULTS
PBMC-associated HIV-1 DNA levels were stable over time. Viral diversity decreased along lymphoma treatment, but increased promptly back to prechemotherapy numbers after HDC/ASCT. Blood viral populations from all time-points were intermingled in phylogeny trees: the resurgent reservoir was similar to pre-HDC circulating proviruses. Memory subsets were the main contributor to the early restoration of the CD4+ T-cell pool, with a delayed increase in naïve cell counts.
CONCLUSIONS
The characterization of HIV-1 reservoir in blood revealed a fast and consistent replenishment from memory CD4+ T cells after HDC/ASCT. As HDC/ASCT is increasingly involved in HIV cure trials with gene-modified hematopoietic stem cells, the management of infected T cells in HIV-positive autologous transplants will be critical.
|
30
|
Barik S, Das S, Vikalo H. QSdpR: Viral quasispecies reconstruction via correlation clustering. Genomics 2018; 110:375-381. [DOI: 10.1016/j.ygeno.2017.12.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/05/2017] [Revised: 12/03/2017] [Accepted: 12/13/2017] [Indexed: 02/05/2023]
|
31
|
Wong TKF, Ranjard L, Lin Y, Rodrigo AG. HaploJuice: accurate haplotype assembly from a pool of sequences with known relative concentrations. BMC Bioinformatics 2018; 19:389. [PMID: 30348075 PMCID: PMC6198429 DOI: 10.1186/s12859-018-2424-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/25/2018] [Accepted: 10/09/2018] [Indexed: 11/10/2022] Open
Abstract
Background
Pooling techniques, where multiple sub-samples are mixed in a single sample, are widely used to take full advantage of high-throughput DNA sequencing. Recently, Ranjard et al. (PLoS ONE 13:0195090, 2018) proposed a pooling strategy without the use of barcodes. Three sub-samples were mixed in different known proportions (i.e. 62.5%, 25% and 12.5%), and a method was developed to use these proportions to reconstruct the three haplotypes effectively.
Results
HaploJuice provides an alternative haplotype reconstruction algorithm for Ranjard et al.’s pooling strategy. HaploJuice significantly increases the accuracy by first identifying the empirical proportions of the three mixed sub-samples and then assembling the haplotypes using a dynamic programming approach. HaploJuice was evaluated against five different assembly algorithms, Hmmfreq (Ranjard et al., PLoS ONE 13:0195090, 2018), ShoRAH (Zagordi et al., BMC Bioinformatics 12:119, 2011), SAVAGE (Baaijens et al., Genome Res 27:835-848, 2017), PredictHaplo (Prabhakaran et al., IEEE/ACM Trans Comput Biol Bioinform 11:182-91, 2014) and QuRe (Prosperi and Salemi, Bioinformatics 28:132-3, 2012). Using simulated and real data sets, HaploJuice reconstructed the true sequences with the highest coverage and the lowest error rate.
Conclusion
HaploJuice provides high accuracy in haplotype reconstruction, making Ranjard et al.’s pooling strategy more efficient, feasible, and applicable, with the benefit of reducing the sequencing cost.
Affiliation(s)
- Thomas K F Wong
- The Research School of Biology, The Australian National University, Acton ACT, 2601, Australia.
| | - Louis Ranjard
- The Research School of Biology, The Australian National University, Acton ACT, 2601, Australia
| | - Yu Lin
- College of Engineering and Computer Science, The Australian National University, Acton ACT, 2601, Australia
| | - Allen G Rodrigo
- The Research School of Biology, The Australian National University, Acton ACT, 2601, Australia
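The barcode-free pooling idea in the HaploJuice abstract above rests on the three mixing proportions (62.5%, 25%, 12.5%) having distinct subset sums, so an allele's observed frequency at a site hints at which sub-samples carry it. The sketch below illustrates only that arithmetic. It is not the HaploJuice algorithm, which per the abstract estimates empirical proportions and assembles haplotypes by dynamic programming; the function name and the example frequencies are hypothetical.

```python
from itertools import combinations

# Known mixing proportions quoted in the abstract.
PROPORTIONS = (0.625, 0.25, 0.125)


def carriers_for_frequency(observed_freq, proportions=PROPORTIONS):
    """Return the indices of the sub-samples whose summed proportion is
    closest to the observed allele frequency at one site.

    An empty tuple means the frequency is closest to zero, i.e. the allele
    is more likely sequencing noise than a real variant in any sub-sample.
    """
    best_subset, best_gap = (), abs(observed_freq)  # start from "no carrier"
    n = len(proportions)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            gap = abs(observed_freq - sum(proportions[i] for i in subset))
            if gap < best_gap:
                best_subset, best_gap = subset, gap
    return best_subset


# Hypothetical observed allele frequencies at four sites.
for freq in (0.61, 0.36, 0.13, 0.02):
    print(freq, carriers_for_frequency(freq))
```

Because the eight possible subset sums (0, 0.125, 0.25, 0.375, 0.625, 0.75, 0.875, 1.0) are all different, each observed frequency maps to one nearest combination. With equal proportions the sums would collide and this assignment would be ambiguous.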
|
32
|
Correa-Fiz F, Franzo G, Llorens A, Segalés J, Kekarainen T. Porcine circovirus 2 (PCV-2) genetic variability under natural infection scenario reveals a complex network of viral quasispecies. Sci Rep 2018; 8:15469. [PMID: 30341330 PMCID: PMC6195574 DOI: 10.1038/s41598-018-33849-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 07/06/2018] [Accepted: 10/04/2018] [Indexed: 11/09/2022] Open
Abstract
Porcine circovirus 2 (PCV-2) is a virus characterized by a high evolutionary rate, promoting the potential emergence of different genotypes and strains. Despite the likely relevance in the emergence of new PCV-2 variants, the subtle evolutionary patterns of PCV-2 at the individual-host level or over short transmission chains are still largely unknown. This study aimed to analyze the within-host genetic variability of PCV-2 subpopulations to unravel the forces driving PCV-2 evolution. A longitudinal weekly sampling was conducted on individual animals located in three farms after the first PCV-2 detection. The analysis of polymorphisms evaluated throughout the full PCV-2 genome demonstrated the presence of several single nucleotide polymorphisms (SNPs), especially in the genome region encoding the capsid protein. The global haplotype reconstruction allowed inferring the virus transmission network over time, suggesting a relevant within-farm circulation. Evidence of co-infection and recombination involving multiple PCV-2 genotypes was found after mixing with pigs originating from other sources. The present study demonstrates the remarkable within-host genetic variability of PCV-2 quasispecies, suggesting the role of the natural selection induced by the host immune response in driving PCV-2 evolution. Moreover, the effect of pig management on the occurrence of multiple-genotype coinfections and on the likelihood of recombination was demonstrated.
Affiliation(s)
| | - Giovanni Franzo
- Department of Animal Medicine, Production and Health (MAPS), University of Padua, Legnaro, PD, Italy
| | - Anna Llorens
- Centre de Recerca en Sanitat Animal (CReSA, IRTA-UAB), IRTA, Bellaterra, Spain
| | - Joaquim Segalés
- Centre de Recerca en Sanitat Animal (CReSA, IRTA-UAB), IRTA, Bellaterra, Spain; Departament de Sanitat i Anatomia Animals, Facultat de Veterinària, UAB, Bellaterra, Spain
| | - Tuija Kekarainen
- Centre de Recerca en Sanitat Animal (CReSA, IRTA-UAB), IRTA, Bellaterra, Spain; Kuopio Center for Gene and Cell Therapy, Microkatu 1, Kuopio, Finland
|
33
|
Ahn S, Ke Z, Vikalo H. Viral quasispecies reconstruction via tensor factorization with successive read removal. Bioinformatics 2018; 34:i23-i31. [PMID: 29949976 PMCID: PMC6022648 DOI: 10.1093/bioinformatics/bty291] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 01/22/2023] Open
Abstract
Motivation
As RNA viruses mutate and adapt to environmental changes, often developing resistance to anti-viral vaccines and drugs, they form an ensemble of viral strains: a viral quasispecies. While high-throughput sequencing (HTS) has enabled in-depth studies of viral quasispecies, sequencing errors and limited read lengths render the problem of reconstructing the strains and estimating their spectrum challenging. Inference of viral quasispecies is difficult due to generally non-uniform frequencies of the strains, and is further exacerbated when the genetic distances between the strains are small.
Results
This paper presents TenSQR, an algorithm that utilizes a tensor factorization framework to analyze HTS data and reconstruct viral quasispecies characterized by highly uneven frequencies of its components. Fundamentally, TenSQR performs clustering with successive data removal to infer strains in a quasispecies in order from the most to the least abundant one; every time a strain is inferred, sequencing reads generated from that strain are removed from the dataset. The proposed successive strain reconstruction and data removal enables discovery of rare strains in a population and facilitates detection of deletions in such strains. Results on simulated datasets demonstrate that TenSQR can reconstruct full-length strains having widely different abundances, generally outperforming state-of-the-art methods at diversities 1-10% and detecting long deletions even in rare strains. A study on a real HIV-1 dataset demonstrates that TenSQR outperforms competing methods in experimental settings as well. Finally, we apply TenSQR to analyze a Zika virus sample and reconstruct the full-length strains it contains.
Availability and implementation
TenSQR is available at https://github.com/SoYeonA/TenSQR.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Soyeon Ahn
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
- Ziqi Ke
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
- Haris Vikalo
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
|
34
|
aBayesQR: A Bayesian Method for Reconstruction of Viral Populations Characterized by Low Diversity. J Comput Biol 2018; 25:637-648. [DOI: 10.1089/cmb.2017.0249] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 02/05/2023]
|
35
|
NGS combined with phylogenetic analysis to detect HIV-1 dual infection in Romanian people who inject drugs. Microbes Infect 2018; 20:308-311. [PMID: 29626632 DOI: 10.1016/j.micinf.2018.03.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/10/2017] [Revised: 02/24/2018] [Accepted: 03/21/2018] [Indexed: 11/23/2022]
Abstract
Dual HIV infections are possible and likely in people who inject drugs (PWID). Thirty-eight newly diagnosed patients, 19 PWID and 19 heterosexually infected with HIV, were analyzed. The V2V3 loop of the HIV-1 env gene was sequenced on the NGS platform 454 GSJunior (Roche). HIV-1 dual/multiple infections were identified in five PWID. For three of these patients, the reconstructed variants belonged to pure F1 subtype and CRF14_BG strains according to phylogenetic analysis. New recombinant forms between these parental strains were identified in two PWID samples. With the help of phylogenetic analysis, NGS data can provide important insights into the intra-host sub-population structure.
|
36
|
Leviyang S, Griva I, Ita S, Johnson WE. A penalized regression approach to haplotype reconstruction of viral populations arising in early HIV/SIV infection. Bioinformatics 2017; 33:2455-2463. [PMID: 28379346 DOI: 10.1093/bioinformatics/btx187] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/03/2016] [Accepted: 03/29/2017] [Indexed: 12/14/2022]
Abstract
Motivation: Next generation sequencing (NGS) has been increasingly applied to characterize viral evolution during HIV and SIV infections. In particular, NGS datasets sampled during the initial months of infection are characterized by relatively low levels of diversity as well as convergent evolution at multiple loci dispersed across the viral genome. Consequently, fully characterizing viral evolution from NGS datasets requires haplotype reconstruction across large regions of the viral genome. Existing haplotype reconstruction algorithms have not been developed with the particular characteristics of early HIV/SIV infection in mind, raising the possibility that better performance could be achieved through a specifically designed algorithm. Results: Here, we introduce a haplotype reconstruction algorithm, RegressHaplo, specifically designed for low diversity and convergent evolution regimes. The algorithm uses a penalized regression that balances a data fitting term with a penalty term that encourages solutions with few haplotypes. The regression covariates are a large set of potential haplotypes and fitting the regression is made computationally feasible by the low diversity setting. Using simulated and in vivo datasets, we compare RegressHaplo to PredictHaplo and QuRe, two existing haplotype reconstruction algorithms. RegressHaplo performs better than these algorithms on simulated datasets with relatively low diversity levels. We suggest RegressHaplo as a novel tool for the investigation of early infection HIV/SIV datasets and, more generally, low diversity viral NGS datasets. Contact: sr286@georgetown.edu. Availability and Implementation: https://github.com/SLeviyang/RegressHaplo.
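The trade-off between data fit and few haplotypes can be sketched with a small non-negative regression. This is only an illustration under stated assumptions, not the RegressHaplo implementation: the penalty used here is a plain L1 term, the covariates are hypothetical two-site window patterns, the solver is projected gradient descent, and all names (`window_features`, `fit_penalized`, `lam`) are invented for the example.

```python
def window_features(hap, width=2):
    """Set of (start, pattern) pairs that reads of `width` sites would show."""
    return {(i, hap[i:i + width]) for i in range(len(hap) - width + 1)}


def fit_penalized(candidates, observed, lam=0.01, iters=2000):
    """Minimise 0.5 * ||A w - y||^2 + lam * sum(w) subject to w >= 0.

    A[i][j] is 1 when candidate haplotype j produces feature i, and y holds
    the observed frequency of each feature. The L1 term is an assumed
    stand-in for a penalty that favours solutions with few haplotypes.
    """
    cols = [window_features(h) for h in candidates]
    feats = sorted(set(observed).union(*cols))
    A = [[1.0 if f in col else 0.0 for col in cols] for f in feats]
    y = [observed.get(f, 0.0) for f in feats]
    n, k = len(feats), len(candidates)
    # Row sums of A^T A bound its largest eigenvalue (A is non-negative),
    # so their maximum gives a safe gradient step size.
    gram_row_sums = [
        sum(A[i][j] * A[i][m] for i in range(n) for m in range(k))
        for j in range(k)
    ]
    step = 1.0 / max(gram_row_sums)
    w = [0.0] * k
    for _ in range(iters):
        resid = [sum(A[i][j] * w[j] for j in range(k)) - y[i] for i in range(n)]
        grad = [sum(A[i][j] * resid[i] for i in range(n)) + lam for j in range(k)]
        # Gradient step followed by projection onto w >= 0.
        w = [max(0.0, w[j] - step * grad[j]) for j in range(k)]
    return w


if __name__ == "__main__":
    # Toy data: a 70/30 mixture of haplotypes "000" and "110".
    observed = {(0, "00"): 0.7, (0, "11"): 0.3, (1, "00"): 0.7, (1, "10"): 0.3}
    candidates = ["000", "110", "100", "010"]
    weights = fit_penalized(candidates, observed)
    total = sum(weights)
    for hap, weight in zip(candidates, weights):
        print(hap, round(weight / total, 3))
```

In this toy case the two spurious candidates receive zero weight, and the penalty shrinks the remaining weights slightly, which is why they are renormalised before being read as frequencies.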
Affiliation(s)
- Sivan Leviyang
- Department of Mathematics and Statistics, Georgetown University, Washington DC, 20057, USA
- Igor Griva
- Department of Mathematics, George Mason University, Fairfax, VA 22030, USA
- Sergio Ita
- Department of Medicine, University of California - San Diego, La Jolla, CA 92093, USA
- Welkin E Johnson
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
|
37
|
Abstract
Background: Haplotype assembly is the task of reconstructing haplotypes of an individual from a mixture of sequenced chromosome fragments. Haplotype information enables studies of the effects of genetic variations on an organism's phenotype. Most of the mathematical formulations of haplotype assembly are known to be NP-hard, and haplotype assembly becomes even more challenging as the sequencing technology advances and the length of the paired-end reads and inserts increases. Assembly of haplotypes of polyploid organisms is considerably more difficult than in the case of diploids. Hence, scalable and accurate schemes with provable performance are desired for haplotype assembly of both diploid and polyploid organisms. Results: We propose a framework that formulates haplotype assembly from sequencing data as a sparse tensor decomposition. We cast the problem as that of decomposing a tensor having special structural constraints and missing a large fraction of its entries into a product of two factors, U and [Formula: see text]; tensor [Formula: see text] reveals haplotype information while U is a sparse matrix encoding the origin of erroneous sequencing reads. We propose an algorithm, AltHap, which reconstructs haplotypes of either diploid or polyploid organisms by iteratively solving this decomposition problem. The performance and convergence properties of AltHap are theoretically analyzed and, in doing so, guarantees on the achievable minimum error correction scores and correct phasing rate are established. The developed framework is applicable to diploid, biallelic and polyallelic polyploid species. The code for AltHap is freely available from https://github.com/realabolfazl/AltHap. Conclusion: AltHap was tested in a number of different scenarios and was shown to compare favorably to state-of-the-art methods in applications to haplotype assembly of diploids, and to significantly outperform existing techniques when applied to haplotype assembly of polyploids.
Affiliation(s)
- Abolfazl Hashemi
- Department of ECE, University of Texas at Austin, Austin, Texas, USA
- Banghua Zhu
- EE Department, Tsinghua University, Beijing, China
- Haris Vikalo
- Department of ECE, University of Texas at Austin, Austin, Texas, USA
|
38
|
Karagiannis K, Simonyan V, Chumakov K, Mazumder R. Separation and assembly of deep sequencing data into discrete sub-population genomes. Nucleic Acids Res 2017; 45:10989-11003. [PMID: 28977510 PMCID: PMC5737798 DOI: 10.1093/nar/gkx755] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 09/07/2016] [Accepted: 08/16/2017] [Indexed: 12/15/2022]
Abstract
Sequence heterogeneity is a common characteristic of RNA viruses that is often referred to as sub-populations or quasispecies. Traditional techniques used for assembly of short sequence reads produced by deep sequencing, such as de-novo assemblers, ignore the underlying diversity. Here, we introduce a novel algorithm that simultaneously assembles discrete sequences of multiple genomes present in populations. Using in silico data we were able to detect populations at as low as 0.1% frequency with complete global genome reconstruction and in a single sample detected 16 resolved sequences with no mismatches. We also applied the algorithm to high throughput sequencing data obtained for viruses present in sewage samples and successfully detected multiple sub-populations and recombination events in these diverse mixtures. High sensitivity of the algorithm also enables genomic analysis of heterogeneous pathogen genomes from patient samples and accurate detection of intra-host diversity, enabling not just basic research in personalized medicine but also accurate diagnostics and monitoring drug therapies, which are critical in clinical and regulatory decision-making process.
Affiliation(s)
- Konstantinos Karagiannis
- Department of Biochemistry and Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA; Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA
- Vahan Simonyan
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA
- Konstantin Chumakov
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA
- Raja Mazumder
- Department of Biochemistry and Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA; McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA
|
39
|
Stefic K, Salmona M, Capitao M, Splittgerber M, Maakaroun-Vermesse Z, Néré ML, Bernard L, Chaix ML, Barin F, Delaugerre C. Unravelling the dynamics of selection of multiresistant variants to integrase inhibitors in an HIV-1-infected child using ultra-deep sequencing. J Antimicrob Chemother 2017; 72:850-854. [PMID: 27999055 DOI: 10.1093/jac/dkw507] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/10/2016] [Accepted: 10/26/2016] [Indexed: 12/20/2022]
Abstract
Background: Ultra-deep sequencing (UDS) allows detection of minority resistant variants (MRVs) with a threshold of 1% and could be useful to identify variants harbouring single or multiple drug-resistance mutations (DRMs). Objectives: We analysed the integrase gene region longitudinally using UDS in an HIV-1-infected child rapidly failing a raltegravir-based regimen. Methods: Longitudinal plasma samples at baseline and weeks 4, 8, 13, 17 and 39 were obtained, as well as the mother's baseline plasma sample. Sanger sequencing and UDS were performed on the integrase gene using Roche 454 GS-Junior. A bioinformatic workflow was developed to identify the major DRMs, accessory mutations and the linkage between mutations. Results: In Sanger sequencing and UDS, no MRV in the integrase gene was detected at baseline in either the mother or the child. The major DRM N155H conferring resistance to raltegravir and elvitegravir was detected in 4% of the sequences by week 4 using UDS, whereas it was not detected by Sanger sequencing. The double mutant E92Q + N155H, conferring resistance to the entire integrase inhibitor class, including dolutegravir, emerged at week 8 (16%) and became rapidly dominant (57% by week 13). At the last timepoint under raltegravir (week 17), Y143R emerged, leading to different resistance mutation patterns: single mutants N155H (47%) and Y143R (24%) and double mutants E92Q + N155H (13%), Y143R + N155H (2%) and E92Q + Y143R (2%). The polymorphic substitution M50I was preferentially selected on resistant variants, especially on E92Q + N155H variants. Conclusions: This case study illustrates the usefulness of UDS in detecting early MRVs and determining the dynamics of selected HIV-1 variants in longitudinal analysis.
Affiliation(s)
- Karl Stefic
- Inserm U966, Université François Rabelais, Tours, France; Laboratoire de Bactériologie-Virologie & Centre National de Référence du VIH, CHU Bretonneau, Tours, France
- Maud Salmona
- Inserm U941, Université Paris Diderot, Paris, France; Laboratoire de Virologie, Hôpital Saint-Louis, APHP, Paris, France
- Marisa Capitao
- Inserm U941, Université Paris Diderot, Paris, France; Laboratoire de Virologie, Hôpital Saint-Louis, APHP, Paris, France
- Marion Splittgerber
- Inserm U941, Université Paris Diderot, Paris, France; Laboratoire de Virologie, Hôpital Saint-Louis, APHP, Paris, France
- Marie-Laure Néré
- Inserm U941, Université Paris Diderot, Paris, France; Laboratoire de Virologie, Hôpital Saint-Louis, APHP, Paris, France
- Louis Bernard
- CHU Bretonneau, Médecine Interne et Maladies Infectieuses, Tours, France
- Marie-Laure Chaix
- Inserm U941, Université Paris Diderot, Paris, France; Laboratoire de Virologie, Hôpital Saint-Louis, APHP, Paris, France
- Francis Barin
- Inserm U966, Université François Rabelais, Tours, France; Laboratoire de Bactériologie-Virologie & Centre National de Référence du VIH, CHU Bretonneau, Tours, France
- Constance Delaugerre
- Inserm U941, Université Paris Diderot, Paris, France; Laboratoire de Virologie, Hôpital Saint-Louis, APHP, Paris, France
|
40
|
Time-Sampled Population Sequencing Reveals the Interplay of Selection and Genetic Drift in Experimental Evolution of Potato Virus Y. J Virol 2017; 91:JVI.00690-17. [PMID: 28592544 PMCID: PMC5533922 DOI: 10.1128/jvi.00690-17] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/21/2017] [Accepted: 05/28/2017] [Indexed: 11/20/2022]
Abstract
RNA viruses are among the fastest-evolving biological entities. Within their hosts, they exist as genetically diverse populations (i.e., viral mutant swarms), which are sculpted by different evolutionary mechanisms, such as mutation, natural selection, and genetic drift, and also by the interactions between genetic variants within the mutant swarms. To elucidate the mechanisms that modulate the population diversity of an important plant-pathogenic virus, we performed evolution experiments with Potato virus Y (PVY) in potato genotypes that differ in their defense response against the virus. Using deep sequencing of small RNAs, we followed the temporal dynamics of standing and newly generated variations in the evolving viral lineages. A time-sampled approach allowed us to (i) reconstruct theoretical haplotypes in the starting population by clustering the trajectories of single nucleotide polymorphisms and (ii) use quantitative population genetics approaches to estimate the contribution of selection and genetic drift, and their interplay, to the evolution of the virus. We detected imprints of strong selective sweeps and narrow genetic bottlenecks, followed by a shift in the frequency of selected haplotypes. Comparison of patterns of viral evolution in differently susceptible host genotypes indicated possible diversifying evolution of PVY in the less-susceptible host (efficient in the accumulation of salicylic acid).
Importance: High diversity of within-host populations of RNA viruses is an important aspect of their biology, since these populations represent a reservoir of genetic variants, which can enable quick adaptation of viruses to a changing environment. This study focuses on an important plant virus, Potato virus Y, and describes, at high resolution, temporal changes in the structure of viral populations within different potato genotypes. A novel and easy-to-implement computational approach was established to cluster single nucleotide polymorphisms into viral haplotypes from very short sequencing reads. During the experiment, a shift in the frequency of selected viral haplotypes was observed after a narrow genetic bottleneck, indicating an important role of genetic drift in the evolution of the virus. On the other hand, a possible case of diversifying selection of the virus was observed in less susceptible host genotypes.
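Grouping polymorphisms by the similarity of their frequency trajectories can be sketched as follows. The abstract does not specify the clustering procedure, so this is an assumed, deliberately simple greedy rule: a SNP joins the first cluster whose founding SNP never differs from it by more than `tol` at any time point. The SNP labels, the tolerance, and the function name are hypothetical.

```python
def cluster_trajectories(trajectories, tol=0.05):
    """Greedily group SNPs whose frequency trajectories move together.

    `trajectories` maps a SNP label to its frequency at each sampled time
    point. SNPs that rise and fall in step are candidates for lying on the
    same haplotype, even when reads are too short to link them directly.
    """
    clusters = []
    for name, traj in trajectories.items():
        for cluster in clusters:
            representative = trajectories[cluster[0]]
            # Largest absolute frequency gap across all time points.
            gap = max(abs(a - b) for a, b in zip(traj, representative))
            if gap <= tol:
                cluster.append(name)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    toy = {
        "A100G": [0.10, 0.30, 0.80],
        "C250T": [0.12, 0.28, 0.79],
        "G400A": [0.90, 0.70, 0.20],
        "T77C": [0.88, 0.72, 0.22],
    }
    print(cluster_trajectories(toy))
```

The result of a greedy rule like this depends on input order and on the tolerance, so a study would normally use a more principled clustering method with noise-aware distances.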
|
41
|
Kinoti WM, Constable FE, Nancarrow N, Plummer KM, Rodoni B. Analysis of intra-host genetic diversity of Prunus necrotic ringspot virus (PNRSV) using amplicon next generation sequencing. PLoS One 2017; 12:e0179284. [PMID: 28632759 PMCID: PMC5478126 DOI: 10.1371/journal.pone.0179284] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 04/02/2017] [Accepted: 05/08/2017] [Indexed: 12/28/2022]
Abstract
PCR amplicon next generation sequencing (NGS) analysis offers a broadly applicable and targeted approach to detect populations of both high- and low-frequency virus variants in one or more plant samples. In this study, amplicon NGS was used to explore the diversity of the tripartite genome virus, Prunus necrotic ringspot virus (PNRSV), from 53 PNRSV-infected trees using amplicons from conserved gene regions of each of PNRSV RNA1, RNA2 and RNA3. Sequencing of the amplicons revealed differing levels of polymorphism across the three components of the PNRSV genome, with a total of 5040, 2083 and 5486 sequence variants observed for RNA1, RNA2 and RNA3, respectively. RNA2 had the lowest diversity of sequences compared to RNA1 and RNA3, reflecting the lack of flexibility tolerated by the replicase gene that is encoded by this RNA component. Distinct PNRSV phylo-groups, consisting of closely related clusters of sequence variants, were observed in each of PNRSV RNA1, RNA2 and RNA3. Most plant samples had a single phylo-group for each RNA component. Haplotype network analysis showed that smaller clusters of PNRSV sequence variants were genetically connected to the largest sequence variant cluster within a phylo-group of each RNA component. Some plant samples had sequence variants occurring in multiple PNRSV phylo-groups in at least one of each RNA, and these phylo-groups formed distinct clades that represent PNRSV genetic strains. Variants within the same phylo-group of each Prunus plant sample had ≥97% similarity, and phylo-groups within a Prunus plant sample and between samples had ≤97% similarity. Based on the analysis of diversity, a definition of a PNRSV genetic strain was proposed. The proposed definition was applied to determine the number of PNRSV genetic strains in each of the plant samples, and the complexity in defining genetic strains in multipartite genome viruses was explored.
Affiliation(s)
- Wycliff M. Kinoti
- Agriculture Victoria, AgriBio, La Trobe University, Melbourne, VIC, Australia
- School of Applied Systems Biology, AgriBio, La Trobe University, Melbourne, VIC, Australia
- Fiona E. Constable
- Agriculture Victoria, AgriBio, La Trobe University, Melbourne, VIC, Australia
- Narelle Nancarrow
- Agriculture Victoria, AgriBio, La Trobe University, Melbourne, VIC, Australia
- Kim M. Plummer
- Department of Animal, Plant and Soil Sciences, AgriBio, La Trobe University, Melbourne, VIC, Australia
- Brendan Rodoni
- Agriculture Victoria, AgriBio, La Trobe University, Melbourne, VIC, Australia
- School of Applied Systems Biology, AgriBio, La Trobe University, Melbourne, VIC, Australia
|
42
|
Baaijens JA, Aabidine AZE, Rivals E, Schönhuth A. De novo assembly of viral quasispecies using overlap graphs. Genome Res 2017; 27:835-848. [PMID: 28396522 PMCID: PMC5411778 DOI: 10.1101/gr.215038.116] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Received: 08/22/2016] [Accepted: 03/10/2017] [Indexed: 11/24/2022]
Abstract
A viral quasispecies, the ensemble of viral strains populating an infected person, can be highly diverse. For optimal assessment of virulence, pathogenesis, and therapy selection, determining the haplotypes of the individual strains can play a key role. As many viruses are subject to high mutation and recombination rates, high-quality reference genomes are often not available at the time of a new disease outbreak. We present SAVAGE, a computational tool for reconstructing individual haplotypes of intra-host virus strains without the need for a high-quality reference genome. SAVAGE makes use of either FM-index-based data structures or ad hoc consensus reference sequence for constructing overlap graphs from patient sample data. In this overlap graph, nodes represent reads and/or contigs, while edges reflect that two reads/contigs, based on sound statistical considerations, represent identical haplotypic sequence. Following an iterative scheme, a new overlap assembly algorithm that is based on the enumeration of statistically well-calibrated groups of reads/contigs then efficiently reconstructs the individual haplotypes from this overlap graph. In benchmark experiments on simulated and on real deep-coverage data, SAVAGE drastically outperforms generic de novo assemblers as well as the only specialized de novo viral quasispecies assembler available so far. When run on ad hoc consensus reference sequence, SAVAGE performs very favorably in comparison with state-of-the-art reference genome-guided tools. We also apply SAVAGE on two deep-coverage samples of patients infected by the Zika and the hepatitis C virus, respectively, which sheds light on the genetic structures of the respective viral quasispecies.
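The overlap-graph structure described in this abstract can be illustrated with a minimal sketch. It is not SAVAGE: edges here require an exact suffix-prefix match of at least `min_overlap` bases, which stands in for the statistically calibrated overlap criteria mentioned above, and the quadratic all-pairs comparison replaces the FM-index machinery. The function names and the toy reads are hypothetical.

```python
def overlap_length(a, b, min_overlap):
    """Longest proper suffix of `a` equal to a prefix of `b` (0 if too short)."""
    for k in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_overlap=4):
    """Return {(i, j): overlap} for every ordered pair of reads that overlap.

    Nodes are read indices; a directed edge i -> j means read j can extend
    read i without contradicting it. An exact match is an assumed
    simplification that ignores sequencing errors.
    """
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            k = overlap_length(a, b, min_overlap)
            if k:
                edges[(i, j)] = k
    return edges


if __name__ == "__main__":
    # Reads tiled over two toy haplotypes that differ at a single position:
    # AACCGGTTAC and AACTGGTTAC. The last read is shared by both.
    toy_reads = ["AACCGG", "CCGGTT", "GGTTAC", "AACTGG", "CTGGTT"]
    for (i, j), k in sorted(build_overlap_graph(toy_reads).items()):
        print(toy_reads[i], "->", toy_reads[j], "overlap", k)
```

Reads covering the variable site connect only to reads carrying the same allele, so the two haplotypes appear as separate paths that merge where the sequences become identical.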
Affiliation(s)
- Eric Rivals
- LIRMM, CNRS and Université de Montpellier, 34095 Montpellier, France
- Institut Biologie Computationnelle, CNRS and Université de Montpellier, 34095 Montpellier, France
|
43
|
aBayesQR: A Bayesian Method for Reconstruction of Viral Populations Characterized by Low Diversity. LECTURE NOTES IN COMPUTER SCIENCE 2017. [DOI: 10.1007/978-3-319-56970-3_22] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/28/2022]
|
44
|
Skums P, Artyomenko A, Glebova O, Ramachandran S, Campo DS, Dimitrova Z, Măndoiu II, Zelikovsky A, Khudyakov Y. Pooling Strategy for Massive Viral Sequencing. COMPUTATIONAL METHODS FOR NEXT GENERATION SEQUENCING DATA ANALYSIS 2016:57-83. [DOI: 10.1002/9781119272182.ch3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/02/2025]
|
45
|
Ghurye JS, Cepeda-Espinoza V, Pop M. Metagenomic Assembly: Overview, Challenges and Applications. THE YALE JOURNAL OF BIOLOGY AND MEDICINE 2016; 89:353-362. [PMID: 27698619 PMCID: PMC5045144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Advances in sequencing technologies have led to the increased use of high throughput sequencing in characterizing the microbial communities associated with our bodies and our environment. Critical to the analysis of the resulting data are sequence assembly algorithms able to reconstruct genes and organisms from complex mixtures. Metagenomic assembly involves new computational challenges due to the specific characteristics of the metagenomic data. In this survey, we focus on major algorithmic approaches for genome and metagenome assembly, and discuss the new challenges and opportunities afforded by this new field. We also review several applications of metagenome assembly in addressing interesting biological problems.
Affiliation(s)
- Mihai Pop
- To whom all correspondence should be addressed: Mihai Pop, Department of Computer Science and Center of Bioinformatics and Computational Biology, University of Maryland, Center for Bioinformatics and Computational Biology, Biomolecular Sciences Building. Rm. 3120F, College Park, MD 20742, Phone Number: 301-405-7245,
|
46
|
Posada-Cespedes S, Seifert D, Beerenwinkel N. Recent advances in inferring viral diversity from high-throughput sequencing data. Virus Res 2016; 239:17-32. [PMID: 27693290 DOI: 10.1016/j.virusres.2016.09.016] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 06/24/2016] [Revised: 09/23/2016] [Accepted: 09/24/2016] [Indexed: 02/05/2023]
Abstract
Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. To account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments.
Affiliation(s)
- Susana Posada-Cespedes
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland
- David Seifert
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland
|
47
|
Rose R, Constantinides B, Tapinos A, Robertson DL, Prosperi M. Challenges in the analysis of viral metagenomes. Virus Evol 2016; 2:vew022. [PMID: 29492275 PMCID: PMC5822887 DOI: 10.1093/ve/vew022] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Indexed: 11/17/2022]
Abstract
Genome sequencing technologies continue to develop with remarkable pace, yet analytical approaches for reconstructing and classifying viral genomes from mixed samples remain limited in their performance and usability. Existing solutions generally target expert users and often have unclear scope, making it challenging to critically evaluate their performance. There is a growing need for intuitive analytical tooling for researchers lacking specialist computing expertise and that is applicable in diverse experimental circumstances. Notable technical challenges have impeded progress; for example, fragments of viral genomes are typically orders of magnitude less abundant than those of host, bacteria, and/or other organisms in clinical and environmental metagenomes; observed viral genomes often deviate considerably from reference genomes demanding use of exhaustive alignment approaches; high intrapopulation viral diversity can lead to ambiguous sequence reconstruction; and finally, the relatively few documented viral reference genomes compared to the estimated number of distinct viral taxa renders classification problematic. Various software tools have been developed to accommodate the unique challenges and use cases associated with characterizing viral sequences; however, the quality of these tools varies, and their use often necessitates computing expertise or access to powerful computers, thus limiting their usefulness to many researchers. In this review, we consider the general and application-specific challenges posed by viral sequencing and analysis, outline the landscape of available tools and methodologies, and propose ways of overcoming the current barriers to effective analysis.
Affiliation(s)
- Rebecca Rose
- BioInfoExperts, Norfolk, VA, USA; Computational and Evolutionary Biology Faculty of Life Sciences, University of Manchester, Manchester, UK; Department of Epidemiology, University of Florida, Gainesville, FL, USA
- Bede Constantinides
- BioInfoExperts, Norfolk, VA, USA; Computational and Evolutionary Biology Faculty of Life Sciences, University of Manchester, Manchester, UK; Department of Epidemiology, University of Florida, Gainesville, FL, USA
- Avraam Tapinos
- BioInfoExperts, Norfolk, VA, USA; Computational and Evolutionary Biology Faculty of Life Sciences, University of Manchester, Manchester, UK; Department of Epidemiology, University of Florida, Gainesville, FL, USA
- David L Robertson
- BioInfoExperts, Norfolk, VA, USA; Computational and Evolutionary Biology Faculty of Life Sciences, University of Manchester, Manchester, UK; Department of Epidemiology, University of Florida, Gainesville, FL, USA
- Mattia Prosperi
- BioInfoExperts, Norfolk, VA, USA; Computational and Evolutionary Biology Faculty of Life Sciences, University of Manchester, Manchester, UK; Department of Epidemiology, University of Florida, Gainesville, FL, USA
|
48
|
Lavezzo E, Barzon L, Toppo S, Palù G. Third generation sequencing technologies applied to diagnostic microbiology: benefits and challenges in applications and data analysis. Expert Rev Mol Diagn 2016; 16:1011-23. [PMID: 27453996 DOI: 10.1080/14737159.2016.1217158] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 01/07/2023]
Abstract
Introduction: The diagnosis of infectious diseases is among the most successful areas of application of new generation sequencing technologies. The field has seen the development of numerous experimental and analytical approaches for the detection and the fine description of pathogenic and non-pathogenic microorganisms. Areas covered: Without claiming to be exhaustive with respect to all applications and methods developed over the years, this review focuses on the advantages and the issues brought by the new technologies, with particular attention to third generation sequencing methods. Both experimental procedures and algorithmic strategies are presented, following the most relevant publications which have led to progress in our ability to detect infectious agents. Expert commentary: The technical advance brought by third generation sequencing platforms has the potential to significantly expand the range of diagnostic tools that will be available to clinicians. Nonetheless, the implementation of these technologies in clinical practice is still far from being actionable and will follow, in time, the path taken by second generation methods, which still require the setup of standardized pipelines in both wet and dry laboratory procedures.
Affiliation(s)
- Enrico Lavezzo
- Department of Molecular Medicine, University of Padova, Padova, Italy
- Luisa Barzon
- Department of Molecular Medicine, University of Padova, Padova, Italy
- Stefano Toppo
- Department of Molecular Medicine, University of Padova, Padova, Italy
- Giorgio Palù
- Department of Molecular Medicine, University of Padova, Padova, Italy
|
49
|
Differences in the Selection Bottleneck between Modes of Sexual Transmission Influence the Genetic Composition of the HIV-1 Founder Virus. PLoS Pathog 2016; 12:e1005619. [PMID: 27163788 PMCID: PMC4862634 DOI: 10.1371/journal.ppat.1005619] [Citation(s) in RCA: 75] [Impact Index Per Article: 8.3] [Received: 10/27/2015] [Accepted: 04/18/2016] [Indexed: 01/18/2023]
Abstract
Due to the stringent population bottleneck that occurs during sexual HIV-1 transmission, systemic infection is typically established by a limited number of founder viruses. Elucidation of the precise forces influencing the selection of founder viruses may reveal key vulnerabilities that could aid in the development of a vaccine or other clinical interventions. Here, we utilize deep sequencing data and apply a genetic distance-based method to investigate whether the mode of sexual transmission shapes the nascent founder viral genome. Analysis of 74 acute and early HIV-1 infected subjects revealed that 83% of men who have sex with men (MSM) exhibit a single founder virus, levels similar to those previously observed in heterosexual (HSX) transmission. In a metadata analysis of a total of 354 subjects, including HSX, MSM and injecting drug users (IDU), we also observed no significant differences in the frequency of single founder virus infections between HSX and MSM transmissions. However, comparison of HIV-1 envelope sequences revealed that HSX founder viruses exhibited a greater number of codon sites under positive selection, as well as stronger transmission indices possibly reflective of higher fitness variants. Moreover, specific genetic “signatures” within MSM and HSX founder viruses were identified, with single polymorphisms within gp41 enriched among HSX viruses while more complex patterns, including clustered polymorphisms surrounding the CD4 binding site, were enriched in MSM viruses. While our findings do not support an influence of the mode of sexual transmission on the number of founder viruses, they do demonstrate that there are marked differences in the selection bottleneck that can significantly shape their genetic composition. This study illustrates the complex dynamics of the transmission bottleneck and reveals that distinct genetic bottleneck processes exist dependent upon the mode of HIV-1 transmission. 
While the global spread of HIV-1 has been fueled by sexual transmission, the genetic determinants underlying the transmission bottleneck remain poorly understood. Here we characterized founder virus population diversity from next generation sequencing data in a cohort of 74 acute and early HIV-1 infected individuals. We observe that the risk of multi-variant infection in men who have sex with men (MSM) is not greater than that observed for heterosexuals (HSX), contrary to reports of higher rates of multiple founder virus infections in higher-risk MSM transmissions. These findings were further supported through a metadata analysis of 354 acute and early HIV-1 subjects. We did, however, observe differences between HSX and MSM founder viruses, including a higher selection barrier in HSX transmission, with founder viruses being more similar to the cohort consensus, which may reflect increased replicative fitness. We also identified a number of residues within Envelope that behave in a risk-dependent manner and could be key for HIV-1 transmission. These novel insights improve our understanding of the HIV-1 transmission bottleneck and underscore the differential selective pressures to which founder viruses within the two major transmission risk groups are subjected.
|
50
|
Persistent HIV-1 replication maintains the tissue reservoir during therapy. Nature 2016; 530:51-56. [PMID: 26814962 PMCID: PMC4865637 DOI: 10.1038/nature16933] [Citation(s) in RCA: 508] [Impact Index Per Article: 56.4] [Received: 07/06/2015] [Accepted: 12/18/2015] [Indexed: 12/11/2022]
Abstract
Lymphoid tissue is a key reservoir established by HIV-1 during acute infection. It is a site associated with viral production, storage of viral particles in immune complexes, and viral persistence. Although combinations of antiretroviral drugs usually suppress viral replication and reduce viral RNA to undetectable levels in blood, it is unclear whether treatment fully suppresses viral replication in lymphoid tissue reservoirs. Here we show that virus evolution and trafficking between tissue compartments continue in patients with undetectable levels of virus in their bloodstream. We present a spatial and dynamic model of persistent viral replication and spread that indicates why the development of drug resistance is not a foregone conclusion under conditions in which drug concentrations are insufficient to completely block virus replication. These data provide new insights into the evolutionary and infection dynamics of the virus population within the host, revealing that HIV-1 can continue to replicate and replenish the viral reservoir despite potent antiretroviral therapy.
|