1
Ortigas-Vasquez A, Szpara M. Embracing Complexity: What Novel Sequencing Methods Are Teaching Us About Herpesvirus Genomic Diversity. Annu Rev Virol 2024; 11:67-87. [PMID: 38848592 DOI: 10.1146/annurev-virology-100422-010336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2024]
Abstract
The arrival of novel sequencing technologies throughout the past two decades has led to a paradigm shift in our understanding of herpesvirus genomic diversity. Previously, herpesviruses were seen as a family of DNA viruses with low genomic diversity. However, a growing body of evidence now suggests that herpesviruses exist as dynamic populations that possess standing variation and evolve at much faster rates than previously assumed. In this review, we explore how strategies such as deep sequencing, long-read sequencing, and haplotype reconstruction are allowing scientists to dissect the genomic composition of herpesvirus populations. We also discuss the challenges that need to be addressed before a detailed picture of herpesvirus diversity can emerge.
Affiliation(s)
- Alejandro Ortigas-Vasquez
  - Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
- Moriah Szpara
  - Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA

2
Verchot J, Herath V, Jordan R, Hammond J. Genetic Diversity among Rose Rosette Virus Isolates: A Roadmap towards Studies of Gene Function and Pathogenicity. Pathogens 2023; 12:pathogens12050707. [PMID: 37242377 DOI: 10.3390/pathogens12050707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 04/11/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023] Open
Abstract
The phylogenetic relationships of ninety-five rose rosette virus (RRV) isolates with full-length genomic sequences were analyzed. These isolates were recovered mostly from commercial roses that are vegetatively propagated rather than grown from seed. First, the genome segments were concatenated, and the maximum likelihood (ML) tree shows that the branches arrange independent of their geographic origination. There were six major groups of isolates, with 54 isolates in group 6 and distributed in two subgroups. An analysis of nucleotide diversity across the concatenated isolates showed lower genetic differences among RNAs encoding the core proteins required for encapsidation than the latter genome segments. Recombination breakpoints were identified near the junctions of several genome segments, suggesting that the genetic exchange of segments contributes to differences among isolates. The ML analysis of individual RNA segments revealed different relationship patterns among isolates, which supports the notion of genome reassortment. We tracked the branch positions of two newly sequenced isolates to highlight how genome segments relate to segments of other isolates. RNA6 has an interesting pattern of single-nucleotide mutations that appear to influence amino acid changes in the protein products derived from ORF6a and ORF6b. The P6a proteins were typically 61 residues, although three isolates encoded P6a proteins truncated to 29 residues, and four proteins extended 76-94 residues. Homologous P5 and P7 proteins appear to be evolving independently. These results suggest greater diversity among RRV isolates than previously recognized.
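The nucleotide diversity comparison described in the abstract above can be illustrated with a minimal sketch. It assumes a set of already aligned sequences and computes the mean proportion of differing sites over all sequence pairs. The function names and the toy alignment are hypothetical and are not taken from the paper, whose software and settings are not stated in the abstract.

```python
from itertools import combinations


def pairwise_differences(a: str, b: str) -> int:
    """Count aligned positions where two sequences differ, ignoring gap characters."""
    return sum(1 for x, y in zip(a, b) if x != y and x != "-" and y != "-")


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean proportion of differing sites over all pairs of aligned sequences.

    Simplifying assumption: every pair is normalised by the full alignment
    length, even when some positions contain gaps.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / (len(pairs) * length)


# Hypothetical toy alignment (three sequences, eight sites), not real RRV data.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(toy_alignment)
```

Applying the same function separately to the alignment columns of each genome segment would give per-segment values that can be compared, which is the kind of contrast the abstract reports between the core RNAs and the later segments.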
Affiliation(s)
- Jeanmarie Verchot
  - Department of Plant Pathology & Microbiology, Texas A&M University, College Station, TX 77845, USA
- Venura Herath
  - Department of Agriculture Biology, Faculty of Agriculture, University of Peradeniya, Peradeniya 20400, Sri Lanka
- Ramon Jordan
  - Floral and Nursery Plants Research Unit, US National Arboretum, United States Department of Agriculture, Agriculture Research Service, Beltsville, MD 20705, USA
- John Hammond
  - Floral and Nursery Plants Research Unit, US National Arboretum, United States Department of Agriculture, Agriculture Research Service, Beltsville, MD 20705, USA

3
Yan T, Li G, Zhou D, Hu L, Hao X, Li R, Wang G, Cheng Z. Long read sequencing revealed proventricular virome of broiler chicken with transmission viral proventriculitis. BMC Vet Res 2022; 18:253. [PMID: 35768837 PMCID: PMC9241223 DOI: 10.1186/s12917-022-03339-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/14/2021] [Accepted: 06/07/2022] [Indexed: 11/20/2022] Open
Abstract
Background: Transmissible viral proventriculitis (TVP) causes significant economic loss to the poultry industry. However, the exact causative agents are obscure. Here we examine the virome of the proventriculus from specific pathogen free (SPF) chickens in which TVP was reproduced by infection with proventricular homogenate from broiler chickens with TVP, using long-read sequencing on the Pacific Biosciences RSII platform. Normal SPF chickens were used as controls.
Results: Our investigation reveals a virome of proventriculitis, including three members of the genus Gyrovirus (family Anelloviridae): Gyrovirus homsa1 (GyH1) (also known as Gyrovirus 3, GyV3) (n = 2662), chicken anemia virus (CAV) (n = 482) and Gyrovirus galga1 (GyG1) (also known as avian Gyrovirus 2, AGV2) (n = 11); a plethora of novel CRESS viral genomes (n = 26) and a novel genomovirus. The 27 novel viruses were divided into three clusters. Phylogenetic analysis showed that the GyH1 strain was more closely related to the strains from chicken (MG366592) than to those from mammals (human and cat), the GyG1 strain was closely related to the strains from cat in China (MK089245) and from chicken in Brazil (HM590588), and the CAV strain was more closely related to the strains from Germany (AJ297684) and the United Kingdom (U66304) than to those previously found in China.
Conclusion: In this study, we found that gyroviruses were highly abundant in the virome of chickens with TVP, suggesting a potential role in TVP, especially for GyH1. This study is expected to contribute to the knowledge of the etiology of TVP.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12917-022-03339-9.
Affiliation(s)
- Tianxing Yan
  - Present Address: College of Veterinary Medicine, Shandong Agricultural University, Shandong Province, Tai'an, 271018, China
- Gen Li
  - Present Address: College of Veterinary Medicine, Shandong Agricultural University, Shandong Province, Tai'an, 271018, China
  - College of Veterinary Medicine, Qingdao Agricultural University, Qingdao, 266000, China
- Defang Zhou
  - Present Address: College of Veterinary Medicine, Shandong Agricultural University, Shandong Province, Tai'an, 271018, China
- Liping Hu
  - Animal Epidemic Prevention and Control Center of Shandong Province, Jinan, China
- Xiaojing Hao
  - Animal Husbandry and Veterinary Research Institute of Qingdao, Qingdao, China
- Ruiqi Li
  - Present Address: College of Veterinary Medicine, Shandong Agricultural University, Shandong Province, Tai'an, 271018, China
- Guihua Wang
  - Present Address: College of Veterinary Medicine, Shandong Agricultural University, Shandong Province, Tai'an, 271018, China
- Ziqiang Cheng
  - Present Address: College of Veterinary Medicine, Shandong Agricultural University, Shandong Province, Tai'an, 271018, China

4
Molina-Mora JA, Cordero-Laurent E, Calderón-Osorno M, Chacón-Ramírez E, Duarte-Martínez F. Metagenomic pipeline for identifying co-infections among distinct SARS-CoV-2 variants of concern: study cases from Alpha to Omicron. Sci Rep 2022; 12:9377. [PMID: 35672431 PMCID: PMC9172093 DOI: 10.1038/s41598-022-13113-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/23/2022] [Accepted: 05/03/2022] [Indexed: 01/04/2023] Open
Abstract
Concomitant infection or co-infection with distinct SARS-CoV-2 genotypes has been reported as part of the epidemiological surveillance of the COVID-19 pandemic. In the context of the spread of more transmissible variants during 2021, co-infections are important not only because of possible changes in the clinical outcome, but also because of the chance to generate new genotypes by recombination. However, few approaches have developed bioinformatic pipelines to identify co-infections. Here we present a metagenomic pipeline based on the inference of multiple fragments similar to amplicon sequence variants (ASV-like) from sequencing data and a custom SARS-CoV-2 database to identify the concomitant presence of divergent SARS-CoV-2 genomes, i.e., variants of concern (VOCs). This approach was compared to another strategy based on whole-genome (metagenome) assembly. Using single or paired sequencing data sets from COVID-19 cases with distinct SARS-CoV-2 VOCs, each approach was used to predict the VOC classes (Alpha, Beta, Gamma, Delta, Omicron or non-VOC and their combinations). The performance of each pipeline was assessed using the ground-truth or expected VOC classes. Subsequently, the ASV-like pipeline was used to analyze 1021 cases of COVID-19 from Costa Rica to investigate the possible occurrence of co-infections. After the implementation of the two approaches, an accuracy of 96.2% was found for the ASV-like inference approach, which contrasts with the misclassification found (accuracy 46.2%) for the whole-genome assembly strategy. The custom SARS-CoV-2 database used for the ASV-like analysis can be updated as new VOCs appear, to track co-infections with eventual new genotypes. In addition, the application of the ASV-like approach to all 1021 sequenced samples from Costa Rica in the period October 12th to December 21st, 2021 found that none corresponded to co-infections with VOCs.
In conclusion, we developed a metagenomic pipeline based on ASV-like inference for the identification of co-infection with distinct SARS-CoV-2 VOCs, in which an outstanding accuracy was achieved. Due to the epidemiological, clinical, and molecular relevance of the concomitant infection with distinct genotypes, this work represents another piece in the process of the surveillance of the COVID-19 pandemic in Costa Rica and worldwide.
Affiliation(s)
- Jose Arturo Molina-Mora
  - Centro de Investigación en Enfermedades Tropicales (CIET) and Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
- Estela Cordero-Laurent
  - Instituto Costarricense de Investigación y Enseñanza en Nutrición y Salud (INCIENSA), Tres Ríos, Cartago, Costa Rica
- Melany Calderón-Osorno
  - Instituto Costarricense de Investigación y Enseñanza en Nutrición y Salud (INCIENSA), Tres Ríos, Cartago, Costa Rica
- Edgar Chacón-Ramírez
  - Centro de Investigación en Enfermedades Tropicales (CIET) and Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
- Francisco Duarte-Martínez
  - Instituto Costarricense de Investigación y Enseñanza en Nutrición y Salud (INCIENSA), Tres Ríos, Cartago, Costa Rica

5
Leigh DM, Peranić K, Prospero S, Cornejo C, Ćurković-Perica M, Kupper Q, Nuskern L, Rigling D, Ježić M. Long-read sequencing reveals the evolutionary drivers of intra-host diversity across natural RNA mycovirus infections. Virus Evol 2021; 7:veab101. [PMID: 35299787 PMCID: PMC8923234 DOI: 10.1093/ve/veab101] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/10/2021] [Revised: 11/23/2021] [Accepted: 12/01/2021] [Indexed: 01/05/2023] Open
Abstract
Intra-host dynamics are a core component of virus evolution but most intra-host data come from a narrow range of hosts or experimental infections. Gaining broader information on the intra-host diversity and dynamics of naturally occurring virus infections is essential to our understanding of evolution across the virosphere. Here we used PacBio long-read HiFi sequencing to characterize the intra-host populations of natural infections of the RNA mycovirus Cryphonectria hypovirus 1 (CHV1). CHV1 is a biocontrol agent for the chestnut blight fungus (Cryphonectria parasitica), which co-invaded Europe alongside the fungus. We characterized the mutational and haplotypic intra-host virus diversity of thirty-eight natural CHV1 infections spread across four locations in Croatia and Switzerland. Intra-host CHV1 diversity values were shaped by purifying selection and accumulation of mutations over time as well as epistatic interactions within the host genome at defense loci. Geographical landscape features impacted CHV1 inter-host relationships through restricting dispersal and causing founder effects. Interestingly, a small number of intra-host viral haplotypes showed high sequence similarity across large geographical distances unlikely to be linked by dispersal.
Affiliation(s)
- Deborah M Leigh
  - Phytopathology, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf CH-8903, Switzerland
- Karla Peranić
  - Faculty of Science, University of Zagreb, Zagreb, Grad Zagreb 10000, Croatia
- Simone Prospero
  - Phytopathology, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf CH-8903, Switzerland
- Carolina Cornejo
  - Phytopathology, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf CH-8903, Switzerland
- Lucija Nuskern
  - Faculty of Science, University of Zagreb, Zagreb, Grad Zagreb 10000, Croatia
- Daniel Rigling
  - Phytopathology, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf CH-8903, Switzerland
- Marin Ježić
  - Faculty of Science, University of Zagreb, Zagreb, Grad Zagreb 10000, Croatia

6
Intra-Population Competition during Adaptation to Increased Temperature in an RNA Bacteriophage. Int J Mol Sci 2021; 22:ijms22136815. [PMID: 34202838 PMCID: PMC8268601 DOI: 10.3390/ijms22136815] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/10/2021] [Revised: 06/17/2021] [Accepted: 06/22/2021] [Indexed: 01/21/2023] Open
Abstract
Evolution of RNA bacteriophages of the family Leviviridae is governed by the high error rates of their RNA-dependent RNA polymerases. This fact, together with their large population sizes, leads to the generation of highly heterogeneous populations that adapt rapidly to most changes in the environment. Throughout adaptation, the different mutants that make up a viral population compete with each other in a non-trivial process in which their selective values change over time due to the generation of new mutations. In this work we have characterised the intra-population dynamics of a well-studied levivirus, Qβ, when it is propagated at a higher-than-optimal temperature. Our results show that adapting populations experienced rapid changes that involved the ascent of particular genotypes and the loss of some beneficial mutations of early generation. Artificially reconstructed populations, containing a fraction of the diversity present in actual populations, fixed mutations more rapidly, illustrating how population bottlenecks may guide the adaptive pathways. The conclusion is that, when the availability of beneficial mutations under a particular selective condition is elevated, the final outcome of adaptation depends more on the occasional occurrence of population bottlenecks and how mutations combine in genomes than on the selective value of particular mutations.
7
Zhang X, Liu Y, Yu Z, Blumenstein M, Hutvagner G, Li J. Instance-based error correction for short reads of disease-associated genes. BMC Bioinformatics 2021; 22:142. [PMID: 34078284 PMCID: PMC8170817 DOI: 10.1186/s12859-021-04058-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2021] [Accepted: 03/02/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Genomic reads from sequencing platforms contain random errors. Global correction algorithms have been developed, aiming to rectify all possible errors in the reads using generic genome-wide patterns. However, the non-uniform sequencing depths hinder the global approach to conduct effective error removal. As some genes may get under-corrected or over-corrected by the global approach, we conduct instance-based error correction for short reads of disease-associated genes or pathways. The paramount requirement is to ensure the relevant reads, instead of the whole genome, are error-free to provide significant benefits for single-nucleotide polymorphism (SNP) or variant calling studies on the specific genes.
RESULTS: To rectify possible errors in the short reads of disease-associated genes, our novel idea is to exploit local sequence features and statistics directly related to these genes. Extensive experiments are conducted in comparison with state-of-the-art methods on both simulated and real datasets of lung cancer associated genes (including single-end and paired-end reads). The results demonstrated the superiority of our method with the best performance on precision, recall and gain rate, as well as on sequence assembly results (e.g., N50, the length of contig and contig quality).
CONCLUSION: Instance-based strategy makes it possible to explore fine-grained patterns focusing on specific genes, providing high precision error correction and convincing gene sequence assembly. SNP case studies show that errors occurring at some traditional SNP areas can be accurately corrected, providing high precision and sensitivity for investigations on disease-causing point mutations.
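N50, named among the assembly metrics in the abstract above, is the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The following is a minimal sketch of that definition only; the function name and the toy contig lengths are hypothetical and do not come from the paper.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths in base pairs.
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
```

For the toy lengths the total is 1500, and the two longest contigs (500 and 400) together first reach the halfway mark of 750, so the value is 400.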
Affiliation(s)
- Xuan Zhang
  - Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia
- Yuansheng Liu
  - Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia
- Zuguo Yu
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
- Michael Blumenstein
  - Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia
- Gyorgy Hutvagner
  - Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia
- Jinyan Li
  - Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia

8
Gelbart M, Harari S, Ben-Ari Y, Kustin T, Wolf D, Mandelboim M, Mor O, Pennings PS, Stern A. Drivers of within-host genetic diversity in acute infections of viruses. PLoS Pathog 2020; 16:e1009029. [PMID: 33147296 PMCID: PMC7668575 DOI: 10.1371/journal.ppat.1009029] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/08/2020] [Revised: 11/16/2020] [Accepted: 10/04/2020] [Indexed: 12/01/2022] Open
Abstract
Genetic diversity is the fuel of evolution and facilitates adaptation to novel environments. However, our understanding of what drives differences in the genetic diversity during the early stages of viral infection is somewhat limited. Here, we use ultra-deep sequencing to interrogate 43 clinical samples taken from early infections of the human-infecting viruses HIV, RSV and CMV. Hundreds to thousands of virus templates were sequenced per sample, allowing us to reveal dramatic differences in within-host genetic diversity among virus populations. We found that increased diversity was mostly driven by presence of multiple divergent genotypes in HIV and CMV samples, which we suggest reflect multiple transmitted/founder viruses. Conversely, we detected an abundance of low frequency hyper-edited genomes in RSV samples, presumably reflecting defective virus genomes (DVGs). We suggest that RSV is characterized by higher levels of cellular co-infection, which allow for complementation and hence elevated levels of DVGs.
The few days or weeks following infection with a virus, termed acute infection, are critical for virus establishment. Here we sought to characterize what leads to differences in the genetic diversity of different viruses sampled during acute infection. We performed ultra-deep sequencing of hundreds to thousands of viral genomes from forty-three samples spanning three pathogenic human viruses: HIV, RSV and CMV. We found major differences in the genetic diversity of these different viruses, and in different patients infected with the same virus. We investigated the factors responsible for these differences. We found that the DNA virus CMV was less diverse, most likely since it has a lower mutation rate than the RNA viruses HIV and RSV. We also found that the samples with the highest genetic diversity, which included one CMV sample and two HIV samples, bore evidence for multiple genotype infection. In other words, patients from whom these samples were taken were infected with two different “strains” of the virus. Finally, we also found evidence that viral genomes of HIV, and in particular RSV, are edited by the innate immune system of the host, leading to the presence of defective virus genomes.
Affiliation(s)
- Maoz Gelbart
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Sheri Harari
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Ya’ara Ben-Ari
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Talia Kustin
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Dana Wolf
  - Clinical Virology Unit, Hadassah Hebrew University Medical Center, Jerusalem, Israel
  - The Lautenberg Center for General and Tumor Immunology, IMRIC, the Faculty of Medicine, the Hebrew University, Jerusalem, Israel
- Michal Mandelboim
  - Central Virology Laboratory, Ministry of Health, Sheba Medical Center, Ramat-Gan, Israel
  - Department of Epidemiology and Preventive Medicine, School of Public Health, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Orna Mor
  - Central Virology Laboratory, Ministry of Health, Sheba Medical Center, Ramat-Gan, Israel
  - Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Pleuni S. Pennings
  - Department of Biology, San Francisco State University, San Francisco, California, United States of America
- Adi Stern
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel

9
Streamlined Subpopulation, Subtype, and Recombination Analysis of HIV-1 Half-Genome Sequences Generated by High-Throughput Sequencing. mSphere 2020; 5:5/5/e00551-20. [PMID: 33055255 PMCID: PMC7565892 DOI: 10.1128/msphere.00551-20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
Abstract
High-throughput sequencing (HTS) has been widely used to characterize HIV-1 genome sequences. There are no algorithms currently that can directly determine genotype and quasispecies population using short HTS reads generated from long genome sequences without additional software. To establish a robust subpopulation, subtype, and recombination analysis workflow, we amplified the HIV-1 3′-half genome from plasma samples of 65 HIV-1-infected individuals and sequenced the entire amplicon (∼4,500 bp) by HTS. With direct analysis of raw reads using HIVE-hexahedron, we showed that 48% of samples harbored 2 to 13 subpopulations. We identified various subtypes (17 A1s, 4 Bs, 27 Cs, 6 CRF02_AGs, and 11 unique recombinant forms) and defined recombinant breakpoints of 10 recombinants. These results were validated with viral genome sequences generated by single genome sequencing (SGS) or the analysis of consensus sequence of the HTS reads. The HIVE-hexahedron workflow is more sensitive and accurate than just evaluating the consensus sequence and also more cost-effective than SGS.
IMPORTANCE: The highly recombinogenic nature of human immunodeficiency virus type 1 (HIV-1) leads to recombination and emergence of quasispecies. It is important to reliably identify subpopulations to understand the complexity of a viral population for drug resistance surveillance and vaccine development. High-throughput sequencing (HTS) provides improved resolution over Sanger sequencing for the analysis of heterogeneous viral subpopulations. However, current methods of analysis of HTS reads are unable to fully address accurate population reconstruction. Hence, there is a dire need for a more sensitive, accurate, user-friendly, and cost-effective method to analyze viral quasispecies. For this purpose, we have improved the HIVE-hexahedron algorithm that we previously developed with in silico short sequences to analyze raw HTS short reads. The significance of this study is that our standalone algorithm enables a streamlined analysis of quasispecies, subtype, and recombination patterns from long HIV-1 genome regions without the need of additional sequence analysis tools. Distinct viral populations and recombination patterns identified by HIVE-hexahedron are further validated by comparison with sequences obtained by single genome sequencing (SGS).
10
Boskova V, Stadler T. PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences. Mol Biol Evol 2020; 37:3061-3075. [PMID: 32492139 PMCID: PMC7530608 DOI: 10.1093/molbev/msaa136] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/25/2022] Open
Abstract
Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient.
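The distinction between unique and duplicate sequences that the abstract above relies on can be sketched as a simple preprocessing step. This is only an illustration of collapsing a data set into unique sequences with copy counts; it is not the PIQMEE algorithm and does not use any BEAST 2 interface. The function name and the toy reads are hypothetical.

```python
from collections import Counter


def collapse_duplicates(sequences: list[str]) -> list[tuple[str, int]]:
    """Collapse identical sequences into (sequence, copy count) pairs.

    Pairs are ordered from most to least abundant; ties keep first-seen order.
    """
    return Counter(sequences).most_common()


# Hypothetical within-host sample: six sequences, three of them unique.
toy_reads = ["ACGT", "ACGT", "ACGA", "ACGT", "ACGA", "TTGT"]
unique_with_counts = collapse_duplicates(toy_reads)
n_total = sum(count for _, count in unique_with_counts)
n_unique = len(unique_with_counts)
```

In the toy sample, six sequences reduce to three unique ones. The abstract describes the same kind of contrast at a much larger scale, for example around 21,000 sequences with 20 unique sequences.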
Affiliation(s)
- Veronika Boskova
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Switzerland
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Switzerland

11
Eliseev A, Gibson KM, Avdeyev P, Novik D, Bendall ML, Pérez-Losada M, Alexeev N, Crandall KA. Evaluation of haplotype callers for next-generation sequencing of viruses. Infect Genet Evol 2020; 82:104277. [PMID: 32151775 PMCID: PMC7293574 DOI: 10.1016/j.meegid.2020.104277] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/04/2019] [Revised: 03/04/2020] [Accepted: 03/06/2020] [Indexed: 01/30/2023]
Abstract
Currently, the standard practice for assembling next-generation sequencing (NGS) reads of viral genomes is to summarize thousands of individual short reads into a single consensus sequence, thus confounding useful intra-host diversity information for molecular phylodynamic inference. It is hypothesized that a few viral strains may dominate the intra-host genetic diversity with a variety of lower frequency strains comprising the rest of the population. Several software tools currently exist to convert NGS sequence variants into haplotypes. Previous benchmarks of viral haplotype reconstruction programs used simulation scenarios that are useful from a mathematical perspective but do not reflect viral evolution and epidemiology. Here, we tested twelve NGS haplotype reconstruction methods using viral populations simulated under realistic evolutionary dynamics. We simulated coalescent-based populations that spanned known levels of viral genetic diversity, including mutation rates, sample size and effective population size, to test the limits of the haplotype reconstruction methods and to ensure coverage of predicted intra-host viral diversity levels (especially HIV-1). All twelve investigated haplotype callers showed variable performance and produced drastically different results that were mainly driven by differences in mutation rate and, to a lesser extent, in effective population size. Most methods were able to accurately reconstruct haplotypes when genetic diversity was low. However, under higher levels of diversity (e.g., those seen in intra-host HIV-1 infections), haplotype reconstruction quality was highly variable and, on average, poor. All haplotype reconstruction tools, except QuasiRecomb and ShoRAH, greatly underestimated intra-host diversity and the true number of haplotypes. PredictHaplo outperformed the other haplotype reconstruction tools, with the highest precision and recall and the lowest UniFrac distance values, followed by CliqueSNV, which, given more computational time, may have outperformed PredictHaplo. Here, we present an extensive comparison of available viral haplotype reconstruction tools and provide insights for future improvements in haplotype reconstruction tools using both short-read and long-read technologies.
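The precision and recall mentioned in this abstract can be illustrated with a minimal sketch. It assumes exact sequence matching between reconstructed and true haplotypes, which is a simplification and may differ from the scoring used in the benchmark; the function name `haplotype_precision_recall` and the toy sequences are hypothetical.

```python
def haplotype_precision_recall(reconstructed, truth):
    """Score reconstructed haplotypes against the true ones.

    Assumption: a reconstructed haplotype counts as correct only if it
    matches a true haplotype exactly (no mismatch tolerance).
    """
    reconstructed_set = set(reconstructed)
    truth_set = set(truth)
    true_positives = len(reconstructed_set & truth_set)
    # Precision: fraction of reconstructed haplotypes that are real.
    precision = true_positives / len(reconstructed_set) if reconstructed_set else 0.0
    # Recall: fraction of real haplotypes that were recovered.
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return precision, recall


# Toy data (hypothetical): four true haplotypes, three reconstructed ones.
truth = ["ACGTAC", "ACGTTC", "ACCTAC", "TCGTAC"]
reconstructed = ["ACGTAC", "ACGTTC", "ACGGAC"]
precision, recall = haplotype_precision_recall(reconstructed, truth)
```

In this toy case two of the three reconstructed haplotypes are correct and two of the four true haplotypes are recovered, so a tool that underestimates the number of haplotypes can keep precision moderate while recall drops.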
Affiliation(s)
- Anton Eliseev
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
- Keylie M Gibson
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA.
- Pavel Avdeyev
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Mathematics, George Washington University, Washington, DC, USA
- Dmitry Novik
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
- Matthew L Bendall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
- Nikita Alexeev
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
- Keith A Crandall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
|
12
|
Deng ZL, Dhingra A, Fritz A, Götting J, Münch PC, Steinbrück L, Schulz TF, Ganzenmüller T, McHardy AC. Evaluating assembly and variant calling software for strain-resolved analysis of large DNA viruses. Brief Bioinform 2020; 22:5868070. [PMID: 34020538 PMCID: PMC8138829 DOI: 10.1093/bib/bbaa123] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/21/2019] [Revised: 05/18/2020] [Accepted: 05/19/2020] [Indexed: 02/06/2023]
Abstract
Infection with human cytomegalovirus (HCMV) can cause severe complications in immunocompromised individuals and congenitally infected children. Characterizing heterogeneous viral populations and their evolution by high-throughput sequencing of clinical specimens requires the accurate assembly of individual strains or sequence variants and suitable variant calling methods. However, the performance of most methods has not been assessed for populations composed of low divergent viral strains with large genomes, such as HCMV. In an extensive benchmarking study, we evaluated 15 assemblers and 6 variant callers on 10 lab-generated benchmark data sets created with two different library preparation protocols, to identify best practices and challenges for analyzing such data. Most assemblers, especially metaSPAdes and IVA, performed well across a range of metrics in recovering abundant strains. However, only one, Savage, recovered low abundant strains and in a highly fragmented manner. Two variant callers, LoFreq and VarScan2, excelled across all strain abundances. Both shared a large fraction of false positive variant calls, which were strongly enriched in T to G changes in a 'G.G' context. The magnitude of this context-dependent systematic error is linked to the experimental protocol. We provide all benchmarking data, results and the entire benchmarking workflow named QuasiModo, Quasispecies Metric determination on omics, under the GNU General Public License v3.0 (https://github.com/hzi-bifo/Quasimodo), to enable full reproducibility and further benchmarking on these and other data.
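The context-dependent false positives described in this abstract suggest a simple screening step. The sketch below assumes that 'G.G' means the variant site is flanked by a G on both sides in the reference; that reading, the function name `is_gtg_context_candidate`, and the toy reference are assumptions and not part of QuasiModo.

```python
def is_gtg_context_candidate(reference, position, alt):
    """Flag a T>G call whose reference neighbours are both G.

    position is 0-based. Assumption: 'G.G' denotes a G immediately
    before and immediately after the variant site.
    """
    if position <= 0 or position >= len(reference) - 1:
        return False  # no flanking base on one side
    return (
        reference[position] == "T"
        and alt == "G"
        and reference[position - 1] == "G"
        and reference[position + 1] == "G"
    )


# Toy reference and variant calls (hypothetical): (0-based position, alt base).
reference = "ACGTGCATGTA"
calls = [(3, "G"), (7, "G"), (9, "C")]
flagged = [call for call in calls if is_gtg_context_candidate(reference, *call)]
```

Flagged calls are only candidates for closer inspection; a real variant can occur in the same context.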
Affiliation(s)
- Zhi-Luo Deng
- Department Computational Biology of Infection Research of the Helmholtz Centre for Infection Research
- Adrian Fritz
- Department Computational Biology of Infection Research of the Helmholtz Centre for Infection Research
- Philipp C Münch
- Department Computational Biology of Infection Research of the Helmholtz Centre for Infection Research and Max von Pettenkofer Institute in Ludwig Maximilian University of Munich
- Alice C McHardy
- Department Computational Biology of Infection Research of the Helmholtz Centre for Infection Research
|
13
|
Pérez-Losada M, Arenas M, Galán JC, Bracho MA, Hillung J, García-González N, González-Candelas F. High-throughput sequencing (HTS) for the analysis of viral populations. Infect Genet Evol 2020; 80:104208. [PMID: 32001386 DOI: 10.1016/j.meegid.2020.104208] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/30/2019] [Revised: 01/21/2020] [Accepted: 01/24/2020] [Indexed: 12/12/2022]
Abstract
The development of High-Throughput Sequencing (HTS) technologies is having a major impact on the genomic analysis of viral populations. Current HTS platforms can capture nucleic acid variation across millions of genes for both selected amplicons and full viral genomes. HTS has already facilitated the discovery of new viruses, hinted at new taxonomic classifications and provided a deeper and broader understanding of their diversity, population and genetic structure. Hence, HTS has already replaced standard Sanger sequencing in basic and applied research fields, but the next step is its implementation as a routine technology for the analysis of viruses in clinical settings. The most likely application of this implementation will be the analysis of viral genomics, because the huge population sizes, high mutation rates and very fast replacement of viral populations have demonstrated the limited information obtained with Sanger technology. In this review, we describe new technologies and provide guidelines for the high-throughput sequencing and genetic and evolutionary analyses of viral populations and metaviromes, including software applications. With the development of new HTS technologies, new and refurbished molecular and bioinformatic tools are also constantly being developed to process and integrate HTS data. These allow assembling viral genomes and inferring viral population diversity and dynamics. Finally, we also present several applications of these approaches to the analysis of viral clinical samples including transmission clusters and outbreak characterization.
Affiliation(s)
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain.
- Juan Carlos Galán
- Microbiology Service, Hospital Ramón y Cajal, Madrid, Spain; CIBER in Epidemiology and Public Health, Spain.
- Mª Alma Bracho
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain.
- Julia Hillung
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
- Neris García-González
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
- Fernando González-Candelas
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
|
14
|
Chen J, Zhao Y, Sun Y. De novo haplotype reconstruction in viral quasispecies using paired-end read guided path finding. Bioinformatics 2019; 34:2927-2935. [PMID: 29617936 DOI: 10.1093/bioinformatics/bty202] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/28/2017] [Accepted: 04/02/2018] [Indexed: 12/29/2022]
Abstract
Motivation: RNA virus populations contain different but genetically related strains, all infecting an individual host. Reconstruction of the viral haplotypes is a fundamental step to characterize the virus population, predict their viral phenotypes and finally provide important information for clinical treatment and prevention. Advances in next-generation sequencing technologies open up new opportunities to assemble full-length haplotypes. However, error-prone short reads, high similarities between related strains, and an unknown number of haplotypes pose computational challenges for reference-free haplotype reconstruction. There is still much room to improve the performance of existing haplotype assembly tools. Results: In this work, we developed a de novo haplotype reconstruction tool named PEHaplo, which employs paired-end reads to distinguish highly similar strains for viral quasispecies data. It was applied on both simulated and real quasispecies data, and the results were benchmarked against several recently published de novo haplotype reconstruction tools. The comparison shows that PEHaplo outperforms the benchmarked tools in a comprehensive set of metrics. Availability and implementation: The source code and the documentation of PEHaplo are available at https://github.com/chjiao/PEHaplo. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiao Chen
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Yingchao Zhao
- School of Computing and Information Sciences, Caritas Institute of Higher Education, Hong Kong, China
- Yanni Sun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
|
15
|
Takeda H, Yamashita T, Ueda Y, Sekine A. Exploring the hepatitis C virus genome using single molecule real-time sequencing. World J Gastroenterol 2019; 25:4661-4672. [PMID: 31528092 PMCID: PMC6718035 DOI: 10.3748/wjg.v25.i32.4661] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/16/2019] [Revised: 07/04/2019] [Accepted: 07/19/2019] [Indexed: 02/06/2023]
Abstract
Single-molecule real-time (SMRT) sequencing, also called third-generation sequencing, is a novel sequencing technique capable of generating extremely long contiguous sequence reads. While conventional short-read sequencing cannot evaluate the linkage of nucleotide substitutions distant from one another, SMRT sequencing can directly demonstrate linkage of nucleotide changes over a span of more than 20 kbp, and thus can be applied to directly examine the haplotypes of viruses or bacteria whose genome structures are changing in real time. In addition, an error correction method (circular consensus sequencing) has been established and repeated sequencing of a single-molecule DNA template can result in extremely high accuracy. The advantages of long-read sequencing enable accurate determination of the haplotypes of individual viral clones. SMRT sequencing has been applied in various studies of viral genomes including determination of the full-length contiguous genome sequence of hepatitis C virus (HCV), targeted deep sequencing of the HCV NS5A gene, and assessment of heterogeneity among viral populations. Recently, the emergence of multi-drug resistant HCV viruses has become a significant clinical issue and has also been demonstrated using SMRT sequencing. In this review, we introduce the novel third-generation PacBio RSII/Sequel systems, compare them with conventional next-generation sequencers, and summarize previous studies in which SMRT sequencing technology has been applied for HCV genome analysis. We also refer to another long-read sequencing platform, nanopore sequencing technology, and discuss the advantages, limitations and future perspectives in using these third-generation sequencers for HCV genome analysis.
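The idea behind error correction by repeated sequencing of one template can be sketched as a per-position majority vote. This is a deliberately simplified illustration: it assumes the passes are already aligned and of equal length, whereas real circular consensus pipelines align the passes and use quality-aware models. The function name `majority_consensus` and the toy reads are hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Return the per-position majority base across repeated passes.

    Assumption: all passes are pre-aligned, equal-length strings.
    Ties are resolved in favour of the base seen first in the column.
    """
    if not passes:
        return ""
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same length")
    consensus = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five passes over the same molecule, two of them carrying one random error each.
passes = ["ACGTTGCA", "ACGATGCA", "ACGTTGCA", "ACCTTGCA", "ACGTTGCA"]
consensus = majority_consensus(passes)
```

Because the errors fall at different positions in different passes, each is outvoted and the consensus recovers the underlying sequence.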
Affiliation(s)
- Haruhiko Takeda
- Department of Omics-based Medicine, Center for Preventive Medical Science, Chiba University, Chiba 260-0856, Japan
- Department of Gastroenterology and Hepatology, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Taiki Yamashita
- Department of Omics-based Medicine, Center for Preventive Medical Science, Chiba University, Chiba 260-0856, Japan
- Yoshihide Ueda
- Department of Gastroenterology and Hepatology, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Akihiro Sekine
- Department of Omics-based Medicine, Center for Preventive Medical Science, Chiba University, Chiba 260-0856, Japan
|
16
|
Singer JB, Thomson EC, Hughes J, Aranday-Cortes E, McLauchlan J, da Silva Filipe A, Tong L, Manso CF, Gifford RJ, Robertson DL, Barnes E, Ansari MA, Mbisa JL, Bibby DF, Bradshaw D, Smith D. Interpreting Viral Deep Sequencing Data with GLUE. Viruses 2019; 11:E323. [PMID: 30987147 PMCID: PMC6520954 DOI: 10.3390/v11040323] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/28/2019] [Revised: 03/13/2019] [Accepted: 03/14/2019] [Indexed: 01/29/2023]
Abstract
Using deep sequencing technologies such as Illumina's platform, it is possible to obtain reads from the viral RNA population revealing the viral genome diversity within a single host. A range of software tools and pipelines can transform raw deep sequencing reads into Sequence Alignment/Map (SAM) files. We propose that interpretation tools should process these SAM files, directly translating individual reads to amino acids in order to extract statistics of interest such as the proportion of different amino acid residues at specific sites. This preserves per-read linkage between nucleotide variants at different positions within a codon location. The samReporter is a subsystem of the GLUE software toolkit which follows this direct read translation approach in its processing of SAM files. We test samReporter on a deep sequencing dataset obtained from a cohort of 241 UK HCV patients for whom prior treatment with direct-acting antivirals has failed; deep sequencing and resistance testing have been suggested to be of clinical use in this context. We compared the polymorphism interpretation results of the samReporter against an approach that does not preserve per-read linkage. We found that the samReporter was able to properly interpret the sequence data at resistance-associated locations in nine patients where the alternative approach was equivocal. In three cases, the samReporter confirmed that resistance or an atypical substitution was present at NS5A position 30. In three further cases, it confirmed that the sofosbuvir-resistant NS5B substitution S282T was absent. This suggests the direct read translation approach implemented is of value for interpreting viral deep sequencing data.
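Why per-read linkage matters within a codon can be shown with a short sketch. It is not the samReporter implementation: it assumes the codon has already been extracted from each read, uses the standard genetic code, and the function names and toy reads are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def per_read_amino_acids(codon_reads):
    """Translate each read's codon directly, preserving within-read linkage."""
    counts = Counter(CODON_TABLE[c] for c in codon_reads if c in CODON_TABLE)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


def unlinked_amino_acids(codon_reads):
    """Amino acids implied when each codon position is tallied independently."""
    columns = [sorted(set(column)) for column in zip(*codon_reads)]
    return {CODON_TABLE["".join(codon)] for codon in product(*columns)}


# Toy pile-up at one codon (hypothetical): two linked variants, AAA and CGA.
codon_reads = ["AAA"] * 6 + ["CGA"] * 4
linked = per_read_amino_acids(codon_reads)
unlinked = unlinked_amino_acids(codon_reads)
```

Here the per-read translation reports only lysine and arginine, while the position-by-position tally cannot rule out glutamine (CAA), a residue that no read actually encodes.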
Affiliation(s)
- Joshua B Singer
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK.
- Emma C Thomson
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK.
- Joseph Hughes
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK.
- John McLauchlan
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK.
- Lily Tong
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK.
- Carmen F Manso
- Virus Reference Department, National Infection Service, Public Health England, Colindale, London NW9 5EQ, UK.
- Robert J Gifford
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK.
- David L Robertson
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK.
- Eleanor Barnes
- Peter Medawar Building for Pathogen Research, Nuffield Department of Medicine, University of Oxford, Oxford OX1 3SY, UK.
- M Azim Ansari
- Peter Medawar Building for Pathogen Research, Nuffield Department of Medicine, University of Oxford, Oxford OX1 3SY, UK.
- Jean L Mbisa
- Virus Reference Department, National Infection Service, Public Health England, Colindale, London NW9 5EQ, UK.
- David F Bibby
- Virus Reference Department, National Infection Service, Public Health England, Colindale, London NW9 5EQ, UK.
- Daniel Bradshaw
- Virus Reference Department, National Infection Service, Public Health England, Colindale, London NW9 5EQ, UK.
- David Smith
- Peter Medawar Building for Pathogen Research, Nuffield Department of Medicine, University of Oxford, Oxford OX1 3SY, UK.
|
17
|
Ahn S, Ke Z, Vikalo H. Viral quasispecies reconstruction via tensor factorization with successive read removal. Bioinformatics 2018; 34:i23-i31. [PMID: 29949976 PMCID: PMC6022648 DOI: 10.1093/bioinformatics/bty291] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 01/22/2023]
Abstract
Motivation: As RNA viruses mutate and adapt to environmental changes, often developing resistance to anti-viral vaccines and drugs, they form an ensemble of viral strains, a viral quasispecies. While high-throughput sequencing (HTS) has enabled in-depth studies of viral quasispecies, sequencing errors and limited read lengths render the problem of reconstructing the strains and estimating their spectrum challenging. Inference of viral quasispecies is difficult due to generally non-uniform frequencies of the strains, and is further exacerbated when the genetic distances between the strains are small. Results: This paper presents TenSQR, an algorithm that utilizes a tensor factorization framework to analyze HTS data and reconstruct viral quasispecies characterized by highly uneven frequencies of its components. Fundamentally, TenSQR performs clustering with successive data removal to infer strains in a quasispecies in order from the most to the least abundant one; every time a strain is inferred, sequencing reads generated from that strain are removed from the dataset. The proposed successive strain reconstruction and data removal enables discovery of rare strains in a population and facilitates detection of deletions in such strains. Results on simulated datasets demonstrate that TenSQR can reconstruct full-length strains having widely different abundances, generally outperforming state-of-the-art methods at diversities 1-10% and detecting long deletions even in rare strains. A study on a real HIV-1 dataset demonstrates that TenSQR outperforms competing methods in experimental settings as well. Finally, we apply TenSQR to analyze a Zika virus sample and reconstruct the full-length strains it contains. Availability and implementation: TenSQR is available at https://github.com/SoYeonA/TenSQR. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Soyeon Ahn
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
- Ziqi Ke
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
- Haris Vikalo
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
|
18
|
Leviyang S, Griva I, Ita S, Johnson WE. A penalized regression approach to haplotype reconstruction of viral populations arising in early HIV/SIV infection. Bioinformatics 2018; 33:2455-2463. [PMID: 28379346 DOI: 10.1093/bioinformatics/btx187] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/03/2016] [Accepted: 03/29/2017] [Indexed: 12/14/2022]
Abstract
Motivation: Next generation sequencing (NGS) has been increasingly applied to characterize viral evolution during HIV and SIV infections. In particular, NGS datasets sampled during the initial months of infection are characterized by relatively low levels of diversity as well as convergent evolution at multiple loci dispersed across the viral genome. Consequently, fully characterizing viral evolution from NGS datasets requires haplotype reconstruction across large regions of the viral genome. Existing haplotype reconstruction algorithms have not been developed with the particular characteristics of early HIV/SIV infection in mind, raising the possibility that better performance could be achieved through a specifically designed algorithm. Results: Here, we introduce a haplotype reconstruction algorithm, RegressHaplo, specifically designed for low diversity and convergent evolution regimes. The algorithm uses a penalized regression that balances a data fitting term with a penalty term that encourages solutions with few haplotypes. The regression covariates are a large set of potential haplotypes and fitting the regression is made computationally feasible by the low diversity setting. Using simulated and in vivo datasets, we compare RegressHaplo to PredictHaplo and QuRe, two existing haplotype reconstruction algorithms. RegressHaplo performs better than these algorithms on simulated datasets with relatively low diversity levels. We suggest RegressHaplo as a novel tool for the investigation of early infection HIV/SIV datasets and, more generally, low diversity viral NGS datasets. Contact: sr286@georgetown.edu. Availability and implementation: https://github.com/SLeviyang/RegressHaplo.
Affiliation(s)
- Sivan Leviyang
- Department of Mathematics and Statistics, Georgetown University, Washington DC, 20057, USA
- Igor Griva
- Department of Mathematics, George Mason University, Fairfax, VA 22030, USA
- Sergio Ita
- Department of Medicine, University of California - San Diego, La Jolla, CA 92093, USA
- Welkin E Johnson
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
|
19
|
Karagiannis K, Simonyan V, Chumakov K, Mazumder R. Separation and assembly of deep sequencing data into discrete sub-population genomes. Nucleic Acids Res 2017; 45:10989-11003. [PMID: 28977510 PMCID: PMC5737798 DOI: 10.1093/nar/gkx755] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 09/07/2016] [Accepted: 08/16/2017] [Indexed: 12/15/2022]
Abstract
Sequence heterogeneity is a common characteristic of RNA viruses that is often referred to as sub-populations or quasispecies. Traditional techniques used for assembly of short sequence reads produced by deep sequencing, such as de-novo assemblers, ignore the underlying diversity. Here, we introduce a novel algorithm that simultaneously assembles discrete sequences of multiple genomes present in populations. Using in silico data we were able to detect populations at as low as 0.1% frequency with complete global genome reconstruction and in a single sample detected 16 resolved sequences with no mismatches. We also applied the algorithm to high throughput sequencing data obtained for viruses present in sewage samples and successfully detected multiple sub-populations and recombination events in these diverse mixtures. High sensitivity of the algorithm also enables genomic analysis of heterogeneous pathogen genomes from patient samples and accurate detection of intra-host diversity, enabling not just basic research in personalized medicine but also accurate diagnostics and monitoring drug therapies, which are critical in clinical and regulatory decision-making process.
Affiliation(s)
- Konstantinos Karagiannis
- Department of Biochemistry and Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA; Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA
- Vahan Simonyan
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA
- Konstantin Chumakov
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA
- Raja Mazumder
- Department of Biochemistry and Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA; McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA
|
20
|
Meixenberger K, Yousef KP, Smith MR, Somogyi S, Fiedler S, Bartmeyer B, Hamouda O, Bannert N, von Kleist M, Kücherer C. Molecular evolution of HIV-1 integrase during the 20 years prior to the first approval of integrase inhibitors. Virol J 2017; 14:223. [PMID: 29137637 PMCID: PMC5686839 DOI: 10.1186/s12985-017-0887-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/12/2017] [Accepted: 10/31/2017] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Detailed knowledge of the evolutionary potential of polymorphic sites in a viral protein is important for understanding the development of drug resistance in the presence of an inhibitor. We therefore set out to analyse the molecular evolution of the HIV-1 subtype B integrase at the inter-patient level in Germany during a 20-year period prior to the first introduction of integrase strand transfer inhibitors (INSTIs). METHODS: We determined 337 HIV-1 integrase subtype B sequences (amino acids 1-278) from stored plasma samples of antiretroviral treatment-naïve individuals newly diagnosed with HIV-1 between 1986 and 2006. Shannon entropy was calculated to determine the variability at each amino acid position. Time trends in the frequency of amino acid variants were identified by linear regression. Direct coupling analysis was applied to detect covarying sites. RESULTS: Twenty-two time trends in the frequency of amino acid variants demonstrated either single amino acid exchanges or variation in the degree of polymorphism. Covariation was observed for 17 amino acid variants with a temporal trend. Some minor INSTI resistance mutations (T124A, V151I, K156N, T206S, S230N) and some INSTI-selected mutations (M50I, L101I, T122I, T124N, T125A, M154I, G193E, V201I) were identified at overall frequencies >5%. Among these, the frequencies of L101I, T122I, and V201I increased over time, whereas the frequency of M154I decreased. Moreover, L101I, T122I, T124A, T125A, M154I, and V201I covaried with non-resistance-associated variants. CONCLUSIONS: Time-trending, covarying polymorphisms indicate that long-term evolutionary changes of the HIV-1 integrase involve defined clusters of possibly structurally or functionally associated sites independent of selective pressure through INSTIs at the inter-patient level. Linkage between polymorphic resistance- and non-resistance-associated sites can impact the selection of INSTI resistance mutations in complex ways. Identification of these sites can help in improving genotypic resistance assays, resistance prediction algorithms, and the development of new integrase inhibitors.
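The per-position Shannon entropy used in this abstract as a variability measure can be sketched as follows. The abstract does not state the logarithm base, so base 2 is an assumption here; the function names and the toy alignment are hypothetical, and the sequences are assumed to be aligned and of equal length.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (in bits, assuming log base 2) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def positional_entropy(sequences):
    """Entropy at each amino acid position of equal-length aligned sequences."""
    return [shannon_entropy(column) for column in zip(*sequences)]


# Toy alignment (hypothetical): position 1 is conserved, positions 2 and 3 vary.
sequences = ["MKT", "MKA", "MRT", "MKT"]
entropies = positional_entropy(sequences)
```

A fully conserved position scores 0, and the score rises as residues at that position become more evenly mixed.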
Affiliation(s)
- Kaveh Pouran Yousef
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Maureen Rebecca Smith
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Sybille Somogyi
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
- Stefan Fiedler
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
- Barbara Bartmeyer
- HIV/AIDS, STI and Blood-borne Infections, Robert Koch Institute, Berlin, Germany
- Osamah Hamouda
- HIV/AIDS, STI and Blood-borne Infections, Robert Koch Institute, Berlin, Germany
- Norbert Bannert
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
- Max von Kleist
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Claudia Kücherer
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
|
21
|
Time-Sampled Population Sequencing Reveals the Interplay of Selection and Genetic Drift in Experimental Evolution of Potato Virus Y. J Virol 2017; 91:JVI.00690-17. [PMID: 28592544 PMCID: PMC5533922 DOI: 10.1128/jvi.00690-17] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/21/2017] [Accepted: 05/28/2017] [Indexed: 11/20/2022]
Abstract
RNA viruses are among the fastest-evolving biological entities. Within their hosts, they exist as genetically diverse populations (i.e., viral mutant swarms), which are sculpted by different evolutionary mechanisms, such as mutation, natural selection, and genetic drift, and also the interactions between genetic variants within the mutant swarms. To elucidate the mechanisms that modulate the population diversity of an important plant-pathogenic virus, we performed evolution experiments with Potato virus Y (PVY) in potato genotypes that differ in their defense response against the virus. Using deep sequencing of small RNAs, we followed the temporal dynamics of standing and newly generated variations in the evolving viral lineages. A time-sampled approach allowed us to (i) reconstruct theoretical haplotypes in the starting population by using clustering of single nucleotide polymorphisms' trajectories and (ii) use quantitative population genetics approaches to estimate the contribution of selection and genetic drift, and their interplay, to the evolution of the virus. We detected imprints of strong selective sweeps and narrow genetic bottlenecks, followed by the shift in frequency of selected haplotypes. Comparison of patterns of viral evolution in differently susceptible host genotypes indicated possible diversifying evolution of PVY in the less-susceptible host (efficient in the accumulation of salicylic acid). IMPORTANCE: High diversity of within-host populations of RNA viruses is an important aspect of their biology, since they represent a reservoir of genetic variants, which can enable quick adaptation of viruses to a changing environment. This study focuses on an important plant virus, Potato virus Y, and describes, at high resolution, temporal changes in the structure of viral populations within different potato genotypes. A novel and easy-to-implement computational approach was established to cluster single nucleotide polymorphisms into viral haplotypes from very short sequencing reads. During the experiment, a shift in the frequency of selected viral haplotypes was observed after a narrow genetic bottleneck, indicating an important role of the genetic drift in the evolution of the virus. On the other hand, a possible case of diversifying selection of the virus was observed in less susceptible host genotypes.
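Grouping single nucleotide polymorphisms by their frequency trajectories can be sketched with a simple greedy rule. This is not the authors' clustering procedure, which the abstract does not describe: it assumes that SNPs on the same haplotype have nearly identical frequencies at every time point, and the function name `cluster_trajectories`, the `tolerance` parameter, and the toy frequencies are hypothetical.

```python
def cluster_trajectories(trajectories, tolerance=0.05):
    """Greedily group SNPs whose frequency trajectories stay close together.

    trajectories maps a SNP name to its frequency at each time point.
    Assumption: a SNP joins the first cluster whose first member differs
    from it by at most `tolerance` at every time point.
    """
    clusters = []
    for snp, freqs in trajectories.items():
        for cluster in clusters:
            representative = trajectories[cluster[0]]
            if max(abs(a - b) for a, b in zip(freqs, representative)) <= tolerance:
                cluster.append(snp)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([snp])
    return clusters


# Toy trajectories (hypothetical) over three sampling times.
trajectories = {
    "snp_101": [0.10, 0.30, 0.80],
    "snp_245": [0.12, 0.28, 0.79],
    "snp_377": [0.90, 0.60, 0.15],
    "snp_502": [0.88, 0.62, 0.18],
}
clusters = cluster_trajectories(trajectories)
```

SNPs that rise together end up in one group and SNPs that fall together in another, each group standing in for a putative haplotype.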
|
22
|
Multiple Sources of Genetic Diversity of Influenza A Viruses during the Hajj. J Virol 2017; 91:JVI.00096-17. [PMID: 28331081 DOI: 10.1128/jvi.00096-17] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/18/2017] [Accepted: 03/11/2017] [Indexed: 11/20/2022] Open
Abstract
Outbreaks of respiratory virus infection at mass gatherings pose significant health risks to attendees, host communities, and ultimately the global population if they help facilitate viral emergence. However, little is known about the genetic diversity, evolution, and patterns of viral transmission during mass gatherings, particularly how much diversity is generated by in situ transmission compared to that imported from other locations. Here, we describe the genome-scale evolution of influenza A viruses sampled from the Hajj pilgrimages at Makkah during 2013 to 2015. Phylogenetic analysis revealed that the diversity of influenza viruses at the Hajj pilgrimages was shaped by multiple introduction events, comprising multiple cocirculating lineages in each year, including those that have circulated in the Middle East and those whose origins likely lie on different continents. At the scale of individual hosts, the majority of minor variants resulted from de novo mutation, with only limited evidence of minor variant transmission or minor variants circulating at subconsensus level despite the likely identification of multiple transmission clusters. Together, these data highlight the complexity of influenza virus infection at the Hajj pilgrimages, reflecting a mix of global genetic diversity drawn from multiple sources combined with local transmission, and reemphasize the need for vigilant surveillance at mass gatherings.
IMPORTANCE Large population sizes and densities at mass gatherings such as the Hajj (Makkah, Saudi Arabia) can contribute to outbreaks of respiratory virus infection by providing local hot spots for transmission followed by spread to other localities. Using a genome-scale analysis, we show that the genetic diversity of influenza A viruses at the Hajj gatherings during 2013 to 2015 was largely shaped by the introduction of multiple viruses from diverse geographic regions, including the Middle East, with little evidence of interhost virus transmission at the Hajj and seemingly limited spread of subconsensus mutational variants. The diversity of viruses at the Hajj pilgrimages highlights the potential for lineage cocirculation during mass gatherings, in turn fuelling segment reassortment and the emergence of novel variants, such that the continued surveillance of respiratory pathogens at mass gatherings should be a public health priority.
|
23
|
Evolution of multi-drug resistant HCV clones from pre-existing resistant-associated variants during direct-acting antiviral therapy determined by third-generation sequencing. Sci Rep 2017; 7:45605. [PMID: 28361915 PMCID: PMC5374541 DOI: 10.1038/srep45605] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 01/27/2017] [Accepted: 02/28/2017] [Indexed: 02/07/2023] Open
Abstract
Resistance-associated variants (RAVs) are one of the most significant clinical challenges in treating HCV-infected patients with direct-acting antivirals (DAAs). We investigated the viral dynamics in patients receiving DAAs using third-generation sequencing technology. Among 283 patients with genotype-1b HCV receiving daclatasvir + asunaprevir (DCV/ASV), 32 (11.3%) failed to achieve sustained virological response (SVR). Conventional ultra-deep sequencing of the HCV genome was performed in 104 patients (32 non-SVR, 72 SVR), and detected representative RAVs in all non-SVR patients at baseline, including Y93H in 28 (87.5%). Long contiguous sequences spanning NS3 to NS5A regions of each viral clone in 12 sera from 6 representative non-SVR patients were determined by third-generation sequencing, and showed the concurrent presence of several synonymous mutations linked to resistance-associated substitutions in a subpopulation of pre-existing RAVs and dominant isolates at treatment failure. Phylogenetic analyses revealed close genetic distances between pre-existing RAVs and dominant RAVs at treatment failure. In addition, multiple drug-resistant mutations developed on pre-existing RAVs after DCV/ASV in all non-SVR cases. In conclusion, multi-drug resistant viral clones at treatment failure certainly originated from a subpopulation of pre-existing RAVs in HCV-infected patients. Those RAVs were selected for and became dominant with the acquisition of multiple resistance-associated substitutions under DAA treatment pressure.
|
24
|
Characterization of Hepatitis C Virus (HCV) Envelope Diversification from Acute to Chronic Infection within a Sexually Transmitted HCV Cluster by Using Single-Molecule, Real-Time Sequencing. J Virol 2017; 91:JVI.02262-16. [PMID: 28077634 DOI: 10.1128/jvi.02262-16] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 11/22/2016] [Accepted: 12/29/2016] [Indexed: 12/18/2022] Open
Abstract
In contrast to other available next-generation sequencing platforms, PacBio single-molecule, real-time (SMRT) sequencing has the advantage of generating long reads, albeit with a relatively higher error rate in unprocessed data. Using this platform, we longitudinally sampled and sequenced the hepatitis C virus (HCV) envelope genome region (1,680 nucleotides [nt]) from individuals belonging to a cluster of sexually transmitted cases. All five subjects were coinfected with HIV-1 and a closely related strain of HCV genotype 4d. In total, 50 samples were analyzed by using SMRT sequencing. By using 7 passes of circular consensus sequencing, the error rate was reduced to 0.37%, and the median number of sequences was 612 per sample. A further reduction of insertions was achieved by alignment against a sample-specific reference sequence. However, in vitro recombination during PCR amplification could not be excluded. Phylogenetic analysis supported close relationships among HCV sequences from the four male subjects and subsequent transmission from one subject to his female partner. Transmission was characterized by a strong genetic bottleneck. Viral genetic diversity was low during acute infection and increased upon progression to chronicity but subsequently fluctuated during chronic infection, caused by the alternate detection of distinct coexisting lineages. SMRT sequencing combines long reads with sufficient depth for many phylogenetic analyses and can therefore provide insights into within-host HCV evolutionary dynamics without the need for haplotype reconstruction using statistical algorithms.
IMPORTANCE Next-generation sequencing has revolutionized the study of genetically variable RNA virus populations, but phylogenetic and evolutionary analyses call for longer sequences than most available platforms generate, with a minimal intrinsic error rate. Here, we demonstrate for the first time that PacBio SMRT sequencing technology can be used to generate full-length HCV envelope sequences at the single-molecule level, providing a data set with large sequencing depth for the characterization of intrahost viral dynamics. The selection of consensus reads derived from at least 7 full circular consensus sequencing rounds significantly reduced the intrinsic high error rate of this method. We used this method to genetically characterize a unique transmission cluster of sexually transmitted HCV infections, providing insight into the distinct evolutionary pathways in each patient over time and identifying the transmission-associated genetic bottleneck as well as fluctuations in viral genetic diversity over time, accompanied by dynamic shifts in viral subpopulations.
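Why repeated passes over the same molecule reduce the error rate can be shown with a simplified sketch. It assumes the passes are already aligned to equal length and contain substitution errors only, which is a strong simplification (the abstract notes that insertions needed separate handling). The function name and the example sequences are hypothetical, and this is not the PacBio consensus algorithm.

```python
from collections import Counter

def majority_consensus(passes):
    """Per-position majority vote across pre-aligned, equal-length passes.

    Assumes substitution errors only; real consensus callers also model
    insertions, deletions, and base quality.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to equal length")
    consensus = []
    for column in zip(*passes):
        # Take the most frequent base observed at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)

# Seven hypothetical passes over one molecule, four with a single random error.
passes = [
    "ACGTACGT",
    "ACGTACGA",
    "TCGTACGT",
    "ACGTACGT",
    "ACCTACGT",
    "ACGTACGT",
    "ACGTTCGT",
]
consensus = majority_consensus(passes)
```

Because independent errors rarely hit the same position in most passes, the vote recovers the underlying sequence even though more than half of the individual passes contain an error.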
|
25
|
aBayesQR: A Bayesian Method for Reconstruction of Viral Populations Characterized by Low Diversity. Lecture Notes in Computer Science 2017. [DOI: 10.1007/978-3-319-56970-3_22] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/28/2022]
|
26
|
Brumme CJ, Poon AFY. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping. Virus Res 2016; 239:97-105. [PMID: 27993623 DOI: 10.1016/j.virusres.2016.12.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 09/20/2016] [Revised: 12/15/2016] [Accepted: 12/15/2016] [Indexed: 12/13/2022]
Abstract
Genetic sequencing ("genotyping") plays a critical role in the modern clinical management of HIV infection. This virus evolves rapidly within patients because of its error-prone reverse transcriptase and short generation time. Consequently, HIV variants with mutations that confer resistance to one or more antiretroviral drugs can emerge during sub-optimal treatment. There are now multiple HIV drug resistance interpretation algorithms that take the region of the HIV genome encoding the major drug targets as inputs; expert use of these algorithms can significantly improve clinical outcomes in HIV treatment. Next-generation sequencing has the potential to revolutionize HIV resistance genotyping by lowering the threshold at which rare but clinically significant HIV variants can be detected reproducibly, and by conferring improved cost-effectiveness in high-throughput scenarios. In this review, we discuss the relative merits and challenges of deploying the Illumina MiSeq instrument for clinical HIV genotyping.
Affiliation(s)
- Chanson J Brumme
- BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
| | - Art F Y Poon
- Department of Pathology & Laboratory Medicine, Western University, London, Ontario, Canada.
| |
|
27
|
Evolutionary dynamics of dengue virus populations within the mosquito vector. Curr Opin Virol 2016; 21:47-53. [DOI: 10.1016/j.coviro.2016.07.013] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/09/2016] [Revised: 07/23/2016] [Accepted: 07/27/2016] [Indexed: 02/05/2023]
|
28
|
Leung P, Eltahla AA, Lloyd AR, Bull RA, Luciani F. Understanding the complex evolution of rapidly mutating viruses with deep sequencing: Beyond the analysis of viral diversity. Virus Res 2016; 239:43-54. [PMID: 27888126 DOI: 10.1016/j.virusres.2016.10.014] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 07/21/2016] [Revised: 10/24/2016] [Accepted: 10/25/2016] [Indexed: 12/24/2022]
Abstract
With the advent of affordable deep sequencing technologies, detection of low frequency variants within genetically diverse viral populations can now be achieved with unprecedented depth and efficiency. The high-resolution data provided by next generation sequencing technologies are currently recognised as the gold standard in estimation of viral diversity. In the analysis of rapidly mutating viruses, longitudinal deep sequencing datasets from viral genomes during individual infection episodes, as well as at the epidemiological level during outbreaks, now allow for more sophisticated analyses such as statistical estimates of the impact of complex mutation patterns on the evolution of the viral populations both within and between hosts. These analyses are revealing more accurate descriptions of the evolutionary dynamics that underpin the rapid adaptation of these viruses to the host response, and to drug therapies. This review assesses recent developments in methods and provides informative research examples using deep sequencing data generated from rapidly mutating viruses infecting humans, particularly hepatitis C virus (HCV), human immunodeficiency virus (HIV), Ebola virus and influenza virus, to understand the evolution of viral genomes and to explore the relationship between viral mutations and the host adaptive immune response. Finally, we discuss limitations in current technologies, and future directions that take advantage of publicly available large deep sequencing datasets.
Affiliation(s)
- Preston Leung
- School of Medical Sciences, Faculty of Medicine, UNSW Australia, Sydney, NSW 2052, Australia; The Kirby Institute, UNSW Australia, Sydney, NSW 2052, Australia
| | - Auda A Eltahla
- School of Medical Sciences, Faculty of Medicine, UNSW Australia, Sydney, NSW 2052, Australia; The Kirby Institute, UNSW Australia, Sydney, NSW 2052, Australia
| | - Andrew R Lloyd
- The Kirby Institute, UNSW Australia, Sydney, NSW 2052, Australia
| | - Rowena A Bull
- School of Medical Sciences, Faculty of Medicine, UNSW Australia, Sydney, NSW 2052, Australia; The Kirby Institute, UNSW Australia, Sydney, NSW 2052, Australia
| | - Fabio Luciani
- School of Medical Sciences, Faculty of Medicine, UNSW Australia, Sydney, NSW 2052, Australia; The Kirby Institute, UNSW Australia, Sydney, NSW 2052, Australia.
| |
|
29
|
King DJ, Freimanis GL, Orton RJ, Waters RA, Haydon DT, King DP. Investigating intra-host and intra-herd sequence diversity of foot-and-mouth disease virus. Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases 2016; 44:286-292. [PMID: 27421209 PMCID: PMC5036933 DOI: 10.1016/j.meegid.2016.07.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 04/26/2016] [Revised: 07/06/2016] [Accepted: 07/11/2016] [Indexed: 11/23/2022]
Abstract
Due to the poor fidelity of the enzymes involved in RNA genome replication, foot-and-mouth disease (FMD) virus samples comprise unique polymorphic populations. In this study, deep sequencing was utilised to characterise the diversity of FMD virus (FMDV) populations in 6 infected cattle present on a single farm during the series of outbreaks in the UK in 2007. A novel RT-PCR method was developed to amplify a 7.6 kb nucleotide fragment encompassing the polyprotein coding region of the FMDV genome. Illumina sequencing of each sample identified the fine polymorphic structures at each nucleotide position, from consensus level changes to variants present at a 0.24% frequency. These data were used to investigate population dynamics of FMDV at both herd and host levels, evaluate the impact of host on the viral swarm structure and to identify transmission links with viruses recovered from other farms in the same series of outbreaks. In 7 samples, from 6 different animals, a total of 5 consensus level variants were identified, in addition to 104 sub-consensus variants of which 22 were shared between 2 or more animals. Further analysis revealed differences in swarm structures from samples derived from the same animal, suggesting the presence of distinct viral populations evolving independently at different lesion sites within the same infected animal.
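The distinction between consensus-level and sub-consensus variants can be sketched as a frequency filter over per-position base counts. The 0.24% default below echoes the lowest variant frequency reported in the abstract; the minimum depth, the input layout, and the function name are assumptions for illustration, and the study's actual variant-calling procedure is not described here.

```python
def minor_variants(base_counts, min_freq=0.0024, min_depth=1000):
    """Report non-consensus bases at or above a frequency threshold.

    base_counts: one dict per genome position, mapping base -> read count.
    Positions with fewer than min_depth reads are skipped (assumed cutoff).
    Returns a list of (1-based position, base, frequency) tuples.
    """
    calls = []
    for pos, counts in enumerate(base_counts, start=1):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate low frequencies
        consensus = max(counts, key=counts.get)
        for base, n in counts.items():
            freq = n / depth
            if base != consensus and freq >= min_freq:
                calls.append((pos, base, freq))
    return calls

# Hypothetical counts at three positions.
counts = [
    {"A": 9970, "G": 30},   # G at 0.30%: reported
    {"C": 9990, "T": 10},   # T at 0.10%: below threshold
    {"G": 400, "A": 100},   # depth 500: skipped
]
calls = minor_variants(counts)
```

A threshold this low is only meaningful when it sits above the combined error rate of amplification and sequencing, so a real pipeline would pair the filter with an error model or replicate controls.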
Affiliation(s)
- David J King
- The Pirbright Institute, Ash Road, Pirbright, Woking, Surrey GU24 0NF, UK
| | - Graham L Freimanis
- The Pirbright Institute, Ash Road, Pirbright, Woking, Surrey GU24 0NF, UK
| | - Richard J Orton
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK; MRC-University of Glasgow, Centre for Virus Research, University of Glasgow, 464 Bearsden Road, G61 1QH, UK
| | - Ryan A Waters
- The Pirbright Institute, Ash Road, Pirbright, Woking, Surrey GU24 0NF, UK
| | - Daniel T Haydon
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK
| | - Donald P King
- The Pirbright Institute, Ash Road, Pirbright, Woking, Surrey GU24 0NF, UK.
| |
|
30
|
Posada-Cespedes S, Seifert D, Beerenwinkel N. Recent advances in inferring viral diversity from high-throughput sequencing data. Virus Res 2016; 239:17-32. [PMID: 27693290 DOI: 10.1016/j.virusres.2016.09.016] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 06/24/2016] [Revised: 09/23/2016] [Accepted: 09/24/2016] [Indexed: 02/05/2023]
Abstract
Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. In order to account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments.
Affiliation(s)
- Susana Posada-Cespedes
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland
| | - David Seifert
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland.
| |
|
31
|
Lavezzo E, Barzon L, Toppo S, Palù G. Third generation sequencing technologies applied to diagnostic microbiology: benefits and challenges in applications and data analysis. Expert Rev Mol Diagn 2016; 16:1011-23. [PMID: 27453996 DOI: 10.1080/14737159.2016.1217158] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 01/07/2023]
Abstract
INTRODUCTION The diagnosis of infectious diseases is among the most successful areas of application of new generation sequencing technologies. The field has seen the development of numerous experimental and analytical approaches for the detection and the fine description of pathogenic and non-pathogenic microorganisms. AREAS COVERED Without claiming to be exhaustive with respect to all applications and methods developed over the years, this review focuses on the advantages and the issues brought by the new technologies, with an eye in particular to third generation sequencing methods. Both experimental procedures and algorithmic strategies are presented, following the most relevant publications which have led to progress in our ability to detect infectious agents. EXPERT COMMENTARY The technical advance brought by third generation sequencing platforms has the potential to significantly expand the range of diagnostic tools that will be available to clinicians. Nonetheless, the implementation of these technologies in clinical practice is still far from being actionable and will temporally follow the path undertaken by second generation methods, which still require the setup of standardized pipelines in both wet and dry laboratory procedures.
Affiliation(s)
- Enrico Lavezzo
- Department of Molecular Medicine, University of Padova, Padova, Italy
| | - Luisa Barzon
- Department of Molecular Medicine, University of Padova, Padova, Italy
| | - Stefano Toppo
- Department of Molecular Medicine, University of Padova, Padova, Italy
| | - Giorgio Palù
- Department of Molecular Medicine, University of Padova, Padova, Italy
| |
|
32
|
Differences in the Selection Bottleneck between Modes of Sexual Transmission Influence the Genetic Composition of the HIV-1 Founder Virus. PLoS Pathog 2016; 12:e1005619. [PMID: 27163788 PMCID: PMC4862634 DOI: 10.1371/journal.ppat.1005619] [Citation(s) in RCA: 75] [Impact Index Per Article: 8.3] [Received: 10/27/2015] [Accepted: 04/18/2016] [Indexed: 01/18/2023] Open
Abstract
Due to the stringent population bottleneck that occurs during sexual HIV-1 transmission, systemic infection is typically established by a limited number of founder viruses. Elucidation of the precise forces influencing the selection of founder viruses may reveal key vulnerabilities that could aid in the development of a vaccine or other clinical interventions. Here, we utilize deep sequencing data and apply a genetic distance-based method to investigate whether the mode of sexual transmission shapes the nascent founder viral genome. Analysis of 74 acute and early HIV-1 infected subjects revealed that 83% of men who have sex with men (MSM) exhibit a single founder virus, levels similar to those previously observed in heterosexual (HSX) transmission. In a metadata analysis of a total of 354 subjects, including HSX, MSM and injecting drug users (IDU), we also observed no significant differences in the frequency of single founder virus infections between HSX and MSM transmissions. However, comparison of HIV-1 envelope sequences revealed that HSX founder viruses exhibited a greater number of codon sites under positive selection, as well as stronger transmission indices possibly reflective of higher fitness variants. Moreover, specific genetic “signatures” within MSM and HSX founder viruses were identified, with single polymorphisms within gp41 enriched among HSX viruses while more complex patterns, including clustered polymorphisms surrounding the CD4 binding site, were enriched in MSM viruses. While our findings do not support an influence of the mode of sexual transmission on the number of founder viruses, they do demonstrate that there are marked differences in the selection bottleneck that can significantly shape their genetic composition. This study illustrates the complex dynamics of the transmission bottleneck and reveals that distinct genetic bottleneck processes exist dependent upon the mode of HIV-1 transmission. 
While the global spread of HIV-1 has been fueled by sexual transmission, the genetic determinants underlying the transmission bottleneck remain poorly understood. Here we characterized founder virus population diversity from next generation sequencing data in a cohort of 74 acute and early HIV-1 infected individuals. We observe that the risk of multi-variant infection in men-who-have-sex-with-men (MSM) is not greater than that observed for heterosexuals (HSX), contrary to reports of higher rates of multiple founder virus infections in higher-risk MSM transmissions. These findings were further supported through a metadata analysis of 354 acute and early HIV-1 subjects. We did, however, observe differences between HSX and MSM founder viruses, including a higher selection barrier in HSX transmission, with founder viruses being more cohort consensus-like, which may reflect increased replicative fitness. We also identified a number of residues within Envelope that behave in a risk-dependent manner and could be key for HIV-1 transmission. These novel insights improve our understanding of the HIV-1 transmission bottleneck and underscore the differential selective pressures that founder viruses within the two major transmission risk groups are subjected to.
|
33
|
Liang M, Raley C, Zheng X, Kutty G, Gogineni E, Sherman BT, Sun Q, Chen X, Skelly T, Jones K, Stephens R, Zhou B, Lau W, Johnson C, Imamichi T, Jiang M, Dewar R, Lempicki RA, Tran B, Kovacs JA, Huang DW. Distinguishing highly similar gene isoforms with a clustering-based bioinformatics analysis of PacBio single-molecule long reads. BioData Min 2016; 9:13. [PMID: 27051465 PMCID: PMC4820869 DOI: 10.1186/s13040-016-0090-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 04/18/2015] [Accepted: 03/22/2016] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Gene isoforms are commonly found in both prokaryotes and eukaryotes. Since each isoform may perform a specific function in response to changing environmental conditions, studying the dynamics of gene isoforms is important in understanding biological processes and disease conditions. However, genome-wide identification of gene isoforms is technically challenging due to the high degree of sequence identity among isoforms. Traditional targeted sequencing approach, involving Sanger sequencing of plasmid-cloned PCR products, has low throughput and is very tedious and time-consuming. Next-generation sequencing technologies such as Illumina and 454 achieve high throughput but their short read lengths are a critical barrier to accurate assembly of highly similar gene isoforms, and may result in ambiguities and false joining during sequence assembly. More recently, the third generation sequencer represented by the PacBio platform offers sufficient throughput and long reads covering the full length of typical genes, thus providing a potential to reliably profile gene isoforms. However, the PacBio long reads are error-prone and cannot be effectively analyzed by traditional assembly programs. RESULTS We present a clustering-based analysis pipeline integrated with PacBio sequencing data for profiling highly similar gene isoforms. This approach was first evaluated in comparison to de novo assembly of 454 reads using a benchmark admixture containing 10 known, cloned msg genes encoding the major surface glycoprotein of Pneumocystis jirovecii. All 10 msg isoforms were successfully reconstructed with the expected length (~1.5 kb) and correct sequence by the new approach, while 454 reads could not be correctly assembled using various assembly programs. When using an additional benchmark admixture containing 22 known P. 
jirovecii msg isoforms, this approach accurately reconstructed all but 4 of these isoforms at full length (~3 kb); these 4 isoforms were present in low concentrations in the admixture. Finally, when applied to the original clinical sample from which the 22 known msg isoforms were cloned, this approach successfully identified not only all known isoforms accurately (~3 kb each) but also 48 novel isoforms. CONCLUSIONS PacBio sequencing integrated with the clustering-based analysis pipeline achieves high-throughput and high-resolution discrimination of highly similar sequences, and can serve as a new approach for genome-wide characterization of gene isoforms and other highly repetitive sequences.
Affiliation(s)
- Ma Liang
- Critical Care Medicine Department, Clinical Center, Frederick, MD USA
| | - Castle Raley
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Xin Zheng
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Geetha Kutty
- Critical Care Medicine Department, Clinical Center, Frederick, MD USA
| | - Emile Gogineni
- Critical Care Medicine Department, Clinical Center, Frederick, MD USA
| | - Brad T. Sherman
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Qiang Sun
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Xiongfong Chen
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Thomas Skelly
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Kristine Jones
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Robert Stephens
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Bin Zhou
- Center of Information Technology, National Institutes of Health (NIH), Bethesda, MD USA
| | - William Lau
- Center of Information Technology, National Institutes of Health (NIH), Bethesda, MD USA
| | - Calvin Johnson
- Center of Information Technology, National Institutes of Health (NIH), Bethesda, MD USA
| | - Tomozumi Imamichi
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Minkang Jiang
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Robin Dewar
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Richard A. Lempicki
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Bao Tran
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Joseph A. Kovacs
- Critical Care Medicine Department, Clinical Center, Frederick, MD USA
| | - Da Wei Huang
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
- Current Affiliation: National Cancer Institute, NIH, Bethesda, MD USA
| |
|
34
|
Huang DW, Raley C, Jiang MK, Zheng X, Liang D, Rehman MT, Highbarger HC, Jiao X, Sherman B, Ma L, Chen X, Skelly T, Troyer J, Stephens R, Imamichi T, Pau A, Lempicki RA, Tran B, Nissley D, Lane HC, Dewar RL. Towards Better Precision Medicine: PacBio Single-Molecule Long Reads Resolve the Interpretation of HIV Drug Resistant Mutation Profiles at Explicit Quasispecies (Haplotype) Level. Journal of Data Mining in Genomics & Proteomics 2016; 7:182. [PMID: 26949565 PMCID: PMC4775093 DOI: 10.4172/2153-0602.1000182] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 12/11/2022]
Abstract
Development of HIV-1 drug resistance mutations (HDRMs) is one of the major reasons for the clinical failure of antiretroviral therapy. Treatment success rates can be improved by applying personalized anti-HIV regimens based on a patient's HDRM profile. However, the sensitivity and specificity of the HDRM profile are limited by the methods used for detection. Sanger-based sequencing technology has traditionally been used for determining HDRM profiles at the single nucleotide variant (SNV) level, but with a sensitivity of only ≥ 20% in the HIV population of a patient. Next Generation Sequencing (NGS) technologies offer greater detection sensitivity (~ 1%) and larger scope (hundreds of samples per run). However, NGS technologies produce reads that are too short to enable the detection of the physical linkages of individual SNVs across the haplotype of each HIV strain present. In this article, we demonstrate that the single-molecule long reads generated using the Third Generation Sequencer (TGS), PacBio RS II, along with the appropriate bioinformatics analysis method, can resolve the HDRM profile at a more advanced quasispecies level. The case studies on patients' HIV samples showed that the quasispecies view produced using the PacBio method offered greater detection sensitivity and a more comprehensive understanding of HDRM situations, complementing both Sanger and NGS technologies. In conclusion, the PacBio method, providing a promising new quasispecies level of HDRM profiling, may effect an important change in the field of HIV drug resistance research.
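The linkage argument above can be made concrete with a small sketch. It assumes full-length reads already aligned to a shared coordinate system with no indels, and it uses hypothetical sequences and positions; it is not the bioinformatics method used in the article. Each read contributes the combination of bases it carries at the chosen SNV positions, so the result is a haplotype frequency table, not a set of independent per-site frequencies.

```python
from collections import Counter

def haplotype_frequencies(reads, positions):
    """Tabulate base combinations observed on single reads.

    reads: aligned full-length reads as equal-length strings (assumed input).
    positions: 0-based indices of the SNV sites of interest.
    Returns a dict mapping each observed haplotype string to its frequency.
    """
    haplotypes = Counter(
        "".join(read[p] for p in positions) for read in reads
    )
    total = sum(haplotypes.values())
    return {hap: n / total for hap, n in haplotypes.items()}

# Four hypothetical long reads; the SNV sites are at indices 1 and 5.
reads = [
    "AAGTCCGT",
    "AAGTCCGT",
    "ATGTCTGT",
    "AAGTCTGT",
]
freqs = haplotype_frequencies(reads, positions=[1, 5])
```

Per-site counting on the same reads would report T in 25% of reads at the first site and in 50% at the second, without revealing that only one read in four carries both. That joint frequency is what short reads covering a single site cannot provide.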
Affiliation(s)
- Da Wei Huang
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
- National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Castle Raley
- Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Min Kang Jiang
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Xin Zheng
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Dun Liang
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - M Tauseef Rehman
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Helene C. Highbarger
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Xiaoli Jiao
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Brad Sherman
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Liang Ma
- Critical Care Medicine Department, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Xiaofeng Chen
- Advanced Biomedical Computing Center, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Thomas Skelly
- Advanced Biomedical Computing Center, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Jennifer Troyer
- Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
- National Human Genome Research Institute, National Institutes of Health, Rockville, MD, 20852, USA
| | - Robert Stephens
- Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Tomozumi Imamichi
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Alice Pau
- Division of Clinical Research, National Institute of Allergy & Infectious Diseases, USA
| | - Richard A Lempicki
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Bao Tran
- Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Dwight Nissley
- Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - H Clifford Lane
- Division of Clinical Research, National Institute of Allergy & Infectious Diseases, USA
| | - Robin L. Dewar
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| |
|
35
|
Ode H, Matsuda M, Matsuoka K, Hachiya A, Hattori J, Kito Y, Yokomaku Y, Iwatani Y, Sugiura W. Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq. Front Microbiol 2015; 6:1258. [PMID: 26617593 PMCID: PMC4641896 DOI: 10.3389/fmicb.2015.01258] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 09/12/2015] [Accepted: 10/29/2015] [Indexed: 12/29/2022] Open
Abstract
Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome.
Affiliation(s)
- Hirotaka Ode
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center Nagoya, Japan
| | - Masakazu Matsuda
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center Nagoya, Japan
| | - Kazuhiro Matsuoka
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center Nagoya, Japan
| | - Atsuko Hachiya
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center Nagoya, Japan
| | - Junko Hattori
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center Nagoya, Japan
| | - Yumiko Kito
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center Nagoya, Japan
| | - Yoshiyuki Yokomaku
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center Nagoya, Japan
| | - Yasumasa Iwatani
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center Nagoya, Japan ; Department of AIDS Research, Graduate School of Medicine, Nagoya University Nagoya, Japan
| | - Wataru Sugiura
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center Nagoya, Japan ; Department of AIDS Research, Graduate School of Medicine, Nagoya University Nagoya, Japan
| |
|
36
|
Wu SH, Rodrigo AG. Estimation of evolutionary parameters using short, random and partial sequences from mixed samples of anonymous individuals. BMC Bioinformatics 2015; 16:357. [PMID: 26536860 PMCID: PMC4634753 DOI: 10.1186/s12859-015-0810-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2015] [Accepted: 10/30/2015] [Indexed: 11/17/2022] Open
Abstract
Background: Over the last decade, next generation sequencing (NGS) has become widely available, and is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabelled or untagged individuals, especially when the reconstruction of full-length haplotypes can be unreliable. We propose two novel approaches, least squares estimation (LS) and Approximate Bayesian Computation Markov chain Monte Carlo estimation (ABC-MCMC), to infer evolutionary genetic parameters from a collection of short-read sequences obtained from a mixed sample of anonymous DNA, using only the frequencies of nucleotides at each site and without reconstructing either the full-length alignment or the phylogeny. Results: We used simulations to evaluate the performance of these algorithms, and our results demonstrate that LS performs poorly because bootstrap 95% Confidence Intervals (CIs) tend to under- or over-estimate the true values of the parameters. In contrast, the 95% Highest Posterior Density (HPD) intervals recovered from ABC-MCMC enclosed the true parameter values with a rate approximately equivalent to that obtained using BEAST, a program that implements a Bayesian MCMC estimation of evolutionary parameters using full-length sequences. Because there is a loss of information with the use of sitewise nucleotide frequencies alone, the ABC-MCMC 95% HPDs are larger than those obtained by BEAST. Conclusion: We propose two novel algorithms to estimate evolutionary genetic parameters based on the proportion of each nucleotide. The LS method cannot be recommended as a standalone method for evolutionary parameter estimation. On the other hand, parameters recovered by ABC-MCMC are comparable to those obtained using BEAST, but with larger 95% HPDs. One major advantage of ABC-MCMC is that computational time scales linearly with the number of short-read sequences, and is independent of the number of full-length sequences in the original data. This allows us to perform the analysis on NGS datasets with large numbers of short read fragments. The source code for ABC-MCMC is available at https://github.com/stevenhwu/SF-ABC.
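The abstract above works from per-site nucleotide frequencies rather than full-length alignments. As a rough illustration of that kind of input only, here is a minimal Python sketch. It assumes reads have already been mapped and are supplied as hypothetical `(start, sequence)` pairs; the function name `sitewise_frequencies` is invented for this example and is not taken from the SF-ABC code.

```python
from collections import Counter

def sitewise_frequencies(reads, genome_length):
    """Return, for each site, the fraction of covering reads showing A, C, G, T.

    `reads` is an iterable of (start, sequence) pairs with 0-based start
    positions. This input format is an assumption made for illustration.
    """
    counts = [Counter() for _ in range(genome_length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            # Ignore positions outside the reference and ambiguous bases.
            if 0 <= pos < genome_length and base in "ACGT":
                counts[pos][base] += 1
    freqs = []
    for site_counts in counts:
        depth = sum(site_counts.values())
        freqs.append({b: (site_counts[b] / depth if depth else 0.0) for b in "ACGT"})
    return freqs

# Three short reads covering a 4-nt region.
example = sitewise_frequencies([(0, "ACG"), (1, "CGT"), (1, "TGT")], 4)
```

Sites with no coverage are reported as all zeros here; a real analysis would more likely mask them.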
Affiliation(s)
- Steven H Wu
- Biodesign Institute, Arizona State University, Tempe, AZ, 85287, USA; Department of Biology, Duke University, Box 90338, Durham, NC, 27708, USA.
| | - Allen G Rodrigo
- Department of Biology, Duke University, Box 90338, Durham, NC, 27708, USA; The National Evolutionary Synthesis Center, Durham, NC, 27705, USA.
| |
|
37
|
Montoya V, Olmstead AD, Janjua NZ, Tang P, Grebely J, Cook D, Richard Harrigan P, Krajden M. Differentiation of acute from chronic hepatitis C virus infection by nonstructural 5B deep sequencing: a population-level tool for incidence estimation. Hepatology 2015; 61:1842-50. [PMID: 25645961 DOI: 10.1002/hep.27734] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 10/28/2014] [Accepted: 01/28/2015] [Indexed: 01/19/2023]
Abstract
The ability to classify acute versus chronic hepatitis C virus (HCV) infections at the time of diagnosis is desirable to improve the quality of surveillance information. The aim of this study was to differentiate acute from chronic HCV infections utilizing deep sequencing. HCV nonstructural 5B (NS5B) amplicons (n = 94) were generated from 77 individuals (13 acute and 64 chronic HCV infections) in British Columbia, Canada, with documented seroconversion time frames. Amplicons were deep sequenced and HCV genomic diversity was measured by Shannon entropy (SE) and a single nucleotide variant (SNV) analysis. The relationship between each diversity measure and the estimated days since infection was assessed using linear mixed models, and the ability of each diversity measure to differentiate acute from chronic infections was assessed using generalized estimating equations. Both SE and the SNV diversity measures were significantly different for acute versus chronic infections (P < 0.009). NS5B nucleotide diversity continued to increase for at least 3 years postinfection. Among individuals with the least uncertainty with regard to duration of infection (n = 39), the area under the receiver operating characteristic curve (AUROC) was high (0.96 for SE; 0.98 for SNV). Although the AUROCs were lower (0.86 for SE; 0.80 for SNV) when data for all individuals were included, they remain sufficiently high for epidemiological purposes. Synonymous mutations were the primary discriminatory variable accounting for over 78% of the measured genetic diversity. Conclusions: NS5B sequence diversity assessed by deep sequencing can differentiate acute from chronic HCV infections and, with further validation, could become a powerful population-level surveillance tool for incidence estimation.
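Shannon entropy, one of the two diversity measures named above, can be sketched in a few lines of Python. This is the textbook definition, H = -Σ p_i ln p_i over the observed variant frequencies; the abstract does not say which exact variant (normalization, log base, per-site versus per-haplotype) the authors used, so that choice is an assumption here, as is the function name `shannon_entropy`.

```python
import math
from collections import Counter

def shannon_entropy(sequences):
    """Shannon entropy (natural log) of the distinct sequences in a sample.

    `sequences` is any iterable of hashable items, for example amplicon
    read strings. Identical items are pooled into one variant.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# A homogeneous sample has zero entropy; an even two-variant mix has ln(2).
homogeneous = shannon_entropy(["ACGT"] * 10)
mixed = shannon_entropy(["ACGT"] * 5 + ["ACGA"] * 5)
```

Under this definition a recently founded, nearly clonal population scores close to zero and a diversified one scores higher, which matches the direction of the acute-versus-chronic contrast the abstract reports.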
Affiliation(s)
- Vincent Montoya
- BC Center for Disease Control, Vancouver, British Columbia, Canada; University of British Columbia, Vancouver, British Columbia, Canada
| | - Andrea D Olmstead
- BC Center for Disease Control, Vancouver, British Columbia, Canada; University of British Columbia, Vancouver, British Columbia, Canada
| | - Naveed Z Janjua
- BC Center for Disease Control, Vancouver, British Columbia, Canada; University of British Columbia, Vancouver, British Columbia, Canada
| | - Patrick Tang
- BC Center for Disease Control, Vancouver, British Columbia, Canada; University of British Columbia, Vancouver, British Columbia, Canada
| | - Jason Grebely
- The Kirby Institute, University of New South Wales, Sydney, New South Wales, Australia
| | - Darrel Cook
- BC Center for Disease Control, Vancouver, British Columbia, Canada
| | - P Richard Harrigan
- BC Center for Excellence in HIV/AIDS, St Paul's Hospital, Vancouver, British Columbia, Canada
| | - Mel Krajden
- BC Center for Disease Control, Vancouver, British Columbia, Canada; University of British Columbia, Vancouver, British Columbia, Canada
| |
|
38
|
Černi S, Prpić J, Jemeršić L, Škorić D. The application of single strand conformation polymorphism (SSCP) analysis in determining Hepatitis E virus intra-host diversity. J Virol Methods 2015; 221:46-50. [PMID: 25920567 DOI: 10.1016/j.jviromet.2015.04.020] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/05/2014] [Revised: 04/14/2015] [Accepted: 04/16/2015] [Indexed: 01/23/2023]
Abstract
Genetic heterogeneity of RNA populations influences virus pathogenesis, epidemiology and evolution. Therefore, accurate information regarding virus genetic structure is highly important for both diagnostic and scientific purposes. For the Hepatitis E virus (HEV), the causal agent of hepatitis in humans, the intra-host population structure has been poorly investigated, mainly using the less sensitive RFLP-based approach. The objective of this study was to assess the suitability and the accuracy of single strand conformation polymorphism (SSCP) analysis, a well-established tool in genetic variation research, for the characterization of HEV quasispecies. The analysis was conducted on 50 clones of five swine isolates and 30 clones of three human HEV isolates. To identify and quantify the sequence variants present in each HEV isolate, 348-bp-long fragments of the amplified conserved ORF2 region were separated by cloning. Ten clones per isolate were subjected to SSCP and sequenced in a parallel experiment. The results show a high correlation of SSCP haplotype profiling with the sequencing results, confirming the sensitivity and reliability of this simple, rapid and low-cost approach in the characterization of HEV quasispecies.
Affiliation(s)
- S Černi
- University of Zagreb, Faculty of Science, Department of Biology, Marulićev trg 9A, Zagreb, Croatia
| | - J Prpić
- Croatian Veterinary Institute, Department of Virology, Savska cesta 143, Zagreb, Croatia
| | - L Jemeršić
- Croatian Veterinary Institute, Department of Virology, Savska cesta 143, Zagreb, Croatia
| | - D Škorić
- University of Zagreb, Faculty of Science, Department of Biology, Marulićev trg 9A, Zagreb, Croatia.
| |
|
39
|
Ogishi M, Yotsuyanagi H, Tsutsumi T, Gatanaga H, Ode H, Sugiura W, Moriya K, Oka S, Kimura S, Koike K. Deconvoluting the composition of low-frequency hepatitis C viral quasispecies: comparison of genotypes and NS3 resistance-associated variants between HCV/HIV coinfected hemophiliacs and HCV monoinfected patients in Japan. PLoS One 2015; 10:e0119145. [PMID: 25748426 PMCID: PMC4351984 DOI: 10.1371/journal.pone.0119145] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/12/2014] [Accepted: 01/09/2015] [Indexed: 12/16/2022] Open
Abstract
Pre-existing low-frequency resistance-associated variants (RAVs) may jeopardize successful sustained virological responses (SVR) to HCV treatment with direct-acting antivirals (DAAs). However, the potential impact of low-frequency (∼0.1%) mutations, concatenated mutations (haplotypes), and their association with genotypes (Gts) on the treatment outcome has not yet been elucidated, most probably owing to the difficulty in detecting pre-existing minor haplotypes with sufficient length and accuracy. Herein, we characterize a methodological framework based on Illumina MiSeq next-generation sequencing (NGS) coupled with bioinformatics of quasispecies reconstruction (QSR) to realize highly accurate variant calling and genotype-haplotype detection. The core-to-NS3 protease coding sequences in 10 HCV monoinfected patients, 5 of whom had a history of blood transfusion, and 11 HCV/HIV coinfected patients with hemophilia, were studied. Simulation experiments showed that, for minor variants constituting more than 1%, our framework achieved a positive predictive value (PPV) of 100% and sensitivities of 91.7–100% for genotyping and 80.6% for RAV screening. Genotyping analysis indicated the prevalence of dominant Gt1a infection in coinfected patients (6/11 vs 0/10, p = 0.01). For clinical samples, minor genotype overlapping infection was prevalent in HCV/HIV coinfected hemophiliacs (10/11) and patients who experienced whole-blood transfusion (4/5) but none in patients without exposure to blood (0/5). As for RAV screening, the Q80K/R and S122K/R variants were particularly prevalent among minor RAVs observed, detected in 12/21 and 6/21 cases, respectively. Q80K was detected only in coinfected patients, whereas Q80R was predominantly detected in monoinfected patients (1/11 vs 7/10, p < 0.01). Multivariate interdependence analysis revealed the previously unrecognized prevalence of Gt1b-Q80K in HCV/HIV coinfected hemophiliacs [Odds ratio = 13.4 (3.48–51.9), p < 0.01]. Our study revealed the distinct characteristics of viral quasispecies between the subgroups specified above and the feasibility of NGS and QSR-based genetic deconvolution of pre-existing minor Gts, RAVs, and their interrelationships.
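The positive predictive value and sensitivity figures quoted above follow the standard definitions, PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). A minimal Python sketch of how they could be computed from a simulation with known ground truth is given below. The function name and the variant labels in the usage example are hypothetical and are not data from the study.

```python
def ppv_and_sensitivity(called, truth):
    """Compare a set of called variants against a known truth set.

    Returns (PPV, sensitivity). Either value is NaN when its denominator
    is zero, i.e. nothing was called or nothing was truly present.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and truly present
    fp = len(called - truth)   # called but absent
    fn = len(truth - called)   # present but missed
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity

# Hypothetical simulated sample: three variants spiked in, two recovered,
# no false calls.
ppv, sens = ppv_and_sensitivity({"Q80K", "S122R"}, {"Q80K", "S122R", "V36M"})
```

A caller can therefore reach a PPV of 100% while still missing variants, which is why the abstract reports the two numbers separately.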
Affiliation(s)
- Masato Ogishi
- Department of Internal Medicine, Graduate School of Medicine, University of Tokyo, Bunkyo, Tokyo, Japan
| | - Hiroshi Yotsuyanagi
- Department of Internal Medicine, Graduate School of Medicine, University of Tokyo, Bunkyo, Tokyo, Japan
| | - Takeya Tsutsumi
- Department of Internal Medicine, Graduate School of Medicine, University of Tokyo, Bunkyo, Tokyo, Japan
| | - Hiroyuki Gatanaga
- AIDS Clinical Center, National Center for Global Health and Medicine, Shinjuku, Tokyo, Japan
| | - Hirotaka Ode
- Department of Infectious Diseases and Immunology, Clinical Research Center, Nagoya Medical Center, Nagoya, Japan
| | - Wataru Sugiura
- Department of Infectious Diseases and Immunology, Clinical Research Center, Nagoya Medical Center, Nagoya, Japan
| | - Kyoji Moriya
- Department of Internal Medicine, Graduate School of Medicine, University of Tokyo, Bunkyo, Tokyo, Japan
| | - Shinichi Oka
- AIDS Clinical Center, National Center for Global Health and Medicine, Shinjuku, Tokyo, Japan
| | - Satoshi Kimura
- Director, Tokyo Teishin Hospital, Tokyo, Japan; President, Tokyo Health Care University, Tokyo, Japan
| | - Kazuhiko Koike
- Department of Internal Medicine, Graduate School of Medicine, University of Tokyo, Bunkyo, Tokyo, Japan
| |
|
40
|
Verbist B, Clement L, Reumers J, Thys K, Vapirev A, Talloen W, Wetzels Y, Meys J, Aerssens J, Bijnens L, Thas O. ViVaMBC: estimating viral sequence variation in complex populations from illumina deep-sequencing data using model-based clustering. BMC Bioinformatics 2015; 16:59. [PMID: 25887734 PMCID: PMC4369097 DOI: 10.1186/s12859-015-0458-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/01/2014] [Accepted: 12/16/2014] [Indexed: 11/10/2022] Open
Abstract
Background: Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology-associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Results: Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. Conclusions: ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. Apparently a lot of information is already contained in the quality scores, enabling the model-based clustering procedure to adjust the majority of the sequencing errors. Overall the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low frequency variant detection.
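The abstract above says the approach models error probabilities from quality scores. The standard Phred relationship behind such scores is P(error) = 10^(-Q/10). The Python sketch below shows that conversion for a FASTQ quality string, assuming the common Phred+33 ASCII encoding. It illustrates the general relationship only and is not ViVaMBC's clustering model; the function names are invented for this example.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q into a base-call error probability."""
    return 10 ** (-q / 10)

def error_probs_from_fastq_quality(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into per-base
    error probabilities."""
    return [phred_to_error_prob(ord(ch) - offset) for ch in quality_string]

# 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
probs = error_probs_from_fastq_quality("I5")
```

At Q20 the expected error rate per base is about 1%, which is on the same scale as the sub-percent variant frequencies discussed in the abstract; this is why frequency estimates at that level depend on modelling errors explicitly rather than on raw counts.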
Affiliation(s)
- Bie Verbist
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, Gent, 9000, Belgium.
| | - Lieven Clement
- Department of Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, Gent, 9000, Belgium.
| | - Joke Reumers
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Kim Thys
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Alexander Vapirev
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium; ExaScience Life Lab, Kapeldreef 75, Leuven, 3001, Belgium.
| | - Willem Talloen
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Yves Wetzels
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Joris Meys
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, Gent, 9000, Belgium.
| | - Jeroen Aerssens
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Luc Bijnens
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Olivier Thas
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, Gent, 9000, Belgium; University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW, 2522, Australia.
| |
|
41
|
Schirmer M, Ijaz UZ, D'Amore R, Hall N, Sloan WT, Quince C. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res 2015; 43:e37. [PMID: 25586220 PMCID: PMC4381044 DOI: 10.1093/nar/gku1341] [Citation(s) in RCA: 472] [Impact Index Per Article: 47.2] [Received: 05/08/2014] [Accepted: 12/12/2014] [Indexed: 12/13/2022] Open
Abstract
With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs, Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore, we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%.
Affiliation(s)
| | - Umer Z Ijaz
- School of Engineering, University of Glasgow, Glasgow, UK
| | - Rosalinda D'Amore
- Functional and Comparative Genomics, University of Liverpool, Liverpool, UK
| | - Neil Hall
- Functional and Comparative Genomics, University of Liverpool, Liverpool, UK
| |
|
42
|
Jayasundara D, Saeed I, Maheswararajah S, Chang B, Tang SL, Halgamuge SK. ViQuaS: an improved reconstruction pipeline for viral quasispecies spectra generated by next-generation sequencing. Bioinformatics 2014; 31:886-96. [DOI: 10.1093/bioinformatics/btu754] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022] Open
|
43
|
Lu ZH, Archibald AL, Ait-Ali T. Beyond the whole genome consensus: unravelling of PRRSV phylogenomics using next generation sequencing technologies. Virus Res 2014; 194:167-74. [PMID: 25312450 PMCID: PMC4275598 DOI: 10.1016/j.virusres.2014.10.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/25/2014] [Revised: 10/01/2014] [Accepted: 10/01/2014] [Indexed: 02/05/2023]
Abstract
NGS allows the whole genome sequencing of PRRSV without any prior knowledge. Low frequency variants within the co-evolving quasispecies can be detected. Both macro- and micro-evolutionary events can be followed using NGS.
The highly heterogeneous porcine reproductive and respiratory syndrome virus (PRRSV) is the causative agent responsible for an economically important pig disease with the characteristic symptoms of reproductive losses in breeding sows and respiratory illnesses in young piglets. The virus can be broadly divided into the European and North American-like genotypes 1 and 2, respectively. In addition to this intra-strains variability, the impact of coexisting viral quasispecies on disease development has recently gained much attention, owing largely to the advent of next-generation sequencing (NGS) technologies. Genomic data produced from the massive sequencing capacities of NGS have enabled the study of PRRSV at an unprecedented rate and level of detail. Unlike conventional sequencing methods which require knowledge of conserved regions, NGS allows de novo assembly of the full viral genomes. Evolutionary variations gained from different genotypic strains provide valuable insights into functionally important regions of the virus. Together with the advancement of sophisticated bioinformatics tools, ultra-deep NGS technologies make the detection of low frequency co-evolving quasispecies possible. This short review gives an overview, including a proposed workflow, on the use of NGS to explore the genetic diversity of PRRSV at both macro- and micro-evolutionary levels.
Affiliation(s)
- Zen H Lu
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG Midlothian, United Kingdom.
| | - Alan L Archibald
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG Midlothian, United Kingdom
| | - Tahar Ait-Ali
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG Midlothian, United Kingdom.
| |
|
44
|
Pandit A, de Boer RJ. Reliable reconstruction of HIV-1 whole genome haplotypes reveals clonal interference and genetic hitchhiking among immune escape variants. Retrovirology 2014; 11:56. [PMID: 24996694 PMCID: PMC4227095 DOI: 10.1186/1742-4690-11-56] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 02/12/2014] [Accepted: 06/24/2014] [Indexed: 12/02/2022] Open
Abstract
Background: Following transmission, HIV-1 evolves into a diverse population, and next generation sequencing enables us to detect variants occurring at low frequencies. Studying viral evolution at the level of whole genomes was hitherto not possible because next generation sequencing delivers relatively short reads. Results: We here provide a proof of principle that whole HIV-1 genomes can be reliably reconstructed from short reads, and use this to study the selection of immune escape mutations at the level of whole genome haplotypes. Using realistically simulated HIV-1 populations, we demonstrate that reconstruction of complete genome haplotypes is feasible with high fidelity. We do not reconstruct all genetically distinct genomes, but each reconstructed haplotype represents one or more of the quasispecies in the HIV-1 population. We then reconstruct 30 whole genome haplotypes from published short sequence reads sampled longitudinally from a single HIV-1 infected patient. We confirm the reliability of the reconstruction by validating our predicted haplotype genes with single genome amplification sequences, and by comparing haplotype frequencies with observed epitope escape frequencies. Conclusions: Phylogenetic analysis shows that the HIV-1 population undergoes selection-driven evolution, with successive replacement of the viral population by novel dominant strains. We demonstrate that immune escape mutants evolve in a dependent manner with various mutations hitchhiking along with others. As a consequence of this clonal interference, selection coefficients have to be estimated for complete haplotypes and not for individual immune escapes.
Affiliation(s)
- Aridaman Pandit
- Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Rob J de Boer
- Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
45
Di Giallonardo F, Töpfer A, Rey M, Prabhakaran S, Duport Y, Leemann C, Schmutz S, Campbell NK, Joos B, Lecca MR, Patrignani A, Däumer M, Beisel C, Rusert P, Trkola A, Günthard HF, Roth V, Beerenwinkel N, Metzner KJ. Full-length haplotype reconstruction to infer the structure of heterogeneous virus populations. Nucleic Acids Res 2014; 42:e115. [PMID: 24972832 PMCID: PMC4132706 DOI: 10.1093/nar/gku537] [Citation(s) in RCA: 111] [Impact Index Per Article: 10.1] [Indexed: 11/19/2022] Open
Abstract
Next-generation sequencing (NGS) technologies enable new insights into the diversity of virus populations within their hosts. Diversity estimation is currently restricted to single-nucleotide variants or to local fragments of no more than a few hundred nucleotides defined by the length of sequence reads. To study complex heterogeneous virus populations comprehensively, novel methods are required that allow for complete reconstruction of the individual viral haplotypes. Here, we show that assembly of whole viral genomes of ∼8600 nucleotides length is feasible from mixtures of heterogeneous HIV-1 strains derived from defined combinations of cloned virus strains and from clinical samples of an HIV-1 superinfected individual. Haplotype reconstruction was achieved using optimized experimental protocols and computational methods for amplification, sequencing and assembly. We comparatively assessed the performance of the three NGS platforms 454 Life Sciences/Roche, Illumina and Pacific Biosciences for this task. Our results prove and delineate the feasibility of NGS-based full-length viral haplotype reconstruction and provide new tools for studying evolution and pathogenesis of viruses.
Affiliation(s)
- Francesca Di Giallonardo
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Life Science Zurich Graduate School, University of Zurich, 8057 Zurich, Switzerland
- Armin Töpfer
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- Melanie Rey
- Department of Mathematics and Computer Science, University of Basel, 4056 Basel, Switzerland
- Sandhya Prabhakaran
- Department of Mathematics and Computer Science, University of Basel, 4056 Basel, Switzerland
- Yannick Duport
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Christine Leemann
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Stefan Schmutz
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Nottania K Campbell
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Life Science Zurich Graduate School, University of Zurich, 8057 Zurich, Switzerland
- Beda Joos
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Maria Rita Lecca
- Functional Genomics Center Zurich, University of Zurich, ETH Zurich, 8057 Zurich, Switzerland
- Andrea Patrignani
- Functional Genomics Center Zurich, University of Zurich, ETH Zurich, 8057 Zurich, Switzerland
- Martin Däumer
- Institut für Immunologie und Genetik, 67655 Kaiserslautern, Germany
- Christian Beisel
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Peter Rusert
- Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Alexandra Trkola
- Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Huldrych F Günthard
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Volker Roth
- Department of Mathematics and Computer Science, University of Basel, 4056 Basel, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- Karin J Metzner
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
46
A bioinformatics pipeline for the analyses of viral escape dynamics and host immune responses during an infection. Biomed Res Int 2014; 2014:264519. [PMID: 25013771 PMCID: PMC4072169 DOI: 10.1155/2014/264519] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/31/2014] [Accepted: 05/08/2014] [Indexed: 01/21/2023]
Abstract
Rapidly mutating viruses, such as hepatitis C virus (HCV) and HIV, have adopted evolutionary strategies that allow escape from the host immune response via genomic mutations. Recent advances in high-throughput sequencing are reshaping the field of immuno-virology of viral infections, as these allow fast and cheap generation of genomic data. However, due to the large volumes of data generated, a thorough understanding of the biological and immunological significance of such information is often difficult. This paper proposes a pipeline that allows visualization and statistical analysis of viral mutations that are associated with immune escape. Taking next generation sequencing data from longitudinal analysis of HCV viral genomes during a single HCV infection, along with antigen specific T-cell responses detected from the same subject, we demonstrate the applicability of these tools in the context of primary HCV infection. We provide a statistical and visual explanation of the relationship between cooccurring mutations on the viral genome and the parallel adaptive immune response against HCV.
47
Lu ZH, Brown A, Wilson AD, Calvert JG, Balasch M, Fuentes-Utrilla P, Loecherbach J, Turner F, Talbot R, Archibald AL, Ait-Ali T. Genomic variation in macrophage-cultured European porcine reproductive and respiratory syndrome virus Olot/91 revealed using ultra-deep next generation sequencing. Virol J 2014; 11:42. [PMID: 24588855 PMCID: PMC3945042 DOI: 10.1186/1743-422x-11-42] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 12/03/2013] [Accepted: 02/24/2014] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND: Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggests that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. FINDINGS: We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/91, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5%, and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins, including Glycoprotein 2a and 5. CONCLUSION: Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.
Affiliation(s)
- Tahar Ait-Ali
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh EH25 9RG, UK.
48
Töpfer A, Marschall T, Bull RA, Luciani F, Schönhuth A, Beerenwinkel N. Viral quasispecies assembly via maximal clique enumeration. PLoS Comput Biol 2014; 10:e1003515. [PMID: 24675810 PMCID: PMC3967922 DOI: 10.1371/journal.pcbi.1003515] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Received: 10/28/2013] [Accepted: 01/31/2014] [Indexed: 11/25/2022] Open
Abstract
Virus populations can display high genetic diversity within individual hosts. The intra-host collection of viral haplotypes, called viral quasispecies, is an important determinant of virulence, pathogenesis, and treatment outcome. We present HaploClique, a computational approach to reconstruct the structure of a viral quasispecies from next-generation sequencing data as obtained from bulk sequencing of mixed virus samples. We develop a statistical model for paired-end reads accounting for mutations, insertions, and deletions. Using an iterative maximal clique enumeration approach, read pairs are assembled into haplotypes of increasing length, eventually enabling global haplotype assembly. The performance of our quasispecies assembly method is assessed on simulated data for varying population characteristics and sequencing technology parameters. Owing to its paired-end handling, HaploClique compares favorably to state-of-the-art haplotype inference methods. It can reconstruct error-free full-length haplotypes from low coverage samples and detect large insertions and deletions at low frequencies. We applied HaploClique to sequencing data derived from a clinical hepatitis C virus population of an infected patient and discovered a novel deletion of length 357±167 bp that was validated by two independent long-read sequencing experiments. HaploClique is available at https://github.com/armintoepfer/haploclique. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2-5.
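The clique-based assembly idea in this abstract can be illustrated with a small sketch. This is not HaploClique's implementation: the published method uses a statistical model for paired-end reads and iterates the procedure, whereas the sketch below assumes error-free single reads already placed on a reference as hypothetical `(start, sequence)` tuples, treats two reads as compatible only if they agree exactly on a sufficiently long overlap, and runs a single round. The function names and the `min_overlap` parameter are illustrative assumptions.

```python
from itertools import combinations


def compatible(a, b, min_overlap):
    """True if two aligned reads overlap enough and agree on the overlap."""
    (sa, xa), (sb, xb) = a, b
    lo = max(sa, sb)
    hi = min(sa + len(xa), sb + len(xb))
    if hi - lo < min_overlap:
        return False
    return xa[lo - sa:hi - sa] == xb[lo - sb:hi - sb]


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def merge(reads):
    """Merge mutually compatible reads into one longer 'super-read'."""
    start = min(s for s, _ in reads)
    end = max(s + len(x) for s, x in reads)
    bases = [None] * (end - start)
    for s, x in reads:
        for i, base in enumerate(x):
            bases[s - start + i] = base
    return start, "".join(bases)


def assemble_once(reads, min_overlap):
    """One round: build the compatibility graph, merge each maximal clique."""
    adj = {i: set() for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        if compatible(reads[i], reads[j], min_overlap):
            adj[i].add(j)
            adj[j].add(i)
    cliques = maximal_cliques(adj)
    return sorted({merge([reads[i] for i in c]) for c in cliques})


# Toy data: reads drawn from two haplotypes that differ at position 3.
reads = [(0, "ACGTA"), (2, "GTACG"), (0, "ACGAA"), (2, "GAACG")]
haplotypes = assemble_once(reads, min_overlap=3)
```

Feeding the merged super-reads back into `assemble_once` would mimic the iterative growth toward longer haplotypes described in the abstract, although a real implementation must also tolerate sequencing errors and indels.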
Affiliation(s)
- Armin Töpfer
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Rowena A. Bull
- Inflammation and Infection Research Centre, School of Medical Sciences, UNSW, Sydney, Australia
- Fabio Luciani
- Inflammation and Infection Research Centre, School of Medical Sciences, UNSW, Sydney, Australia
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
49
Heo Y, Wu XL, Chen D, Ma J, Hwu WM. BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads. Bioinformatics 2014; 30:1354-62. [PMID: 24451628 DOI: 10.1093/bioinformatics/btu030] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Rapid advances in next-generation sequencing (NGS) technology have led to an exponential increase in the amount of genomic information. However, NGS reads contain far more errors than data from traditional sequencing methods, and downstream genomic analysis results can be improved by correcting the errors. Unfortunately, all the previous error correction methods required a large amount of memory, making them unsuitable for processing reads from large genomes with commodity computers. RESULTS: We present a novel algorithm that produces accurate correction results with much less memory compared with previous solutions. The algorithm, named BLoom-filter-based Error correction Solution for high-throughput Sequencing reads (BLESS), uses a single minimum-sized Bloom filter, and is also able to tolerate a higher false-positive rate, thus allowing us to correct errors with a 40× memory usage reduction on average compared with previous methods. Meanwhile, BLESS can extend reads like DNA assemblers to correct errors at the end of reads. Evaluations using real and simulated reads showed that BLESS could generate more accurate results than existing solutions. After errors were corrected using BLESS, 69% of initially unaligned reads could be aligned correctly. Additionally, de novo assembly results became 50% longer with 66% fewer assembly errors. AVAILABILITY AND IMPLEMENTATION: Freely available at http://sourceforge.net/p/bless-ec CONTACT: dchen@illinois.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
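As a rough illustration of the Bloom filter idea mentioned in this abstract, the sketch below stores frequently seen ("solid") k-mers in a Bloom filter and flags read positions whose k-mers are absent from it. It is not the BLESS implementation: the class and function names, the filter size, the hash scheme and the `min_count` threshold are all assumptions chosen for readability, and the in-memory `Counter` used for k-mer counting would defeat the memory savings that motivate the real tool.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive independent hash functions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_solid_filter(reads, k, min_count, num_bits=1 << 16, num_hashes=3):
    """Insert k-mers seen at least min_count times into a Bloom filter."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    solid = BloomFilter(num_bits, num_hashes)
    for km, count in counts.items():
        if count >= min_count:
            solid.add(km)
    return solid


def weak_positions(read, solid, k):
    """Start positions of k-mers not found in the solid k-mer filter."""
    return [i for i, km in enumerate(kmers(read, k)) if km not in solid]


# Toy data: three clean copies of a read and one copy with a G->C error.
reads = ["ACGTACGGTC"] * 3 + ["ACGTACGCTC"]
solid = build_solid_filter(reads, k=5, min_count=2)
clean_weak = weak_positions("ACGTACGGTC", solid, k=5)
error_weak = weak_positions("ACGTACGCTC", solid, k=5)
```

A Bloom filter never reports a stored k-mer as missing, so clean reads are not flagged; it can occasionally report an unseen k-mer as present, which is the false-positive rate the abstract says the method is designed to tolerate. A corrector would then try base substitutions at the flagged positions until every k-mer in the read is found in the filter.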
Affiliation(s)
- Yun Heo
- Department of Electrical and Computer Engineering, Department of Bioengineering and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
50
Cruz-Rivera M, Forbi JC, Yamasaki LHT, Vazquez-Chacon CA, Martinez-Guarneros A, Carpio-Pedroza JC, Escobar-Gutiérrez A, Ruiz-Tovar K, Fonseca-Coronado S, Vaughan G. Molecular epidemiology of viral diseases in the era of next generation sequencing. J Clin Virol 2013; 57:378-380. [PMID: 23726419 DOI: 10.1016/j.jcv.2013.04.021] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 03/20/2013] [Revised: 04/22/2013] [Accepted: 04/24/2013] [Indexed: 12/17/2022]
Affiliation(s)
- Mayra Cruz-Rivera
- Facultad de Medicina, Universidad Nacional Autónoma de México, Mexico City, Mexico