1
Tsai YY, Cheng D, Huang SW, Hung SJ, Wang YF, Lin YJ, Tsai HP, Chu JJH, Wang JR. The molecular epidemiology of a dengue virus outbreak in Taiwan: population wide versus infrapopulation mutation analysis. PLoS Negl Trop Dis 2024; 18:e0012268. [PMID: 38870242 DOI: 10.1371/journal.pntd.0012268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 06/03/2024] [Indexed: 06/15/2024] Open
Abstract
Dengue virus (DENV) causes approximately 390 million dengue infections worldwide every year. There were 22,777 reported DENV infections in Tainan, Taiwan in 2015. In this study, we sequenced the C-prM-E genes from 45 DENV 2015 strains, and phylogenetic analysis based on C-prM-E genes revealed that all strains were classified as DENV serotype 2 Cosmopolitan genotype. Sequence analysis comparing different DENV-2 genotypes and Cosmopolitan DENV-2 sequences prior to 2015 showed a clade replacement event in the DENV-2 Cosmopolitan genotype. Additionally, a major substitution C-A314G (K73R) was found in the capsid region, which may have contributed to the clade replacement event. Reverse genetics virus rgC-A314G (K73R) showed slower replication in BHK-21 and C6/36 cells compared to wildtype virus, as well as a decrease in NS1 production in BHK-21-infected cells. After serial passaging, the C-A314G (K73R) mutation reverted to wildtype and was thus considered to be unstable. Next generation sequencing (NGS) of three sera collected from a single DENV-2-infected patient at 1, 2, and 5 days post-admission was employed to examine the genetic diversity over time and mutations that may work in conjunction with C-A314G (K73R). Results showed that the number of haplotypes decreased with time in the DENV-infected patient. On the fifth day after admission, two new haplotypes emerged, and a single non-synonymous NS4A-L115I mutation was identified. Therefore, we have identified a persistent mutation C-A314G (K73R) in all of the DENV-2 isolates, and during the course of an infection, a single new non-synonymous mutation in the NS4A region appears in the virus population within a single host. The C-A314G (K73R) mutation thus may have played a role in the DENV-2 2015 outbreak, while the NS4A-L115I mutation may be advantageous during DENV infection within the host.
Affiliation(s)
- You-Yuan Tsai
  - Department of Medical Laboratory Science and Biotechnology, College of Medicine, National Cheng Kung University, Tainan, Taiwan
  - Department of Pathology, National Cheng Kung University Hospital, Tainan, Taiwan
- Dayna Cheng
  - Institute of Basic Medical Sciences, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Sheng-Wen Huang
  - National Mosquito-Borne Diseases Control Research Center, National Health Research Institutes, Tainan, Taiwan
- Su-Jhen Hung
  - National Mosquito-Borne Diseases Control Research Center, National Health Research Institutes, Tainan, Taiwan
- Ya-Fang Wang
  - National Mosquito-Borne Diseases Control Research Center, National Health Research Institutes, Tainan, Taiwan
- Yih-Jyh Lin
  - Division of General Surgery, Department of Surgery, College of Medicine, National Cheng Kung University Hospital, Tainan, Taiwan
- Huey-Pin Tsai
  - Department of Medical Laboratory Science and Biotechnology, College of Medicine, National Cheng Kung University, Tainan, Taiwan
  - Department of Pathology, National Cheng Kung University Hospital, Tainan, Taiwan
- Justin Jang Hann Chu
  - Infectious Diseases Translational Research Program and Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Jen-Ren Wang
  - Department of Medical Laboratory Science and Biotechnology, College of Medicine, National Cheng Kung University, Tainan, Taiwan
  - Department of Pathology, National Cheng Kung University Hospital, Tainan, Taiwan
  - Institute of Basic Medical Sciences, College of Medicine, National Cheng Kung University, Tainan, Taiwan
  - Center of Infectious Disease and Signaling Research, National Cheng Kung University, Tainan, Taiwan
2
Jörimann L, Tschumi J, Zeeb M, Leemann C, Schenkel CD, Neumann K, Chaudron SE, Zaheri M, Frischknecht P, Neuner-Jehle N, Kuster H, Braun DL, Grube C, Kouyos R, Metzner KJ, Günthard HF. Absence of Proviral Human Immunodeficiency Virus (HIV) Type 1 Evolution in Early-Treated Individuals With HIV Switching to Dolutegravir Monotherapy During 48 Weeks. J Infect Dis 2023; 228:907-918. [PMID: 37498738 PMCID: PMC10547464 DOI: 10.1093/infdis/jiad292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 07/25/2023] [Indexed: 07/29/2023] Open
Abstract
Human immunodeficiency virus type 1 (HIV-1) infection is treated with antiretroviral therapy (ART), usually consisting of 2-3 different drugs, referred to as combination ART (cART). Our recent randomized clinical trial comparing a switch to dolutegravir monotherapy with continuation of cART in early-treated individuals demonstrated sustained virological suppression over 48 weeks. Here, we characterize the longitudinal landscape of the HIV-1 reservoir in these participants, with particular attention to potential differences between treatment groups regarding evidence of evolution as a proxy for low-level replication. Near full-length HIV-1 proviral polymerase chain reaction and next-generation sequencing were applied to longitudinal peripheral blood mononuclear cell samples to assess proviral evolution and the potential emergence of drug resistance mutations (DRMs). No increase in genetic distance or diversity over time was detected in participants of either treatment group. Single proviral analysis showed high proportions of defective proviruses and low DRM numbers. No evidence for evolution during dolutegravir monotherapy was found in these early-treated individuals.
Affiliation(s)
- Lisa Jörimann
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Jasmin Tschumi
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Marius Zeeb
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Christine Leemann
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Corinne D Schenkel
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Kathrin Neumann
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Sandra E Chaudron
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Maryam Zaheri
  - Institute of Medical Virology, University of Zurich, Switzerland
- Paul Frischknecht
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
- Nadia Neuner-Jehle
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Herbert Kuster
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Dominique L Braun
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Christina Grube
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
- Roger Kouyos
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Karin J Metzner
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
- Huldrych F Günthard
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
  - Institute of Medical Virology, University of Zurich, Switzerland
3
Jaha B, Schenkel CD, Jörimann L, Huber M, Zaheri M, Neumann K, Leemann C, Calmy A, Cavassini M, Kouyos RD, Günthard HF, Metzner KJ. Prevalence of HIV-1 drug resistance mutations in proviral DNA in the Swiss HIV Cohort Study, a retrospective study from 1995 to 2018. J Antimicrob Chemother 2023; 78:2323-2334. [PMID: 37545164 PMCID: PMC10477134 DOI: 10.1093/jac/dkad240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 07/19/2023] [Indexed: 08/08/2023] Open
Abstract
BACKGROUND: Genotypic resistance testing (GRT) is routinely performed upon diagnosis of HIV-1 infection or during virological failure using plasma viral RNA. An alternative source for GRT could be cellular HIV-1 DNA.
OBJECTIVES: A substantial number of participants in the Swiss HIV Cohort Study (SHCS) never received GRT. We applied a method that enables access to the near full-length proviral HIV-1 genome without requiring detectable viraemia.
METHODS: Nine hundred and sixty-two PBMC specimens were received. Our two-step nested PCR protocol was applied to generate two overlapping long-range amplicons of the HIV-1 genome, sequenced by next-generation sequencing (NGS) and analysed by MinVar, a pipeline to detect drug resistance mutations (DRMs).
RESULTS: Six hundred and eighty-one (70.8%) of the samples were successfully amplified, sequenced and analysed by MinVar. Only partial information of the pol gene was contained in 82/681 (12%), probably due to naturally occurring deletions in the proviral sequence. All common HIV-1 subtypes were successfully sequenced. We detected at least one major DRM at high frequency (≥15%) in 331/599 (55.3%) individuals. Excluding APOBEC-signature (G-to-A mutation) DRMs, 145/599 (24.2%) individuals carried at least one major DRM. RT-inhibitor DRMs were most prevalent. The experienced time on ART was significantly longer in DRM carriers (P = 0.001) independent of inclusion or exclusion of APOBEC-signature DRMs.
CONCLUSIONS: We successfully applied a reliable and efficient method to analyse near full-length HIV-1 proviral DNA and investigated DRMs in individuals with undetectable or low viraemia. Additionally, our data underscore the need for new computational tools to exclude APOBEC-related hypermutated NGS sequence reads for reporting DRMs.
Affiliation(s)
- Bashkim Jaha
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091 Zurich, Switzerland
- Corinne D Schenkel
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091 Zurich, Switzerland
- Lisa Jörimann
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091 Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Michael Huber
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Maryam Zaheri
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Kathrin Neumann
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091 Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Christine Leemann
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091 Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Alexandra Calmy
  - Division of Infectious Diseases, University Hospital Geneva, University of Geneva, Geneva, Switzerland
- Matthias Cavassini
  - Division of Infectious Diseases, University Hospital Lausanne, University of Lausanne, Lausanne, Switzerland
- Roger D Kouyos
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091 Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Huldrych F Günthard
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091 Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Karin J Metzner
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091 Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
4
Balakrishna S, Loosli T, Zaheri M, Frischknecht P, Huber M, Kusejko K, Yerly S, Leuzinger K, Perreau M, Ramette A, Wymant C, Fraser C, Kellam P, Gall A, Hirsch HH, Stoeckle M, Rauch A, Cavassini M, Bernasconi E, Notter J, Calmy A, Günthard HF, Metzner KJ, Kouyos RD. Frequency matters: comparison of drug resistance mutation detection by Sanger and next-generation sequencing in HIV-1. J Antimicrob Chemother 2023; 78:656-664. [PMID: 36738248 DOI: 10.1093/jac/dkac430] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/06/2022] [Accepted: 11/18/2022] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND: Next-generation sequencing (NGS) is gradually replacing Sanger sequencing (SS) as the primary method for HIV genotypic resistance testing. However, there are limited systematic data on comparability of these methods in a clinical setting for the presence of low-abundance drug resistance mutations (DRMs) and their dependency on the variant-calling thresholds.
METHODS: To compare the HIV-DRMs detected by SS and NGS, we included participants enrolled in the Swiss HIV Cohort Study (SHCS) with SS and NGS sequences available with sample collection dates ≤7 days apart. We tested for the presence of HIV-DRMs and compared the agreement between SS and NGS at different variant-calling thresholds.
RESULTS: We included 594 pairs of SS and NGS from 527 SHCS participants. Males accounted for 80.5% of the participants, 76.3% were ART naive at sample collection and 78.1% of the sequences were subtype B. Overall, we observed a good agreement (Cohen's kappa >0.80) for HIV-DRMs for variant-calling thresholds ≥5%. We observed an increase in low-abundance HIV-DRMs detected at lower thresholds [28/417 (6.7%) at 10%-25% to 293/812 (36.1%) at 1%-2% threshold]. However, such low-abundance HIV-DRMs were overrepresented in ART-naive participants and were in most cases not detected in previously sampled sequences, suggesting high sequencing error for thresholds <3%.
CONCLUSIONS: We found high concordance between SS and NGS but also a substantial number of low-abundance HIV-DRMs detected only by NGS at lower variant-calling thresholds. Our findings suggest that a substantial fraction of the low-abundance HIV-DRMs detected at thresholds <3% may represent sequencing errors and hence should not be overinterpreted in clinical practice.
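As a rough illustration of the agreement measure named in the abstract above, the following minimal sketch computes Cohen's kappa between Sanger and NGS calls for a single DRM at two variant-calling thresholds. The sample frequencies, the Sanger calls, and the function names `call_drm` and `cohens_kappa` are hypothetical; none of them come from the study, and the study's actual analysis pipeline is not described here.

```python
def call_drm(frequency, threshold):
    """Report a DRM as present when its NGS read frequency reaches the threshold."""
    return frequency >= threshold


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of binary (True/False) calls."""
    n = len(calls_a)
    if n == 0 or n != len(calls_b):
        raise ValueError("need two non-empty sequences of equal length")
    # Observed agreement: fraction of samples where both methods give the same call.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement, from each method's marginal rate of positive calls.
    p_a = sum(calls_a) / n
    p_b = sum(calls_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both methods are constant and identical
    return (observed - expected) / (1 - expected)


# Hypothetical data: NGS frequency of one DRM in four samples, and the
# corresponding Sanger calls for the same samples.
ngs_frequencies = [0.40, 0.02, 0.00, 0.12]
sanger_calls = [True, False, False, True]

for threshold in (0.05, 0.01):
    ngs_calls = [call_drm(f, threshold) for f in ngs_frequencies]
    print(threshold, cohens_kappa(ngs_calls, sanger_calls))
# prints: 0.05 1.0
# prints: 0.01 0.5
```

In this toy data, lowering the threshold from 5% to 1% turns the 2% variant into an NGS-only call, which lowers kappa from 1.0 to 0.5.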
Affiliation(s)
- Suraj Balakrishna
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland
- Tom Loosli
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland
- Maryam Zaheri
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland
  - Swiss National Center for Retroviruses, University of Zurich, Zurich, Switzerland
- Paul Frischknecht
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Michael Huber
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland
  - Swiss National Center for Retroviruses, University of Zurich, Zurich, Switzerland
- Katharina Kusejko
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland
- Sabine Yerly
  - Laboratory of Virology, University Hospital Geneva, University of Geneva, Geneva, Switzerland
- Karoline Leuzinger
  - Clinical Virology Division, Laboratory Medicine, University Hospital Basel, Basel, Switzerland
- Matthieu Perreau
  - Division of Immunology and Allergy, University Hospital Lausanne, University of Lausanne, Lausanne, Switzerland
- Alban Ramette
  - Institute for Infectious Diseases, University of Bern, Bern, Switzerland
- Chris Wymant
  - Nuffield Department of Medicine, Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Christophe Fraser
  - Nuffield Department of Medicine, Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
  - Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Paul Kellam
  - Department of Infectious Diseases, Faculty of Medicine, Imperial College London, London, UK
- Astrid Gall
  - Excellence in Life Sciences (EMBO), Heidelberg, Germany
- Hans H Hirsch
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Basel, University of Basel, Basel, Switzerland
- Marcel Stoeckle
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Basel, University of Basel, Basel, Switzerland
- Andri Rauch
  - Department of Infectious Diseases, Bern University Hospital, University of Bern, Bern, Switzerland
- Matthias Cavassini
  - Division of Infectious Diseases, Lausanne University Hospital, University of Lausanne, Lausanne, Switzerland
- Enos Bernasconi
  - Division of Infectious Diseases, Regional Hospital Lugano, Lugano, Switzerland
- Julia Notter
  - Division of Infectious Diseases and Hospital Epidemiology, Cantonal Hospital St Gallen, St Gallen, Switzerland
- Alexandra Calmy
  - Division of Infectious Diseases, University Hospital Geneva, University of Geneva, Geneva, Switzerland
- Huldrych F Günthard
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland
- Karin J Metzner
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland
- Roger D Kouyos
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland
5
Freire B, Ladra S, Parama JR, Salmela L. ViQUF: De Novo Viral Quasispecies Reconstruction Using Unitig-Based Flow Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1550-1562. [PMID: 35853050 DOI: 10.1109/tcbb.2022.3190282] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
During viral infection, intrahost mutation and recombination can lead to significant evolution, resulting in a population of viruses that harbor multiple haplotypes. The task of reconstructing these haplotypes from short-read sequencing data is called viral quasispecies assembly, and it can be categorized as a multiassembly problem. We consider the de novo version of the problem, where no reference is available. We present ViQUF, a de novo viral quasispecies assembler that addresses haplotype assembly and quantification. ViQUF obtains a first draft of the assembly graph from a de Bruijn graph. Then, solving a min-cost flow over a flow network built for each pair of adjacent vertices based on their paired-end information creates an approximate paired assembly graph with suggested frequency values as edge labels, which is the first frequency estimation. Then, original haplotypes are obtained through a greedy path reconstruction guided by a min-cost flow solution in the approximate paired assembly graph. ViQUF outputs the contigs with their frequency estimations. Results on real and simulated data show that ViQUF is at least four times faster than previous methods and uses at most half of their memory, while maintaining, and in some cases outperforming, the high quality of assembly and frequency estimation of overlap graph-based methodologies, which are known to be more accurate but slower than the de Bruijn graph-based approaches.
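To make the starting point of such an assembler more concrete, here is a minimal sketch of building a de Bruijn graph from reads. It is a generic textbook construction, not ViQUF's implementation, and it covers only the initial graph, not the unitig compaction, min-cost flow, or path reconstruction steps. The function name `de_bruijn_edges`, the toy reads, and the choice of k are all illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read; each k-mer contributes
        # one edge from its prefix (k-1)-mer to its suffix (k-1)-mer.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two toy reads that differ at a single position, standing in for two haplotypes.
reads = ["ACGTAC", "ACGTTC"]
graph = de_bruijn_edges(reads, k=4)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
# prints: ACG -> ['CGT']
# prints: CGT -> ['GTA', 'GTT']
# prints: GTA -> ['TAC']
# prints: GTT -> ['TTC']
```

The node `CGT` has two successors, which is where the two toy haplotypes diverge; branches of this kind are what a quasispecies assembler has to resolve into separate haplotypes with estimated frequencies.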
6
Yu R, Cai D, Sun Y. AccuVIR: an ACCUrate VIRal genome assembly tool for third-generation sequencing data. Bioinformatics 2023; 39:6969105. [PMID: 36610711 PMCID: PMC9825286 DOI: 10.1093/bioinformatics/btac827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 11/24/2022] [Accepted: 12/24/2022] [Indexed: 12/27/2022] Open
Abstract
MOTIVATION: RNA viruses tend to mutate constantly. While many of the variants are neutral, some can lead to higher transmissibility or virulence. Accurate assembly of complete viral genomes enables the identification of underlying variants, which are essential for studying virus evolution and elucidating the relationship between genotypes and virus properties. Recently, third-generation sequencing platforms such as Nanopore sequencers have been used for real-time virus sequencing for Ebola, Zika, coronavirus disease 2019, etc. However, their high per-base error rate prevents the accurate reconstruction of the viral genome.
RESULTS: In this work, we introduce a new tool, AccuVIR, for viral genome assembly and polishing using error-prone long reads. It can better distinguish sequencing errors from true variants based on the key observation that sequencing errors can disrupt the gene structures of viruses, which usually have a high density of coding regions. Our experimental results on both simulated and real third-generation sequencing data demonstrated its superior performance on generating more accurate viral genomes than generic assembly or polishing tools.
AVAILABILITY AND IMPLEMENTATION: The source code and the documentation of AccuVIR are available at https://github.com/rainyrubyzhou/AccuVIR.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Runzhou Yu
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR 000000, China
- Dehan Cai
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR 000000, China
- Yanni Sun
  - To whom correspondence should be addressed.
7
Martin S, Ayling M, Patrono L, Caccamo M, Murcia P, Leggett RM. Capturing variation in metagenomic assembly graphs with MetaCortex. Bioinformatics 2023; 39:6986127. [PMID: 36722204 PMCID: PMC9889960 DOI: 10.1093/bioinformatics/btad020] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/20/2022] [Revised: 11/10/2022] [Accepted: 01/11/2023] [Indexed: 01/13/2023] Open
Abstract
MOTIVATION: The assembly of contiguous sequence from metagenomic samples presents a particular challenge, due to the presence of multiple species, often closely related, at varying levels of abundance. Capturing diversity within species, for example, viral haplotypes, or bacterial strain-level diversity, is even more challenging.
RESULTS: We present MetaCortex, a metagenome assembler that captures intra-species diversity by searching for signatures of local variation along assembled sequences in the underlying assembly graph and outputting these sequences in sequence graph format. We show that MetaCortex produces accurate assemblies with higher genome coverage and contiguity than other popular metagenomic assemblers on mock viral communities with high levels of strain-level diversity and on simulated communities containing simulated strains.
AVAILABILITY AND IMPLEMENTATION: Source code is freely available to download from https://github.com/SR-Martin/metacortex, is implemented in C and supported on MacOS and Linux. The version used for the results presented in this article is available at doi.org/10.5281/zenodo.7273627.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pablo Murcia
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
8
VeChat: correcting errors in long reads using variation graphs. Nat Commun 2022; 13:6657. [PMID: 36333324 PMCID: PMC9636371 DOI: 10.1038/s41467-022-34381-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/04/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022] Open
Abstract
Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus sequence induced biases that mask true variants in haplotypes of lower frequency showing in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so do not mistakenly mask true variants as errors. We present VeChat as an approach to implement this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than when corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use open-source tool and publicly available at https://github.com/HaploKit/vechat.
9
Cai D, Shang J, Sun Y. HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization. Bioinformatics 2022; 38:5360-5367. [PMID: 36308467 PMCID: PMC9750122 DOI: 10.1093/bioinformatics/btac708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 10/06/2022] [Accepted: 10/25/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Lacking strict proofreading mechanisms, many RNA viruses can generate progeny with slightly changed genomes. Being able to characterize highly similar genomes (i.e. haplotypes) in one virus population helps study the viruses' evolution and their interactions with the host/other microbes. High-throughput sequencing data has become the major source for characterizing viral populations. However, the inherent limitation on read length by next-generation sequencing makes complete haplotype reconstruction difficult.
RESULTS: In this work, we present a new tool named HaploDMF that can construct complete haplotypes using third-generation sequencing (TGS) data. HaploDMF utilizes a deep matrix factorization model with an adapted loss function to learn latent features from aligned reads automatically. The latent features are then used to cluster reads of the same haplotype. Unlike existing tools whose performance can be affected by the overlap size between reads, HaploDMF is able to achieve highly robust performance on data with different coverage, haplotype number and error rates. In particular, it can generate more complete haplotypes even when the sequencing coverage drops in the middle. We benchmark HaploDMF against the state-of-the-art tools on simulated and real TGS data on different viruses. The results show that HaploDMF competes favorably against all others.
AVAILABILITY AND IMPLEMENTATION: The source code and the documentation of HaploDMF are available at https://github.com/dhcai21/HaploDMF.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dehan Cai
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Jiayu Shang
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Yanni Sun
  - To whom correspondence should be addressed.
10
Kumar S, Kumar GS, Maitra SS, Malý P, Bharadwaj S, Sharma P, Dwivedi VD. Viral informatics: bioinformatics-based solution for managing viral infections. Brief Bioinform 2022; 23:6659740. [PMID: 35947964 DOI: 10.1093/bib/bbac326] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2022] [Revised: 06/26/2022] [Accepted: 07/18/2022] [Indexed: 11/13/2022] Open
Abstract
Several new viral infections have emerged in the human population and are establishing themselves as global pandemics. With advancements in translation research, the scientific community has developed potential therapeutics to eradicate or control certain viral infections, such as smallpox and polio, responsible for billions of disabilities and deaths in the past. Unfortunately, some viral infections, such as dengue virus (DENV) and human immunodeficiency virus-1 (HIV-1), are still prevailing due to a lack of specific therapeutics, while new pathogenic viral strains or variants are emerging because of high genetic recombination or cross-species transmission. Consequently, to combat the emerging viral infections, bioinformatics-based strategies have been developed for characterizing viruses and for developing new effective therapeutics for their eradication or management. This review attempts to provide a single platform for the wide range of available bioinformatics-based approaches, including bioinformatics methods for the identification and management of emerging or evolved viral strains, genome analysis concerning pathogenicity and epidemiological analysis, computational methods for designing viral therapeutics, and consolidated information in the form of databases on the known pathogenic viruses. This review of the generally applicable viral informatics approaches aims to provide an overview of available resources capable of carrying out the desired task and may be utilized to develop additional strategies to improve the quality of translation viral informatics research.
Affiliation(s)
- Sanjay Kumar
  - School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
  - Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
- Geethu S Kumar
  - Department of Life Science, School of Basic Science and Research, Sharda University, Greater Noida, Uttar Pradesh, India
  - Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
- Petr Malý
  - Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
- Shiv Bharadwaj
  - Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
- Pradeep Sharma
  - Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
- Vivek Dhar Dwivedi
  - Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
  - Institute of Advanced Materials, IAAM, 59053 Ulrika, Sweden
Collapse
|
11
|
Hufsky F, Abecasis A, Agudelo-Romero P, Bletsa M, Brown K, Claus C, Deinhardt-Emmer S, Deng L, Friedel CC, Gismondi MI, Kostaki EG, Kühnert D, Kulkarni-Kale U, Metzner KJ, Meyer IM, Miozzi L, Nishimura L, Paraskevopoulou S, Pérez-Cataluña A, Rahlff J, Thomson E, Tumescheit C, van der Hoek L, Van Espen L, Vandamme AM, Zaheri M, Zuckerman N, Marz M. Women in the European Virus Bioinformatics Center. Viruses 2022; 14:1522. [PMID: 35891501 PMCID: PMC9319252 DOI: 10.3390/v14071522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 07/05/2022] [Accepted: 07/07/2022] [Indexed: 02/01/2023] Open
Abstract
Viruses are the cause of a considerable burden to human, animal and plant health, while on the other hand playing an important role in regulating entire ecosystems. The power of new sequencing technologies combined with new tools for processing "Big Data" offers unprecedented opportunities to answer fundamental questions in virology. Virologists have an urgent need for virus-specific bioinformatics tools. These developments have led to the formation of the European Virus Bioinformatics Center, a network of experts in virology and bioinformatics who are joining forces to enable extensive exchange and collaboration between these research areas. The EVBC strives to provide talented researchers with a supportive environment free of gender bias, but the gender gap in science, especially in math-intensive fields such as computer science, persists. To bring more talented women into research and keep them there, we need to highlight role models to spark their interest, and we need to ensure that female scientists are not kept at lower levels but are given the opportunity to lead the field. Here we showcase the work of the EVBC and highlight the achievements of some outstanding women experts in virology and viral bioinformatics.
Affiliation(s)
- Franziska Hufsky
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
- Ana Abecasis
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Global Health and Tropical Medicine, Institute of Hygiene and Tropical Medicine, New University of Lisbon, 1349-008 Lisbon, Portugal
- Patricia Agudelo-Romero
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Wal-Yan Respiratory Research Centre, Telethon Kids Institute, University of Western Australia, Nedlands, WA 6009, Australia
- Magda Bletsa
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Department of Hygiene, Epidemiology and Medical Statistics, Medical School, National and Kapodistrian University of Athens, 115 27 Athens, Greece
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, Katholieke Universiteit Leuven, B-3000 Leuven, Belgium
- Katherine Brown
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 1TN, UK
- Claudia Claus
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Institute of Medical Microbiology and Virology, Medical Faculty, Leipzig University, 04103 Leipzig, Germany
- Stefanie Deinhardt-Emmer
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Institute of Medical Microbiology, Jena University Hospital, 07747 Jena, Germany
- Li Deng
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Institute of Virology, Helmholtz Centre Munich-German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - Microbial Disease Prevention, School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Caroline C. Friedel
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Institute of Informatics, Ludwig-Maximilians-Universität München, 80333 Munich, Germany
- María Inés Gismondi
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Institute of Agrobiotechnology and Molecular Biology (IABIMO), National Institute for Agriculture Technology (INTA), National Research Council (CONICET), Hurlingham B1686IGC, Argentina
  - Department of Basic Sciences, National University of Luján, Luján B6702MZP, Argentina
- Evangelia Georgia Kostaki
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Department of Hygiene, Epidemiology and Medical Statistics, Medical School, National and Kapodistrian University of Athens, 115 27 Athens, Greece
- Denise Kühnert
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Transmission, Infection, Diversification and Evolution Group, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Urmila Kulkarni-Kale
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Bioinformatics Centre, Savitribai Phule Pune University, Pune 411007, India
- Karin J. Metzner
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091 Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Irmtraud M. Meyer
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 10115 Berlin, Germany
  - Institute of Chemistry and Biochemistry, Department of Biology, Chemistry and Pharmacy, Freie Universität Berlin, 14195 Berlin, Germany
  - Faculty of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Laura Miozzi
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Institute for Sustainable Plant Protection, National Research Council of Italy, 10135 Torino, Italy
- Luca Nishimura
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima 411-8540, Japan
  - Human Genetics Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
- Sofia Paraskevopoulou
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Methods Development and Research Infrastructure, Bioinformatics and Systems Biology, Robert Koch Institute, 13353 Berlin, Germany
- Alba Pérez-Cataluña
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - VISAFELab, Department of Preservation and Food Safety Technologies, Institute of Agrochemistry and Food Technology, IATA-CSIC, 46980 Valencia, Spain
- Janina Rahlff
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Centre for Ecology and Evolution in Microbial Model Systems (EEMiS), Department of Biology and Environmental Science, Linnaeus University, 391 82 Kalmar, Sweden
- Emma Thomson
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Queen Elizabeth University Hospital, NHS Greater Glasgow and Clyde, Glasgow G51 4TF, UK
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Charlotte Tumescheit
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - School of Biological Sciences, Seoul National University, Seoul 08826, Korea
- Lia van der Hoek
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Laboratory of Experimental Virology, Department of Medical Microbiology and Infection Prevention, Amsterdam UMC, University of Amsterdam, 1012 WX Amsterdam, The Netherlands
  - Amsterdam Institute for Infection and Immunity, 1100 DD Amsterdam, The Netherlands
- Lore Van Espen
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, Katholieke Universiteit Leuven, B-3000 Leuven, Belgium
- Anne-Mieke Vandamme
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, Katholieke Universiteit Leuven, B-3000 Leuven, Belgium
  - Global Health and Tropical Medicine, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, 1349-008 Lisbon, Portugal
  - Institute for the Future, Katholieke Universiteit Leuven, B-3000 Leuven, Belgium
- Maryam Zaheri
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Neta Zuckerman
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - Central Virology Laboratory, Public Health Services, Ministry of Health and Sheba Medical Center, Ramat Gan 52621, Israel
- Manja Marz
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
|
12
|
Cai D, Sun Y. Reconstructing viral haplotypes using long reads. Bioinformatics 2022; 38:2127-2134. [PMID: 35157018 DOI: 10.1093/bioinformatics/btac089] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/11/2021] [Revised: 01/19/2022] [Accepted: 02/08/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Most RNA viruses lack strict proofreading during replication. Coupled with a high replication rate, some RNA viruses can form a virus population containing a group of genetically related but different haplotypes. Characterizing the haplotype composition in a virus population is thus important for understanding viral evolution. Many attempts have been made to reconstruct viral haplotypes using next-generation sequencing (NGS) reads. However, short NGS reads cannot span distant single-nucleotide variants, making it difficult to reconstruct complete or near-complete haplotypes. Given the fast development of third-generation sequencing technologies, a new opportunity has arisen for reconstructing full-length haplotypes with long reads. RESULTS: In this work, we developed a new tool, RVHaplo, to reconstruct haplotypes for known viruses from long reads. We tested it rigorously on both simulated and real viral sequencing data and compared it against other popular haplotype reconstruction tools. The results demonstrated that RVHaplo outperforms the state-of-the-art tools for viral haplotype reconstruction from long reads. In particular, RVHaplo can reconstruct the rare (1% abundance) haplotypes that other tools usually miss. AVAILABILITY AND IMPLEMENTATION: The source code and the documentation of RVHaplo are available at https://github.com/dhcai21/RVHaplo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dehan Cai
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Yanni Sun
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
|
13
|
Guang A, Howison M, Ledingham L, D’Antuono M, Chan PA, Lawrence C, Dunn CW, Kantor R. Incorporating Within-Host Diversity in Phylogenetic Analyses for Detecting Clusters of New HIV Diagnoses. Front Microbiol 2022; 12:803190. [PMID: 35250908 PMCID: PMC8891961 DOI: 10.3389/fmicb.2021.803190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Accepted: 12/22/2021] [Indexed: 11/29/2022] Open
Abstract
Background: Phylogenetic analyses of HIV sequences are used to detect clusters and inform public health interventions. Conventional approaches summarize within-host HIV diversity with a single consensus sequence per host of the pol gene, obtained from Sanger or next-generation sequencing (NGS). There is growing recognition that this approach discards potentially important information about within-host sequence variation, which can impact phylogenetic inference. However, whether alternative summary methods that incorporate intra-host variation impact phylogenetic inference of transmission network features is unknown. Methods: We introduce profile sampling, a method to incorporate within-host NGS sequence diversity into phylogenetic HIV cluster inference. We compare this approach to Sanger- and NGS-derived pol and near-whole-genome consensus sequences and evaluate its potential benefits in identifying molecular clusters among all newly-HIV-diagnosed individuals over six months at the largest HIV center in Rhode Island. Results: Profile sampling cluster inference demonstrated that within-host viral diversity impacts phylogenetic inference across individuals, and that consensus sequence approaches can obscure both magnitude and effect of these impacts. Clustering differed between Sanger- and NGS-derived consensus and profile sampling sequences, and across gene regions. Discussion: Profile sampling can incorporate within-host HIV diversity captured by NGS into phylogenetic analyses. This additional information can improve robustness of cluster detection.
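To make the idea of sampling from within-host diversity concrete, the sketch below draws sequences from a per-site nucleotide frequency profile instead of collapsing it to a single consensus. It is an illustration under stated assumptions, not the authors' implementation: the profile layout, the function names `consensus` and `sample_from_profile`, and the simplification that sites are sampled independently are all hypothetical.

```python
import random

def consensus(profile):
    """Collapse a profile to one sequence by taking the most frequent base per site."""
    return "".join(max(site, key=site.get) for site in profile)

def sample_from_profile(profile, rng):
    """Draw one sequence, sampling each site from its base frequencies.

    Treating sites as independent is a simplifying assumption of this sketch.
    """
    seq = []
    for site in profile:
        bases = list(site)
        weights = [site[b] for b in bases]
        seq.append(rng.choices(bases, weights=weights, k=1)[0])
    return "".join(seq)

# Hypothetical toy profile: one dict of base frequencies per alignment column.
profile = [
    {"A": 1.0},
    {"C": 0.9, "T": 0.1},
    {"G": 0.5, "T": 0.5},
]
rng = random.Random(0)  # fixed seed so the draws are reproducible
samples = [sample_from_profile(profile, rng) for _ in range(5)]
print(consensus(profile), samples)
```

A single consensus hides the minority bases at the second and third columns, whereas repeated draws can carry them into downstream tree building.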
Affiliation(s)
- August Guang
  - Center for Computational Biology of Human Disease, Brown University, Providence, RI, United States
  - Center for Computation and Visualization, Brown University, Providence, RI, United States
  - Correspondence: August Guang
- Mark Howison
  - Research Improving People’s Lives, Providence, RI, United States
- Lauren Ledingham
  - Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
- Matthew D’Antuono
  - Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
- Philip A. Chan
  - Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
- Charles Lawrence
  - Division of Applied Mathematics, Brown University, Providence, RI, United States
- Casey W. Dunn
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States
- Rami Kantor
  - Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
|
14
|
Detecting Selection in the HIV-1 Genome during Sexual Transmission Events. Viruses 2022; 14:406. [PMID: 35215999 PMCID: PMC8876189 DOI: 10.3390/v14020406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Revised: 02/04/2022] [Accepted: 02/11/2022] [Indexed: 01/27/2023] Open
Abstract
Little is known about whether and how variation in the HIV-1 genome affects its transmissibility. Assessing which genomic features of HIV-1 are under positive or negative selection during transmission is challenging, because very few virus particles are typically transmitted, and random genetic drift can dilute genetic signals in the recipient virus population. We analyzed 30 transmitter–recipient pairs from the Zurich Primary HIV Infection Study and the Swiss HIV Cohort Study using near full-length HIV-1 genomes. We developed a new statistical test to detect selection during transmission, called Selection Test in Transmission (SeTesT), based on comparing the transmitter and recipient virus population and accounting for the transmission bottleneck. We performed extensive simulations and found that sensitivity of detecting selection during transmission is limited by the strong population bottleneck of few transmitted virions. When pooling individual test results across patients, we found two candidate HIV-1 genomic features for affecting transmission, namely amino acid positions 3 and 18 of Vpu, which were significant before but not after correction for multiple testing. In summary, SeTesT provides a general framework for detecting selection based on genomic sequencing data of transmitted viruses. Our study shows that a higher number of transmitter–recipient pairs is required to improve sensitivity of detecting selection.
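The effect of a tight transmission bottleneck on detection power can be illustrated with a neutral sampling model. The sketch below is not SeTesT itself. It assumes, for illustration only, that each founder virion is drawn independently from the transmitter population, so the number of founders carrying a variant is binomial. The function names and example numbers are hypothetical.

```python
from math import comb

def founder_count_pmf(k, n_founders, freq):
    """P(exactly k of n_founders carry the variant) under neutral binomial sampling."""
    return comb(n_founders, k) * freq**k * (1 - freq) ** (n_founders - k)

def loss_probability(n_founders, freq):
    """P(variant is absent from the recipient), i.e. (1 - freq) ** n_founders."""
    return founder_count_pmf(0, n_founders, freq)

# A variant at 20% in the transmitter, for bottlenecks of 1, 5 and 20 virions.
for n in (1, 5, 20):
    print(n, loss_probability(n, 0.2))
```

With a single founder, a variant at 20% frequency is lost 80% of the time by chance alone, so one transmitter-recipient pair carries little information about selection; this is consistent with the abstract's point that more pairs are needed.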
|
15
|
Liao H, Cai D, Sun Y. VirStrain: a strain identification tool for RNA viruses. Genome Biol 2022; 23:38. [PMID: 35101081 PMCID: PMC8801933 DOI: 10.1186/s13059-022-02609-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/21/2021] [Accepted: 01/12/2022] [Indexed: 12/18/2022] Open
Abstract
Viruses change constantly during replication, leading to high intra-species diversity. Although many changes are neutral or deleterious, some can confer on the virus different biological properties such as better adaptability. In addition, viral genotypes often have associated metadata, such as host residence, which can help with inferring viral transmission during pandemics. Thus, subspecies analysis can provide important insights into virus characterization. Here, we present VirStrain, a tool taking short reads as input with viral strain composition as output. We rigorously test VirStrain on multiple simulated and real virus sequencing datasets. VirStrain outperforms the state-of-the-art tools in both sensitivity and accuracy.
Affiliation(s)
- Herui Liao
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Dehan Cai
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Yanni Sun
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
|
16
|
Luo X, Kang X, Schönhuth A. Strainline: full-length de novo viral haplotype reconstruction from noisy long reads. Genome Biol 2022; 23:29. [PMID: 35057847 PMCID: PMC8771625 DOI: 10.1186/s13059-021-02587-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/25/2021] [Accepted: 12/17/2021] [Indexed: 12/02/2022] Open
Abstract
Haplotype-resolved de novo assembly of highly diverse virus genomes is critical in prevention, control and treatment of viral diseases. Current methods can either handle only relatively accurate short-read data or collapse haplotype-specific variations into a consensus sequence. Here, we present Strainline, a novel approach to assemble viral haplotypes from noisy long reads without a reference genome. Strainline is the first approach to provide strain-resolved, full-length de novo assemblies of viral quasispecies from noisy third-generation sequencing data. Benchmarking experiments on simulated and real datasets of varying complexity and diversity confirm this novelty and demonstrate the superiority of Strainline.
|
17
|
Utilizing the VirIdAl Pipeline to Search for Viruses in the Metagenomic Data of Bat Samples. Viruses 2021; 13:2006. [PMID: 34696436 PMCID: PMC8541124 DOI: 10.3390/v13102006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/30/2021] [Revised: 09/30/2021] [Accepted: 10/02/2021] [Indexed: 12/27/2022] Open
Abstract
According to various estimates, only a small percentage of existing viruses have been discovered, and even fewer are represented in the genomic databases. High-throughput sequencing technologies are developing rapidly, enabling large-scale screening of various biological samples for the presence of pathogen-associated nucleotide sequences, but many organisms have yet to be assigned specific loci for identification. This problem particularly impedes viral screening, due to the vast heterogeneity of viral genomes. In this paper, we present a new bioinformatic pipeline, VirIdAl, for detecting and identifying viral pathogens in sequencing data. We also demonstrate the utility of the new software by applying it to viral screening of the feces of bats collected in the Moscow region, which revealed a significant variety of viruses associated with bats, insects, plants, and protozoa. The presence of alpha and beta coronavirus reads, including the MERS-like bat virus, deserves a special mention, as it once again indicates that bats are indeed reservoirs for many viral pathogens. In addition, alignment-based methods were unable to identify the taxon for a large proportion of reads, so we additionally applied other approaches and showed that they can further reveal the presence of viral agents in sequencing data. However, the incompleteness of viral databases remains a significant problem in studies of viral diversity, and therefore necessitates the use of combined approaches, including those based on machine learning methods.
|
18
|
Knyazev S, Tsyvina V, Shankar A, Melnyk A, Artyomenko A, Malygina T, Porozov YB, Campbell EM, Switzer WM, Skums P, Mangul S, Zelikovsky A. Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction. Nucleic Acids Res 2021; 49:e102. [PMID: 34214168 PMCID: PMC8464054 DOI: 10.1093/nar/gkab576] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/09/2020] [Revised: 05/25/2021] [Accepted: 06/18/2021] [Indexed: 12/21/2022] Open
Abstract
Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient’s treatment plan, preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that remains open. Here we propose CliqueSNV, based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identifying minority haplotypes with frequencies below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short- and long-read sequencing platforms.
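The core idea of looking for pairs of statistically linked mutations can be sketched as follows. This is not CliqueSNV's actual statistic. It is a simplified stand-in that asks whether two minor alleles co-occur on the same reads more often than independent sequencing errors would predict, using a one-sided binomial tail. The read layout (equal-length aligned strings) and all function names are assumptions made for illustration.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def linked_pair_pvalue(reads, reference, i, j):
    """Count reads with non-reference bases at both i and j and test for excess.

    Returns (both, p_value), where p_value is the chance of seeing at least
    `both` co-occurrences if the two minor alleles arose independently.
    """
    n = len(reads)
    if n == 0:
        raise ValueError("need at least one read")
    minor_i = sum(r[i] != reference[i] for r in reads)
    minor_j = sum(r[j] != reference[j] for r in reads)
    both = sum(r[i] != reference[i] and r[j] != reference[j] for r in reads)
    p_both = (minor_i / n) * (minor_j / n)  # expected under independence
    return both, binom_tail(both, n, p_both)

reference = "ACGT"
# Hypothetical data: a 5% haplotype carrying two mutations on the same reads ...
linked = ["ACGT"] * 95 + ["TCGA"] * 5
# ... versus the same two minor alleles scattered over different reads.
scattered = ["ACGT"] * 90 + ["TCGT"] * 5 + ["ACGA"] * 5

print(linked_pair_pvalue(linked, reference, 0, 3))
print(linked_pair_pvalue(scattered, reference, 0, 3))
```

In the first dataset the two minor alleles always travel together, which independent errors rarely produce, so the pair looks like a real haplotype. In the second they never co-occur and the test gives no evidence of linkage.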
Collapse
Affiliation(s)
- Sergey Knyazev
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA.,Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA.,Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830, USA
| | - Viachaslau Tsyvina
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
| | - Anupama Shankar
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
| | - Andrew Melnyk
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
| | | | - Tatiana Malygina
- International Scientific and Research Institute of Bioengineering, ITMO University, St. Petersburg 197101, Russia
| | - Yuri B Porozov
- World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia.,Department of Computational Biology, Sirius University of Science and Technology, Sochi 354340, Russia
| | - Ellsworth M Campbell
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
| | - William M Switzer
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
| | - Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
| | - Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA.,World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
|
19
|
Fuhrmann L, Jablonski KP, Beerenwinkel N. Quantitative measures of within-host viral genetic diversity. Curr Opin Virol 2021; 49:157-163. [PMID: 34153841 DOI: 10.1016/j.coviro.2021.06.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/30/2021] [Revised: 06/03/2021] [Accepted: 06/07/2021] [Indexed: 12/22/2022]
Abstract
The genetic diversity of virus populations within their hosts is known to influence disease progression, treatment outcome, drug resistance, cell tropism, and transmission risk, and the study of dynamic changes of genetic heterogeneity can provide insights into the evolution of viruses. Several measures to quantify within-host genetic diversity capturing different aspects of diversity patterns in a sample or population are used, based on incidence, relative frequencies, pairwise distances, or phylogenetic trees. Here, we review and compare several of these measures.
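As a rough illustration of the measure families named above, the following Python sketch computes one incidence-based measure (haplotype richness), one frequency-based measure (Shannon entropy), and one distance-based measure (mean pairwise per-site Hamming distance, often called nucleotide diversity). The function names and the toy haplotype counts are assumptions for illustration and are not taken from the review; the sketch assumes equal-length haplotypes and at least two sampled sequences.

```python
from itertools import combinations
from math import comb, log


def richness(counts):
    """Incidence-based: number of distinct haplotypes observed."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_entropy(counts):
    """Frequency-based: Shannon entropy (natural log) of haplotype frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values() if c > 0)


def hamming(a, b):
    """Number of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(counts):
    """Distance-based: mean per-site Hamming distance between two sequences
    drawn without replacement from the sample."""
    n = sum(counts.values())
    length = len(next(iter(counts)))
    total = sum(
        counts[a] * counts[b] * hamming(a, b) for a, b in combinations(counts, 2)
    )
    return total / comb(n, 2) / length


# Hypothetical sample: haplotype sequence -> number of supporting reads.
sample = {"ACGT": 6, "ACGA": 3, "TCGA": 1}
print(richness(sample))
print(round(shannon_entropy(sample), 4))
print(round(nucleotide_diversity(sample), 4))
```

The three numbers respond to different features of the same sample: richness ignores abundances, entropy ignores how different the haplotypes are, and nucleotide diversity weighs both.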
Affiliation(s)
- Lara Fuhrmann
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
| | - Kim Philipp Jablonski
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland.
|
20
|
Freire B, Ladra S, Paramá JR, Salmela L. Inference of viral quasispecies with a paired de Bruijn graph. Bioinformatics 2021; 37:473-481. [PMID: 32926162 DOI: 10.1093/bioinformatics/btaa782] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/13/2019] [Revised: 03/11/2020] [Accepted: 09/02/2020] [Indexed: 12/28/2022] Open
Abstract
MOTIVATION RNA viruses exhibit a high mutation rate and thus they exist in infected cells as a population of closely related strains called viral quasispecies. The viral quasispecies assembly problem asks to characterize the quasispecies present in a sample from high-throughput sequencing data. We study the de novo version of the problem, where reference sequences of the quasispecies are not available. Current methods for assembling viral quasispecies are either based on overlap graphs or on de Bruijn graphs. Overlap graph-based methods tend to be accurate but slow, whereas de Bruijn graph-based methods are fast but less accurate. RESULTS We present viaDBG, which is a fast and accurate de Bruijn graph-based tool for de novo assembly of viral quasispecies. We first iteratively correct sequencing errors in the reads, which allows us to use large k-mers in the de Bruijn graph. To incorporate the paired-end information in the graph, we also adapt the paired de Bruijn graph for viral quasispecies assembly. These features enable the use of long-range information in contig construction without compromising the speed of de Bruijn graph-based approaches. Our experimental results show that viaDBG is both accurate and fast, whereas previous methods are either fast or accurate but not both. In particular, viaDBG has comparable or better accuracy than SAVAGE, while being at least nine times faster. Furthermore, the speed of viaDBG is comparable to PEHaplo but viaDBG is able to retrieve also low abundance quasispecies, which are often missed by PEHaplo. AVAILABILITY AND IMPLEMENTATION viaDBG is implemented in C++ and it is publicly available at https://bitbucket.org/bfreirec1/viadbg. All datasets used in this article are publicly available at https://bitbucket.org/bfreirec1/data-viadbg/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
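For readers unfamiliar with de Bruijn graph assembly, a minimal Python sketch of the plain (unpaired) construction follows. It is not viaDBG: the paired-end extension and the iterative read correction described above are not implemented, the `min_count` filter is a crude stand-in for error handling, and all function names and the toy reads are assumptions for illustration.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i : i + k]] += 1
    return counts


def de_bruijn(reads, k, min_count=1):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each retained k-mer
    adds an edge from its prefix to its suffix. K-mers seen fewer than
    min_count times are dropped (a simple stand-in for error correction)."""
    graph = defaultdict(set)
    for kmer, count in kmer_counts(reads, k).items():
        if count >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend(graph, start):
    """Spell the sequence obtained by following edges from start for as long
    as the current node has exactly one successor and no node repeats."""
    path, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        path += nxt[-1]
        seen.add(nxt)
        node = nxt
    return path


# Toy overlapping reads sampled from a single sequence.
reads = ["ACGTTGCA", "CGTTGCAT", "GTTGCATC"]
graph = de_bruijn(reads, k=4)
print(extend(graph, "ACG"))
```

Larger values of `k` resolve more repeats but make the graph more sensitive to sequencing errors, which is why the abstract ties the use of large k-mers to prior error correction.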
Affiliation(s)
- Borja Freire
- Department of Computer Science and Information Technologies, Facultade de Informática, Universidade da Coruña, Centro de investigación CITIC, A Coruña, Spain
| | - Susana Ladra
- Department of Computer Science and Information Technologies, Facultade de Informática, Universidade da Coruña, Centro de investigación CITIC, A Coruña, Spain
| | - Jose R Paramá
- Department of Computer Science and Information Technologies, Facultade de Informática, Universidade da Coruña, Centro de investigación CITIC, A Coruña, Spain
| | - Leena Salmela
- Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland
|
21
|
Detecting and phasing minor single-nucleotide variants from long-read sequencing data. Nat Commun 2021; 12:3032. [PMID: 34031367 PMCID: PMC8144375 DOI: 10.1038/s41467-021-23289-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/24/2020] [Accepted: 04/15/2021] [Indexed: 02/04/2023] Open
Abstract
Cellular genetic heterogeneity is common in many biological conditions including cancer, microbiome, and co-infection of multiple pathogens. Detecting and phasing minor variants play an instrumental role in deciphering cellular genetic heterogeneity, but they are still difficult tasks because of technological limitations. Recently, long-read sequencing technologies, including those by Pacific Biosciences and Oxford Nanopore, provide an opportunity to tackle these challenges. However, high error rates make it difficult to take full advantage of these technologies. To fill this gap, we introduce iGDA, an open-source tool that can accurately detect and phase minor single-nucleotide variants (SNVs), whose frequencies are as low as 0.2%, from raw long-read sequencing data. We also demonstrate that iGDA can accurately reconstruct haplotypes in closely related strains of the same species (divergence ≥0.011%) from long-read metagenomic data.
|
22
|
Cao C, He L, Tian Y, Qin Y, Sun H, Ding W, Gui L, Wu P. Molecular epidemiology analysis of early variants of SARS-CoV-2 reveals the potential impact of mutations P504L and Y541C (NSP13) in the clinical COVID-19 outcomes. Infect Genet Evol 2021; 92:104831. [PMID: 33798758 PMCID: PMC8010360 DOI: 10.1016/j.meegid.2021.104831] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/24/2020] [Revised: 03/08/2021] [Accepted: 03/28/2021] [Indexed: 12/25/2022]
Abstract
Since severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a global pandemic with alarming speed, comprehensively analyzing the mutation and evolution of early SARS-CoV-2 strains contributes to detecting and preventing the virus. Here, we explored 1962 high-quality genomes of early SARS-CoV-2 strains obtained from 42 countries before April 2020. The changing trends of genetic variations in SARS-CoV-2 strains over time and country were subsequently identified. In addition, viral genotype mapping and phylogenetic analysis were performed to identify the variation features of SARS-CoV-2. Results showed that 57.89% of genetic variations were located in ORF1ab, most of which (68.85%) were nonsynonymous. Haplotype maps and phylogenetic tree analysis showed that amino acid variations in ORF1ab (p.5828P > L and p.5865Y > C, also NSP13: P504L and NSP13: Y541C) were the important characteristics of the corresponding clade. Furthermore, these variants showed more significant aggregation in the United States (P = 2.92E-66, 95%) than in Australia or Canada, especially in strains from Washington State (P = 1.56E-23, 77.65%). Further analysis demonstrated that the report date of the variants was associated with the date of increased infections and the date of recovery and fatality rate change in the United States. More importantly, the fatality rate in Washington State was higher (4.13%) and outcomes were poorer (P = 4.12E-21 in fatality rate, P = 3.64E-29 in death and recovered cases) than in other states containing a small proportion of strains with these variants. Using sequence alignment, we found that variations at the 504 and 541 sites had functional effects on NSP13. In this study, we comprehensively analyzed genetic variations in SARS-CoV-2, gaining insights into amino acid variations in ORF1ab and COVID-19 outcomes.
Affiliation(s)
- Canhui Cao
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Medical College, Tongji Hospital, Huazhong University of Science and Technology, Wuhan 430030, China; Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen, Guangdong 518036, China
| | - Liang He
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Medical College, Tongji Hospital, Huazhong University of Science and Technology, Wuhan 430030, China; Department of Gynecologic Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| | - Yuan Tian
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Medical College, Tongji Hospital, Huazhong University of Science and Technology, Wuhan 430030, China; Department of Gynecologic Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| | - Yu Qin
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Medical College, Tongji Hospital, Huazhong University of Science and Technology, Wuhan 430030, China; Department of Gynecologic Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| | - Haiyin Sun
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Medical College, Tongji Hospital, Huazhong University of Science and Technology, Wuhan 430030, China; Department of Gynecologic Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| | - Wencheng Ding
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Medical College, Tongji Hospital, Huazhong University of Science and Technology, Wuhan 430030, China; Department of Gynecologic Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| | - Lingli Gui
- Department of Anesthesiology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.
| | - Peng Wu
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Medical College, Tongji Hospital, Huazhong University of Science and Technology, Wuhan 430030, China; Department of Gynecologic Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.
|
23
|
Ramazzotti D, Angaroni F, Maspero D, Gambacorti-Passerini C, Antoniotti M, Graudenzi A, Piazza R. VERSO: A comprehensive framework for the inference of robust phylogenies and the quantification of intra-host genomic diversity of viral samples. Patterns (N Y) 2021; 2:100212. [PMID: 33728416 PMCID: PMC7953447 DOI: 10.1016/j.patter.2021.100212] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/12/2020] [Revised: 11/30/2020] [Accepted: 01/22/2021] [Indexed: 12/22/2022]
Abstract
We introduce VERSO, a two-step framework for the characterization of viral evolution from sequencing data of viral genomes, which is an improvement on phylogenomic approaches for consensus sequences. VERSO exploits an efficient algorithmic strategy to return robust phylogenies from clonal variant profiles, also in conditions of sampling limitations. It then leverages variant frequency patterns to characterize the intra-host genomic diversity of samples, revealing undetected infection chains and pinpointing variants likely involved in homoplasies. On simulations, VERSO outperforms state-of-the-art tools for phylogenetic inference. Notably, the application to 6,726 amplicon and RNA sequencing samples refines the estimation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution, while co-occurrence patterns of minor variants unveil undetected infection paths, which are validated with contact tracing data. Finally, the analysis of SARS-CoV-2 mutational landscape uncovers a temporal increase of overall genomic diversity and highlights variants transiting from minor to clonal state and homoplastic variants, some of which fall on the spike gene. Available at: https://github.com/BIMIB-DISCo/VERSO.
Affiliation(s)
- Daniele Ramazzotti
- Department of Medicine and Surgery, Università degli Studi di Milano-Bicocca, Monza, Italy
| | - Fabrizio Angaroni
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
| | - Davide Maspero
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
| | | | - Marco Antoniotti
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
| | - Alex Graudenzi
- Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
- Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
| | - Rocco Piazza
- Department of Medicine and Surgery, Università degli Studi di Milano-Bicocca, Monza, Italy
|
24
|
Riaz N, Leung P, Barton K, Smith MA, Carswell S, Bull R, Lloyd AR, Rodrigo C. Adaptation of Oxford Nanopore technology for hepatitis C whole genome sequencing and identification of within-host viral variants. BMC Genomics 2021; 22:148. [PMID: 33653280 PMCID: PMC7923462 DOI: 10.1186/s12864-021-07460-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/23/2020] [Accepted: 02/19/2021] [Indexed: 01/23/2023] Open
Abstract
Background Hepatitis C (HCV) and many other RNA viruses exist as rapidly mutating quasi-species populations in a single infected host. High throughput characterization of full genome, within-host variants is still not possible despite advances in next generation sequencing. This limitation constrains viral genomic studies that depend on accurate identification of hemi-genome or whole genome, within-host variants, especially those occurring at low frequencies. With the advent of third generation long read sequencing technologies, including Oxford Nanopore Technology (ONT) and PacBio platforms, this problem is potentially surmountable. ONT is particularly attractive in this regard due to the portable nature of the MinION sequencer, which makes real-time sequencing in remote and resource-limited locations possible. However, this technology (termed here ‘nanopore sequencing’) has a comparatively high technical error rate. The present study aimed to assess the utility, accuracy and cost-effectiveness of nanopore sequencing for HCV genomes. We also introduce a new bioinformatics tool (Nano-Q) to differentiate within-host variants from nanopore sequencing. Results The Nanopore platform, when the coverage exceeded 300 reads, generated comparable consensus sequences to Illumina sequencing. Using HCV Envelope plasmids (~ 1800 nt) mixed in known proportions, the capacity of nanopore sequencing to reliably identify variants with an abundance as low as 0.1% was demonstrated, provided the autologous reference sequence was available to identify the matching reads. Successful pooling and nanopore sequencing of 52 samples from patients with HCV infection demonstrated its cost effectiveness (AUD$ 43 per sample with nanopore sequencing versus $100 with paired-end short read technology). 
The Nano-Q tool successfully separated between-host sequences, including those from the same subtype, by bulk sorting and phylogenetic clustering without an autologous reference sequence (using only a subtype-specific generic reference). The pipeline also identified within-host viral variants and their abundance when the parameters were appropriately adjusted. Conclusion Cost effective HCV whole genome sequencing and within-host variant identification without haplotype reconstruction are potential advantages of nanopore sequencing. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07460-1.
Affiliation(s)
- Nasir Riaz
- Kirby Institute, UNSW Sydney, Sydney, NSW, 2052, Australia.,Department of Microbiology, Hazara University, KPK, Maneshra, 21120, Pakistan
| | - Preston Leung
- Kirby Institute, UNSW Sydney, Sydney, NSW, 2052, Australia
| | - Kirston Barton
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, Australia
| | - Martin A Smith
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, Australia
| | - Shaun Carswell
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, Australia
| | - Rowena Bull
- Kirby Institute, UNSW Sydney, Sydney, NSW, 2052, Australia.,Department of Pathology, School of Medical Sciences, UNSW Sydney, Sydney, NSW, 2052, Australia
| | - Andrew R Lloyd
- Kirby Institute, UNSW Sydney, Sydney, NSW, 2052, Australia
| | - Chaturaka Rodrigo
- Kirby Institute, UNSW Sydney, Sydney, NSW, 2052, Australia.,Department of Pathology, School of Medical Sciences, UNSW Sydney, Sydney, NSW, 2052, Australia
|
25
|
Graudenzi A, Maspero D, Angaroni F, Piazza R, Ramazzotti D. Mutational signatures and heterogeneous host response revealed via large-scale characterization of SARS-CoV-2 genomic diversity. iScience 2021; 24:102116. [PMID: 33532709 PMCID: PMC7842190 DOI: 10.1016/j.isci.2021.102116] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 09/08/2020] [Revised: 11/09/2020] [Accepted: 01/22/2021] [Indexed: 01/03/2023] Open
Abstract
To dissect the mechanisms underlying the inflation of variants in the Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2) genome, we present a large-scale analysis of intra-host genomic diversity, which reveals that most samples exhibit heterogeneous genomic architectures, due to the interplay between host-related mutational processes and transmission dynamics. The decomposition of minor variants profiles unveils three non-overlapping mutational signatures related to nucleotide substitutions and likely ruled by APOlipoprotein B Editing Complex (APOBEC), Reactive Oxygen Species (ROS), and Adenosine Deaminase Acting on RNA (ADAR), highlighting heterogeneous host responses to SARS-CoV-2 infections. A corrected-for-signatures dN/dS analysis demonstrates that such mutational processes are affected by purifying selection, with important exceptions. In fact, several mutations appear to transit toward clonality, defining new clonal genotypes that increase the overall genomic diversity. Furthermore, the phylogenomic analysis shows the presence of homoplasies and supports the hypothesis of transmission of minor variants. This study paves the way for the integrated analysis of intra-host genomic diversity and clinical outcomes of SARS-CoV-2 infections.
Affiliation(s)
- Alex Graudenzi
- Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
- Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
| | - Davide Maspero
- Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
- Department of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
| | - Fabrizio Angaroni
- Department of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
| | - Rocco Piazza
- Department of Medicine and Surgery, Univ. of Milan-Bicocca, Monza, Italy
| | - Daniele Ramazzotti
- Department of Medicine and Surgery, Univ. of Milan-Bicocca, Monza, Italy
|
26
|
Cao C, He J, Mak L, Perera D, Kwok D, Wang J, Li M, Mourier T, Gavriliuc S, Greenberg M, Morrissy AS, Sycuro LK, Yang G, Jeffares DC, Long Q. Reconstruction of Microbial Haplotypes by Integration of Statistical and Physical Linkage in Scaffolding. Mol Biol Evol 2021; 38:2660-2672. [PMID: 33547786 PMCID: PMC8136496 DOI: 10.1093/molbev/msab037] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022] Open
Abstract
DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or "haplotypes." However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.
Affiliation(s)
- Chen Cao
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Jingni He
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Cardiology, Xiangya Hospital, Central South University, Changsha, China
| | - Lauren Mak
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Present address: Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, NY, USA
| | - Deshan Perera
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Devin Kwok
- Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada
| | - Jia Wang
- Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA
| | - Minghao Li
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Tobias Mourier
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Stefan Gavriliuc
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Matthew Greenberg
- Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada
| | - A Sorana Morrissy
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Laura K Sycuro
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Microbiology, Immunology, and Infectious Diseases, Snyder Institute for Chronic Diseases, University of Calgary, Calgary, AB, Canada
| | - Guang Yang
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Medical Genetics, University of Calgary, Calgary, AB, Canada
| | - Daniel C Jeffares
- Department of Biology, York Biomedical Research Institute, University of York, York, United Kingdom
| | - Quan Long
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada,Department of Medical Genetics, University of Calgary, Calgary, AB, Canada,Hotchkiss Brain Institute, O’Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
|
27
|
Cao C, Greenberg M, Long Q. WgLink: reconstructing whole-genome viral haplotypes using L0+L1-regularization. Bioinformatics 2021; 37:2744-2746. [PMID: 33532820 DOI: 10.1093/bioinformatics/btab076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2020] [Revised: 12/23/2020] [Accepted: 01/29/2021] [Indexed: 12/24/2022] Open
Abstract
SUMMARY Many tools can reconstruct viral sequences based on next generation sequencing reads. Although existing tools effectively recover local regions, their accuracy suffers when reconstructing the whole viral genomes (strains). Moreover, they consume significant memory when the sequencing coverage is high or when the genome size is large. We present WgLink to meet this challenge. WgLink takes local reconstructions produced by other tools as input and patches the resulting segments together into coherent whole-genome strains. We accomplish this using an L0+L1-regularized regression synthesizing variant allele frequency data with physical linkage between multiple variants spanning multiple regions simultaneously. WgLink achieves higher accuracy than existing tools both on simulated and real data sets while using significantly less memory (RAM) and fewer CPU hours. AVAILABILITY Source code and binaries are freely available at https://github.com/theLongLab/wglink. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chen Cao
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, Calgary, AB, T2N 4N1, Canada
| | - Matthew Greenberg
- Department of Mathematics & Statistics, Calgary, AB, T2N 4N1, Canada
| | - Quan Long
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, Calgary, AB, T2N 4N1, Canada.,Department of Mathematics & Statistics, Calgary, AB, T2N 4N1, Canada.,Department of Medical Genetics, Hotchkiss Brain Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
|
28
|
Posada-Céspedes S, Seifert D, Topolsky I, Jablonski KP, Metzner KJ, Beerenwinkel N. V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data. Bioinformatics 2021; 37:1673-1680. [PMID: 33471068 PMCID: PMC8289377 DOI: 10.1093/bioinformatics/btab015] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 06/11/2020] [Revised: 12/09/2020] [Accepted: 01/08/2021] [Indexed: 12/30/2022] Open
Abstract
Motivation High-throughput sequencing technologies are used increasingly not only in viral genomics research but also in clinical surveillance and diagnostics. These technologies facilitate the assessment of the genetic diversity in intra-host virus populations, which affects transmission, virulence and pathogenesis of viral infections. However, there are two major challenges in analysing viral diversity. First, amplification and sequencing errors confound the identification of true biological variants, and second, the large data volumes represent computational limitations. Results To support viral high-throughput sequencing studies, we developed V-pipe, a bioinformatics pipeline combining various state-of-the-art statistical models and computational tools for automated end-to-end analyses of raw sequencing reads. V-pipe supports quality control, read mapping and alignment, low-frequency mutation calling, and inference of viral haplotypes. For generating high-quality read alignments, we developed a novel method, called ngshmmalign, based on profile hidden Markov models and tailored to small and highly diverse viral genomes. V-pipe also includes benchmarking functionality providing a standardized environment for comparative evaluations of different pipeline configurations. We demonstrate this capability by assessing the impact of three different read aligners (Bowtie 2, BWA MEM, ngshmmalign) and two different variant callers (LoFreq, ShoRAH) on the performance of calling single-nucleotide variants in intra-host virus populations. V-pipe supports various pipeline configurations and is implemented in a modular fashion to facilitate adaptations to the continuously changing technology landscape. Availability and implementation V-pipe is freely available at https://github.com/cbg-ethz/V-pipe. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Susana Posada-Céspedes
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
| | - David Seifert
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
| | - Ivan Topolsky
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
| | - Kim Philipp Jablonski
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
| | - Karin J Metzner
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, 8091, Switzerland.,Institute of Medical Virology, University of Zurich, Zurich, 8091, Switzerland
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
|
29
|
Bons E, Leemann C, Metzner KJ, Regoes RR. Long-term experimental evolution of HIV-1 reveals effects of environment and mutational history. PLoS Biol 2020; 18:e3001010. [PMID: 33370289 PMCID: PMC7793244 DOI: 10.1371/journal.pbio.3001010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/27/2020] [Revised: 01/08/2021] [Accepted: 11/30/2020] [Indexed: 11/21/2022] Open
Abstract
An often-returning question for not only HIV-1, but also other organisms, is how predictable evolutionary paths are. The environment, mutational history, and random processes can all impact the exact evolutionary paths, but to which extent these factors contribute to the evolutionary dynamics of a particular system is an open question. Especially in a virus like HIV-1, with a large mutation rate and large population sizes, evolution is expected to be highly predictable if the impact of environment and history is low, and evolution is not neutral. We investigated the effect of environment and mutational history by analyzing sequences from a long-term evolution experiment, in which HIV-1 was passaged on 2 different cell types in 8 independent evolutionary lines and 8 derived lines, 4 of which involved a switch of the environment. The experiments lasted for 240–300 passages, corresponding to approximately 400–600 generations or almost 3 years. The sequences show signs of extensive parallel evolution—the majority of mutations that are shared between independent lines appear in both cell types, but we also find that both environment and mutational history significantly impact the evolutionary paths. We conclude that HIV-1 evolution is robust to small changes in the environment, similar to a transmission event in the absence of an immune response or drug pressure. We also find that the fitness landscape of HIV-1 is largely smooth, although we find some evidence for both positive and negative epistatic interactions between mutations. Analysis of the longest evolutionary experiment with HIV-1 to-date reveals continuous viral adaptation over several years. The authors quantify the environment-specific mutations that arise and determine the fraction of mutations that co-occur with significantly different frequencies than expected by chance.
Affiliation(s)
- Eva Bons
- Department of Environmental Systems Sciences, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
| | - Christine Leemann
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, Zurich, Switzerland
- Institute of Medical Virology, University of Zurich, Zurich, Switzerland
| | - Karin J. Metzner
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, Zurich, Switzerland
- Institute of Medical Virology, University of Zurich, Zurich, Switzerland
| | - Roland R. Regoes
- Department of Environmental Systems Sciences, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
|
30
|
Adikari TN, Riaz N, Sigera C, Leung P, Valencia BM, Barton K, Smith MA, Bull RA, Li H, Luciani F, Weeratunga P, Thein TL, Lim VWX, Leo YS, Rajapakse S, Fink K, Lloyd AR, Fernando D, Rodrigo C. Single molecule, near full-length genome sequencing of dengue virus. Sci Rep 2020; 10:18196. [PMID: 33097792 PMCID: PMC7584602 DOI: 10.1038/s41598-020-75374-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/01/2020] [Accepted: 10/12/2020] [Indexed: 01/23/2023] Open
Abstract
Current methods for dengue virus (DENV) genome amplification, amplify parts of the genome in at least 5 overlapping segments and then combine the output to characterize a full genome. This process is laborious, costly and requires at least 10 primers per serotype, thus increasing the likelihood of PCR bias. We introduce an assay to amplify near full-length dengue virus genomes as intact molecules, sequence these amplicons with third generation “nanopore” technology without fragmenting and use the sequence data to differentiate within-host viral variants with a bioinformatics tool (Nano-Q). The new assay successfully generated near full-length amplicons from DENV serotypes 1, 2 and 3 samples which were sequenced with nanopore technology. Consensus DENV sequences generated by nanopore sequencing had over 99.5% pairwise sequence similarity to Illumina generated counterparts provided the coverage was > 100 with both platforms. Maximum likelihood phylogenetic trees generated from nanopore consensus sequences were able to reproduce the exact trees made from Illumina sequencing with a conservative 99% bootstrapping threshold (after 1000 replicates and 10% burn-in). Pairwise genetic distances of within host variants identified from the Nano-Q tool were less than that of between host variants, thus enabling the phylogenetic segregation of variants from the same host.
Affiliation(s)
- Thiruni N Adikari
- School of Medical Sciences, University of New South Wales, Sydney, Australia.,Institute for Combinatorial Advanced Research and Education, Sir John Kotelawala Defence University, Ratmalana, Sri Lanka
| | - Nasir Riaz
- Kirby Institute, University of New South Wales, Sydney, Australia.,Department of Microbiology, Hazara University, Mansehra, KPK, Pakistan
| | - Chathurani Sigera
- Department of Parasitology, Faculty of Medicine, University of Colombo, Colombo, Sri Lanka
| | - Preston Leung
- Kirby Institute, University of New South Wales, Sydney, Australia
| | | | - Kirston Barton
- Garvan Institute of Medical Research, Sydney, Australia and St-Vincent's Clinical School, Faculty of Medicine, UNSW, Sydney, Australia
| | - Martin A Smith
- Garvan Institute of Medical Research, Sydney, Australia and St-Vincent's Clinical School, Faculty of Medicine, UNSW, Sydney, Australia.,CHU Sainte-Justine Research Centre, Montreal, Canada.,Department of Biochemistry and Molecular Medicine, Université de Montréal, Montreal, Canada
| | - Rowena A Bull
- School of Medical Sciences, University of New South Wales, Sydney, Australia.,Kirby Institute, University of New South Wales, Sydney, Australia
| | - Hui Li
- Kirby Institute, University of New South Wales, Sydney, Australia
| | - Fabio Luciani
- School of Medical Sciences, University of New South Wales, Sydney, Australia.,Kirby Institute, University of New South Wales, Sydney, Australia
| | - Praveen Weeratunga
- Department of Clinical Medicine, Faculty of Medicine, University of Colombo, Colombo, Sri Lanka
| | - Tun-Linn Thein
- National Centre for Infectious Diseases, Singapore, Singapore
| | - Vanessa W X Lim
- National Centre for Infectious Diseases, Singapore, Singapore
| | - Yee-Sin Leo
- National Centre for Infectious Diseases, Singapore, Singapore.,Department of Infectious Diseases, Tan Tock Seng Hospital, Singapore, Singapore.,Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.,Lee Kong Chian School of Medicine, Singapore, Singapore
| | - Senaka Rajapakse
- Department of Clinical Medicine, Faculty of Medicine, University of Colombo, Colombo, Sri Lanka
| | - Katja Fink
- Agency for Science, Technology and Research, Singapore, Singapore
| | - Andrew R Lloyd
- Kirby Institute, University of New South Wales, Sydney, Australia
| | - Deepika Fernando
- Department of Parasitology, Faculty of Medicine, University of Colombo, Colombo, Sri Lanka
| | - Chaturaka Rodrigo
- School of Medical Sciences, University of New South Wales, Sydney, Australia.,Kirby Institute, University of New South Wales, Sydney, Australia.
|
31
|
Eliseev A, Gibson KM, Avdeyev P, Novik D, Bendall ML, Pérez-Losada M, Alexeev N, Crandall KA. Evaluation of haplotype callers for next-generation sequencing of viruses. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2020; 82:104277. [PMID: 32151775 PMCID: PMC7293574 DOI: 10.1016/j.meegid.2020.104277] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/04/2019] [Revised: 03/04/2020] [Accepted: 03/06/2020] [Indexed: 01/30/2023]
Abstract
Currently, the standard practice for assembling next-generation sequencing (NGS) reads of viral genomes is to summarize thousands of individual short reads into a single consensus sequence, thus confounding useful intra-host diversity information for molecular phylodynamic inference. It is hypothesized that a few viral strains may dominate the intra-host genetic diversity with a variety of lower frequency strains comprising the rest of the population. Several software tools currently exist to convert NGS sequence variants into haplotypes. Previous benchmarks of viral haplotype reconstruction programs used simulation scenarios that are useful from a mathematical perspective but do not reflect viral evolution and epidemiology. Here, we tested twelve NGS haplotype reconstruction methods using viral populations simulated under realistic evolutionary dynamics. We simulated coalescent-based populations that spanned known levels of viral genetic diversity, including mutation rates, sample size and effective population size, to test the limits of the haplotype reconstruction methods and to ensure coverage of predicted intra-host viral diversity levels (especially HIV-1). All twelve investigated haplotype callers showed variable performance and produced drastically different results that were mainly driven by differences in mutation rate and, to a lesser extent, in effective population size. Most methods were able to accurately reconstruct haplotypes when genetic diversity was low. However, under higher levels of diversity (e.g., those seen intra-host HIV-1 infections), haplotype reconstruction quality was highly variable and, on average, poor. All haplotype reconstruction tools, except QuasiRecomb and ShoRAH, greatly underestimated intra-host diversity and the true number of haplotypes. PredictHaplo outperformed, in regard to highest precision, recall, and lowest UniFrac distance values, the other haplotype reconstruction tools followed by CliqueSNV, which, given more computational time, may have outperformed PredictHaplo. Here, we present an extensive comparison of available viral haplotype reconstruction tools and provide insights for future improvements in haplotype reconstruction tools using both short-read and long-read technologies.
Affiliation(s)
- Anton Eliseev
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
| | - Keylie M Gibson
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA.
| | - Pavel Avdeyev
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Mathematics, George Washington University, Washington, DC, USA
| | - Dmitry Novik
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
| | - Matthew L Bendall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
| | - Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
| | - Nikita Alexeev
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
| | - Keith A Crandall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
|
32
|
Inferring Transmission Bottleneck Size from Viral Sequence Data Using a Novel Haplotype Reconstruction Method. J Virol 2020; 94:JVI.00014-20. [PMID: 32295920 PMCID: PMC7307158 DOI: 10.1128/jvi.00014-20] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/03/2020] [Accepted: 04/08/2020] [Indexed: 12/12/2022] Open
Abstract
Viral populations undergo a repeated cycle of within-host growth followed by transmission. Viral evolution is affected by each stage of this cycle. The number of viral particles transmitted from one host to another, known as the transmission bottleneck, is an important factor in determining how the evolutionary dynamics of the population play out, restricting the extent to which the evolved diversity of the population can be passed from one host to another. Previous study of viral sequence data has suggested that the transmission bottleneck size for influenza A transmission between human hosts is small. Reevaluating these data using a novel and improved method, we largely confirm this result, albeit that we infer a slightly higher bottleneck size in some cases, of between 1 and 13 virions. While a tight bottleneck operates in human influenza transmission, it is not extreme in nature; some diversity can be meaningfully retained between hosts. The transmission bottleneck is defined as the number of viral particles that transmit from one host to establish an infection in another. Genome sequence data have been used to evaluate the size of the transmission bottleneck between humans infected with the influenza virus; however, the methods used to make these estimates have some limitations. Specifically, viral allele frequencies, which form the basis of many calculations, may not fully capture a process which involves the transmission of entire viral genomes. Here, we set out a novel approach for inferring viral transmission bottlenecks; our method combines an algorithm for haplotype reconstruction with maximum likelihood methods for bottleneck inference. This approach allows for rapid calculation and performs well when applied to data from simulated transmission events; errors in the haplotype reconstruction step did not adversely affect inferences of the population bottleneck. 
Applied to data from a previous household transmission study of influenza A infection, we confirm the result that the majority of transmission events involve a small number of viruses, albeit with slightly looser bottlenecks being inferred, with between 1 and 13 particles transmitted in the majority of cases. While influenza A transmission involves a tight population bottleneck, the bottleneck is not so tight as to universally prevent the transmission of within-host viral diversity.
|
33
|
Howison M, Coetzer M, Kantor R. Measurement error and variant-calling in deep Illumina sequencing of HIV. Bioinformatics 2020; 35:2029-2035. [PMID: 30407489 DOI: 10.1093/bioinformatics/bty919] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 03/08/2018] [Revised: 09/21/2018] [Accepted: 11/06/2018] [Indexed: 01/23/2023] Open
Abstract
MOTIVATION Next-generation deep sequencing of viral genomes, particularly on the Illumina platform, is increasingly applied in HIV research. Yet, there is no standard protocol or method used by the research community to account for measurement errors that arise during sample preparation and sequencing. Correctly calling high and low-frequency variants while controlling for erroneous variants is an important precursor to downstream interpretation, such as studying the emergence of HIV drug-resistance mutations, which in turn has clinical applications and can improve patient care. RESULTS We developed a new variant-calling pipeline, hivmmer, for Illumina sequences from HIV viral genomes. First, we validated hivmmer by comparing it to other variant-calling pipelines on real HIV plasmid datasets. We found that hivmmer achieves a lower rate of erroneous variants, and that all methods agree on the frequency of correctly called variants. Next, we compared the methods on an HIV plasmid dataset that was sequenced using Primer ID, an amplicon-tagging protocol, which is designed to reduce errors and amplification bias during library preparation. We show that the Primer ID consensus exhibits fewer erroneous variants compared to the variant-calling pipelines, and that hivmmer more closely approaches this low error rate compared to the other pipelines. The frequency estimates from the Primer ID consensus do not differ significantly from those of the variant-calling pipelines. AVAILABILITY AND IMPLEMENTATION hivmmer is freely available for non-commercial use from https://github.com/kantorlab/hivmmer. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mark Howison
- Watson Institute for International and Public Affairs
| | - Mia Coetzer
- Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, USA
| | - Rami Kantor
- Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, USA
|
34
|
Mitchell K, Brito JJ, Mandric I, Wu Q, Knyazev S, Chang S, Martin LS, Karlsberg A, Gerasimov E, Littman R, Hill BL, Wu NC, Yang HT, Hsieh K, Chen L, Littman E, Shabani T, Enik G, Yao D, Sun R, Schroeder J, Eskin E, Zelikovsky A, Skums P, Pop M, Mangul S. Benchmarking of computational error-correction methods for next-generation sequencing data. Genome Biol 2020; 21:71. [PMID: 32183840 PMCID: PMC7079412 DOI: 10.1186/s13059-020-01988-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/18/2019] [Accepted: 03/06/2020] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. RESULTS In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. CONCLUSIONS In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
Affiliation(s)
- Keith Mitchell
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Jaqueline J Brito
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
| | - Igor Mandric
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Qiaozhen Wu
- Department of Mathematics, University of California Los Angeles, 520 Portola Plaza, Los Angeles, CA, 90095, USA
| | - Sergey Knyazev
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Sei Chang
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Lana S Martin
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
| | - Aaron Karlsberg
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
| | - Ekaterina Gerasimov
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Russell Littman
- UCLA Bioinformatics, 621 Charles E Young Dr S, Los Angeles, CA, 90024, USA
| | - Brian L Hill
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Nicholas C Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Harry Taegyun Yang
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Kevin Hsieh
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Linus Chen
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Eli Littman
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Taylor Shabani
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - German Enik
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Douglas Yao
- Department of Molecular, Cell, and Developmental Biology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
| | - Ren Sun
- Department of Molecular and Medical Pharmacology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
| | - Jan Schroeder
- Epigenetics & Reprogramming Laboratory, Monash University, 15 Innovation Walk, Melbourne, VIC, 3800, Australia
| | - Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
- The Laboratory of Bioinformatics, I.M, Sechenov First Moscow State Medical University, Moscow, Russia, 119991
| | - Pavel Skums
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Mihai Pop
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA
| | - Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA.
|
35
|
Carlisle LA, Turk T, Kusejko K, Metzner KJ, Leemann C, Schenkel CD, Bachmann N, Posada S, Beerenwinkel N, Böni J, Yerly S, Klimkait T, Perreau M, Braun DL, Rauch A, Calmy A, Cavassini M, Battegay M, Vernazza P, Bernasconi E, Günthard HF, Kouyos RD. Viral Diversity Based on Next-Generation Sequencing of HIV-1 Provides Precise Estimates of Infection Recency and Time Since Infection. J Infect Dis 2020; 220:254-265. [PMID: 30835266 DOI: 10.1093/infdis/jiz094] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/06/2018] [Accepted: 03/01/2019] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND Human immunodeficiency virus type 1 (HIV-1) genetic diversity increases over the course of infection and can be used to infer the time since infection and, consequently, infection recency, which are crucial for HIV-1 surveillance and the understanding of viral pathogenesis. METHODS We considered 313 HIV-infected individuals for whom reliable estimates of infection dates and next-generation sequencing (NGS)-derived nucleotide frequency data were available. Fractions of ambiguous nucleotides, obtained by population sequencing, were available for 207 samples. We assessed whether the average pairwise diversity calculated using NGS sequences provided a more exact prediction of the time since infection and classification of infection recency (<1 year after infection), compared with the fraction of ambiguous nucleotides. RESULTS NGS-derived average pairwise diversity classified an infection as recent with a sensitivity of 88% and a specificity of 85%. When considering only the 207 samples for which fractions of ambiguous nucleotides were available, the NGS-derived average pairwise diversity exhibited a higher sensitivity (90% vs 78%) and specificity (95% vs 67%) than the fraction of ambiguous nucleotides. Additionally, the average pairwise diversity could be used to estimate the time since infection with a mean absolute error of 0.84 years, compared with 1.03 years for the fraction of ambiguous nucleotides. CONCLUSIONS Viral diversity based on NGS data is more precise than that based on population sequencing in its ability to predict infection recency and provides an estimated time since infection that has a mean absolute error of <1 year.
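The abstract above does not say how the average pairwise diversity was computed from the NGS nucleotide frequency data. As a minimal sketch, assuming one common estimator (the probability that two reads drawn without replacement differ at a site, averaged over sites), the calculation could look like the following. The function names and the example counts are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def site_pairwise_diversity(counts: Dict[str, int]) -> float:
    """Probability that two reads drawn without replacement differ at one site.

    `counts` maps each nucleotide to its read count at that site.
    Assumption: this unbiased estimator stands in for the study's method,
    which the abstract does not specify.
    """
    depth = sum(counts.values())
    if depth < 2:
        return 0.0  # diversity is undefined with fewer than two reads
    same_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_pairs / (depth * (depth - 1))


def average_pairwise_diversity(sites: List[Dict[str, int]]) -> float:
    """Mean of the per-site pairwise diversity over all supplied sites."""
    if not sites:
        raise ValueError("need at least one site")
    return sum(site_pairwise_diversity(c) for c in sites) / len(sites)


# Hypothetical two-site example: one monomorphic site, one evenly split site.
example_sites = [
    {"A": 100, "C": 0, "G": 0, "T": 0},
    {"A": 50, "C": 50, "G": 0, "T": 0},
]
example_diversity = average_pairwise_diversity(example_sites)
```

In this sketch a monomorphic site contributes zero and an evenly split site contributes roughly 0.5, so the average rises as minority variants accumulate, which is the intuition behind using diversity as a proxy for time since infection.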
Affiliation(s)
- Louisa A Carlisle
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich.,Institute of Medical Virology, University of Zurich, Zurich
| | - Teja Turk
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich.,Institute of Medical Virology, University of Zurich, Zurich
| | - Katharina Kusejko
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich.,Institute of Medical Virology, University of Zurich, Zurich
| | - Karin J Metzner
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich.,Institute of Medical Virology, University of Zurich, Zurich
| | - Christine Leemann
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich.,Institute of Medical Virology, University of Zurich, Zurich
| | - Corinne D Schenkel
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich.,Institute of Medical Virology, University of Zurich, Zurich
| | - Nadine Bachmann
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich.,Institute of Medical Virology, University of Zurich, Zurich
| | - Susana Posada
- Department of Biosystems Science and Engineering, ETH Zurich.,SIB Swiss Institute of Bioinformatics, University of Basel, Basel
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich.,SIB Swiss Institute of Bioinformatics, University of Basel, Basel
| | - Jürg Böni
- Institute of Medical Virology, University of Zurich, Zurich.,Swiss National Center for Retroviruses, University of Zurich, Zurich
| | - Sabine Yerly
- Laboratory of Virology and Division of Infectious Diseases, Geneva University Hospital, Geneva
| | - Thomas Klimkait
- Molecular Virology, Department of Biomedicine-Petersplatz, University of Basel, Basel
| | - Matthieu Perreau
- Division of Immunology and Allergy, Lausanne University Hospital, Lausanne
| | - Dominique L Braun
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich.,Institute of Medical Virology, University of Zurich, Zurich
| | - Andri Rauch
- Department of Infectious Diseases, Bern University Hospital, University of Bern, Bern
| | - Alexandra Calmy
- Laboratory of Virology and Division of Infectious Diseases, Geneva University Hospital, Geneva
| | | | - Manuel Battegay
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Basel, Basel
| | - Pietro Vernazza
- Division of Infectious Diseases, Cantonal Hospital St. Gallen, St. Gallen
| | - Enos Bernasconi
- Division of Infectious Diseases, Regional Hospital Lugano, Lugano, Switzerland
| | - Huldrych F Günthard
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich.,Institute of Medical Virology, University of Zurich, Zurich
| | - Roger D Kouyos
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich.,Institute of Medical Virology, University of Zurich, Zurich
|
36
|
Pérez-Losada M, Arenas M, Galán JC, Bracho MA, Hillung J, García-González N, González-Candelas F. High-throughput sequencing (HTS) for the analysis of viral populations. INFECTION GENETICS AND EVOLUTION 2020; 80:104208. [PMID: 32001386 DOI: 10.1016/j.meegid.2020.104208] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/30/2019] [Revised: 01/21/2020] [Accepted: 01/24/2020] [Indexed: 12/12/2022]
Abstract
The development of High-Throughput Sequencing (HTS) technologies is having a major impact on the genomic analysis of viral populations. Current HTS platforms can capture nucleic acid variation across millions of genes for both selected amplicons and full viral genomes. HTS has already facilitated the discovery of new viruses, hinted new taxonomic classifications and provided a deeper and broader understanding of their diversity, population and genetic structure. Hence, HTS has already replaced standard Sanger sequencing in basic and applied research fields, but the next step is its implementation as a routine technology for the analysis of viruses in clinical settings. The most likely application of this implementation will be the analysis of viral genomics, because the huge population sizes, high mutation rates and very fast replacement of viral populations have demonstrated the limited information obtained with Sanger technology. In this review, we describe new technologies and provide guidelines for the high-throughput sequencing and genetic and evolutionary analyses of viral populations and metaviromes, including software applications. With the development of new HTS technologies, new and refurbished molecular and bioinformatic tools are also constantly being developed to process and integrate HTS data. These allow assembling viral genomes and inferring viral population diversity and dynamics. Finally, we also present several applications of these approaches to the analysis of viral clinical samples including transmission clusters and outbreak characterization.
Affiliation(s)
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
- Juan Carlos Galán
- Microbiology Service, Hospital Ramón y Cajal, Madrid, Spain; CIBER in Epidemiology and Public Health, Spain
- Mª Alma Bracho
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain
- Julia Hillung
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain
- Neris García-González
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain
- Fernando González-Candelas
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain

37
|
Hao M, Qiao J, Qi H. Current and Emerging Methods for the Synthesis of Single-Stranded DNA. Genes (Basel) 2020; 11:E116. [PMID: 31973021 PMCID: PMC7073533 DOI: 10.3390/genes11020116] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 01/07/2020] [Revised: 01/16/2020] [Accepted: 01/18/2020] [Indexed: 12/21/2022] Open
Abstract
Methods for synthesizing arbitrary single-strand DNA (ssDNA) fragments are rapidly becoming fundamental tools for gene editing, DNA origami, DNA storage, and other applications. To meet the rising application requirements, numerous methods have been developed to produce ssDNA. Some approaches allow the synthesis of freely chosen user-defined ssDNA sequences to overcome the restrictions and limitations of different length, purity, and yield. In this perspective, we provide an overview of the representative ssDNA production strategies and their most significant challenges to enable the readers to make informed choices of synthesis methods and enhance the availability of increasingly inexpensive synthetic ssDNA. We also aim to stimulate a broader interest in the continued development of efficient ssDNA synthesis techniques and improve their applications in future research.
Affiliation(s)
- Min Hao
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Key Laboratory of Systems Bioengineering of Ministry of Education, Tianjin University, Tianjin 300072, China
- SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering, Tianjin University, Tianjin 300072, China
- Jianjun Qiao
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Key Laboratory of Systems Bioengineering of Ministry of Education, Tianjin University, Tianjin 300072, China
- SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering, Tianjin University, Tianjin 300072, China
- Hao Qi
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Key Laboratory of Systems Bioengineering of Ministry of Education, Tianjin University, Tianjin 300072, China
- SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering, Tianjin University, Tianjin 300072, China

38
|
Hirsch HH. Spatiotemporal Virus Surveillance for Severe Acute Respiratory Infections in Resource-limited Settings: How Deep Need We Go? Clin Infect Dis 2020; 68:1126-1128. [PMID: 30099498 PMCID: PMC7108180 DOI: 10.1093/cid/ciy663] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/30/2018] [Accepted: 08/06/2018] [Indexed: 12/22/2022] Open
Affiliation(s)
- Hans H Hirsch
- Division of Infection Diagnostics, Department of Biomedicine, University of Basel, Switzerland; Transplantation and Clinical Virology, Department of Biomedicine, University of Basel, Switzerland; Infectious Diseases and Hospital Epidemiology, University Hospital Basel, Switzerland

39
|
Chen J, Shang J, Wang J, Sun Y. A binning tool to reconstruct viral haplotypes from assembled contigs. BMC Bioinformatics 2019; 20:544. [PMID: 31684876 PMCID: PMC6829986 DOI: 10.1186/s12859-019-3138-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/01/2019] [Accepted: 10/09/2019] [Indexed: 11/21/2022] Open
Abstract
Background: Infections by RNA viruses such as influenza virus and HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge for producing effective prevention and treatment strategies is high intra-species genetic diversity. As different strains may have different biological properties, characterizing the genetic diversity is important to vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes is still a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. As a mutation in one gene can mask the phenotypic effects of a mutation at another locus, clustering these contigs into genome-scale haplotypes is still needed. Results: We developed a contig binning tool, VirBin, which clusters contigs into different groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes because of their high sequence similarity and heterogeneous sequencing coverage for RNA viruses. VirBin applies prototype-based clustering to cluster regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing datasets with different haplotype abundance distributions and contig sizes, and also on mock quasispecies sequencing data. Benchmarking against other contig binning tools demonstrated the superior sensitivity and precision of VirBin in contig binning for viral haplotype reconstruction. Conclusions: In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning. The source code is available at: https://github.com/chjiao/VirBin.
Affiliation(s)
- Jiao Chen
- Computer Science and Engineering, Michigan State University, East Lansing, 48824, USA
- Jiayu Shang
- Electrical Engineering, City University of Hong Kong, Hong Kong, China
- Jianrong Wang
- Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, 48824, USA
- Yanni Sun
- Electrical Engineering, City University of Hong Kong, Hong Kong, China

40
|
Chen J, Zhao Y, Sun Y. De novo haplotype reconstruction in viral quasispecies using paired-end read guided path finding. Bioinformatics 2019; 34:2927-2935. [PMID: 29617936 DOI: 10.1093/bioinformatics/bty202] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 08/28/2017] [Accepted: 04/02/2018] [Indexed: 12/29/2022] Open
Abstract
Motivation: RNA virus populations contain different but genetically related strains, all infecting an individual host. Reconstruction of the viral haplotypes is a fundamental step to characterize the virus population, predict viral phenotypes and finally provide important information for clinical treatment and prevention. Advances in next-generation sequencing technologies open up new opportunities to assemble full-length haplotypes. However, error-prone short reads, high similarities between related strains, and an unknown number of haplotypes pose computational challenges for reference-free haplotype reconstruction. There is still much room to improve the performance of existing haplotype assembly tools. Results: In this work, we developed a de novo haplotype reconstruction tool named PEHaplo, which employs paired-end reads to distinguish highly similar strains for viral quasispecies data. It was applied to both simulated and real quasispecies data, and the results were benchmarked against several recently published de novo haplotype reconstruction tools. The comparison shows that PEHaplo outperforms the benchmarked tools in a comprehensive set of metrics. Availability and implementation: The source code and the documentation of PEHaplo are available at https://github.com/chjiao/PEHaplo. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiao Chen
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Yingchao Zhao
- School of Computing and Information Sciences, Caritas Institute of Higher Education, Hong Kong, China
- Yanni Sun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA

41
|
Bachmann N, von Siebenthal C, Vongrad V, Turk T, Neumann K, Beerenwinkel N, Bogojeska J, Fellay J, Roth V, Kok YL, Thorball CW, Borghesi A, Parbhoo S, Wieser M, Böni J, Perreau M, Klimkait T, Yerly S, Battegay M, Rauch A, Hoffmann M, Bernasconi E, Cavassini M, Kouyos RD, Günthard HF, Metzner KJ. Determinants of HIV-1 reservoir size and long-term dynamics during suppressive ART. Nat Commun 2019; 10:3193. [PMID: 31324762 PMCID: PMC6642170 DOI: 10.1038/s41467-019-10884-9] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 01/03/2019] [Accepted: 06/05/2019] [Indexed: 12/20/2022] Open
Abstract
The HIV-1 reservoir is the major hurdle to a cure. We here evaluate viral and host characteristics associated with reservoir size and long-term dynamics in 1,057 individuals on suppressive antiretroviral therapy for a median of 5.4 years. At the population level, the reservoir decreases with diminishing differences over time, but increases in 26.6% of individuals. Viral blips and low-level viremia are significantly associated with slower reservoir decay. Initiation of ART within the first year of infection, pretreatment viral load, and ethnicity affect reservoir size, but less so long-term dynamics. Viral blips and low-level viremia are thus relevant for reservoir and cure studies. Here, Bachmann et al. provide data on long-term dynamics of the HIV-1 reservoir in 1,057 individuals on suppressive antiretroviral therapy and show that in 26.6% of individuals the reservoir increases. Viral blips and low-level viremia are significantly associated with a slower reservoir decay.
Affiliation(s)
- Nadine Bachmann
- Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, 8057, Zurich, Switzerland
- Chantal von Siebenthal
- Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, 8057, Zurich, Switzerland
- Valentina Vongrad
- Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, 8057, Zurich, Switzerland
- Teja Turk
- Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, 8057, Zurich, Switzerland
- Kathrin Neumann
- Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, 8057, Zurich, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, 4058, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4057 Basel, Switzerland
- Jacques Fellay
- School of Life Sciences, EPFL, 1015, Lausanne, Switzerland; Precision Medicine Unit, Lausanne University Hospital, 1011, Lausanne, Switzerland
- Volker Roth
- Department of Mathematics and Computer Science, University of Basel, 4001, Basel, Switzerland
- Yik Lim Kok
- Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, 8057, Zurich, Switzerland
- Alessandro Borghesi
- School of Life Sciences, EPFL, 1015, Lausanne, Switzerland; Neonatal Intensive Care Unit, Fondazione IRCCS Policlinico San Matteo, Pavia, 27100, Italy
- Sonali Parbhoo
- Department of Mathematics and Computer Science, University of Basel, 4001, Basel, Switzerland
- Mario Wieser
- Department of Mathematics and Computer Science, University of Basel, 4001, Basel, Switzerland
- Jürg Böni
- Institute of Medical Virology, University of Zurich, 8057, Zurich, Switzerland
- Matthieu Perreau
- Division of Immunology and Allergy, Centre Hospitalier Universitaire Vaudois, University of Lausanne, 1015, Lausanne, Switzerland
- Thomas Klimkait
- Division Infection Diagnostics, Department Biomedicine-Petersplatz, University of Basel, 4001, Basel, Switzerland
- Sabine Yerly
- Division of Infectious Diseases and Laboratory of Virology, University Hospital Geneva, University of Geneva, 1211, Geneva, Switzerland
- Manuel Battegay
- Department of Infectious Diseases and Hospital Epidemiology, University Hospital Basel, 4031, Basel, Switzerland
- Andri Rauch
- Department of Infectious Diseases, University Hospital Bern, 3010, Bern, Switzerland
- Matthias Hoffmann
- Division of Infectious Diseases, Cantonal Hospital of St. Gallen, 9007, St. Gallen, Switzerland
- Enos Bernasconi
- Infectious Diseases Service, Regional Hospital, 6900, Lugano, Switzerland
- Matthias Cavassini
- Division of Infectious Diseases, Centre Hospitalier Universitaire Vaudois, University of Lausanne, 1015, Lausanne, Switzerland
- Roger D Kouyos
- Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, 8057, Zurich, Switzerland
- Huldrych F Günthard
- Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, 8057, Zurich, Switzerland
- Karin J Metzner
- Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, 8091, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, 8057, Zurich, Switzerland

42
|
Cuypers L, Thijssen M, Shakibzadeh A, Sabahi F, Ravanshad M, Pourkarim MR. Next-generation sequencing for the clinical management of hepatitis C virus infections: does one test fits all purposes? Crit Rev Clin Lab Sci 2019; 56:420-434. [PMID: 31317801 DOI: 10.1080/10408363.2019.1637394] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
While the prospect of viral cure is higher than ever for individuals infected with the hepatitis C virus (HCV) due to ground-breaking progress in antiviral treatment, success rates are still negatively influenced by HCV's high genetic variability. This genetic diversity is represented in the circulation of various genotypes and subtypes, mixed infections, recombinant forms and the presence of numerous drug resistant variants among infected individuals. Common misclassifications by commercial genotyping assays in combination with the limitations of currently used targeted population sequencing approaches have encouraged researchers to exploit alternative methods for the clinical management of HCV infections. Next-generation sequencing (NGS), a revolutionary and powerful tool with a variety of applications in clinical virology, can characterize viral diversity and depict viral dynamics in an ultra-wide and ultra-deep manner. The level of detail it provides makes it the method of choice for the diagnosis and clinical assessment of HCV infections. The sequence library provided by NGS is of a higher magnitude and sensitivity than data generated by conventional methods. Therefore, these technologies are helpful to guide clinical practice and at the same time highly valuable for epidemiological studies. The decreasing costs of NGS to determine genotypes, mixed infections, recombinant strains and drug resistant variants will soon make it feasible to employ NGS in clinical laboratories, to assist in the daily care of patients with HCV.
Affiliation(s)
- Lize Cuypers
- Laboratory of Clinical and Epidemiological Virology, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, KU Leuven, Leuven, Belgium
- Marijn Thijssen
- Laboratory of Clinical and Epidemiological Virology, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, KU Leuven, Leuven, Belgium
- Arash Shakibzadeh
- Department of Medical Virology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Farzaneh Sabahi
- Department of Medical Virology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Mehrdad Ravanshad
- Department of Medical Virology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Mahmoud Reza Pourkarim
- Laboratory of Clinical and Epidemiological Virology, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, KU Leuven, Leuven, Belgium; Health Policy Research Center, Institute of Health, Shiraz University of Medical Sciences, Shiraz, Iran; Blood Transfusion Research Center, High Institute for Research and Education in Transfusion Medicine, Tehran, Iran

43
|
Bertels F, Leemann C, Metzner KJ, Regoes R. Parallel evolution of HIV-1 in a long-term experiment. Mol Biol Evol 2019; 36:2400-2414. [PMID: 31251344 PMCID: PMC6805227 DOI: 10.1093/molbev/msz155] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/12/2018] [Revised: 05/06/2019] [Accepted: 06/22/2019] [Indexed: 12/15/2022] Open
Abstract
One of the most intriguing puzzles in biology is the degree to which evolution is repeatable. The repeatability of evolution, or parallel evolution, has been studied in a variety of model systems, but has rarely been investigated with clinically relevant viruses. To investigate parallel evolution of HIV-1, we passaged two replicate HIV-1 populations for almost 1 year in each of two human T-cell lines. For each of the four evolution lines, we determined the genetic composition of the viral population at nine time points by deep sequencing the entire genome. Mutations that were carried by the majority of the viral population accumulated continuously over 1 year in each evolution line. Many majority mutations appeared in more than one evolution line, that is, our experiments showed an extreme degree of parallel evolution. In one of the evolution lines, 62% of the majority mutations also occur in another line. The parallelism impairs our ability to reconstruct the evolutionary history by phylogenetic methods. We show that one can infer the correct phylogenetic topology by including minority mutations in our analysis. We also find that mutation diversity at the beginning of the experiment is predictive of the frequency of majority mutations at the end of the experiment.
Affiliation(s)
- Frederic Bertels
- Department of Environmental Systems Sciences, ETH Zurich, Zurich; Max-Planck-Institute for Evolutionary Biology, Department of Microbial Population Biology
- Christine Leemann
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, Zurich; Institute of Medical Virology, University of Zurich, Zurich
- Karin J Metzner
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, Zurich; Institute of Medical Virology, University of Zurich, Zurich
- Roland Regoes
- Department of Environmental Systems Sciences, ETH Zurich, Zurich

44
|
Baaijens JA, Van der Roest B, Köster J, Stougie L, Schönhuth A. Full-length de novo viral quasispecies assembly through variation graph construction. Bioinformatics 2019; 35:5086-5094. [DOI: 10.1093/bioinformatics/btz443] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 08/11/2018] [Revised: 04/17/2019] [Accepted: 05/27/2019] [Indexed: 11/14/2022] Open
Abstract
Motivation
Viruses populate their hosts as a viral quasispecies: a collection of genetically related mutant strains. Viral quasispecies assembly is the reconstruction of strain-specific haplotypes from read data, and predicting their relative abundances within the mix of strains is an important step for various treatment-related reasons. Reference genome independent (‘de novo’) approaches have yielded benefits over reference-guided approaches, because reference-induced biases can become overwhelming when dealing with divergent strains. While being very accurate, extant de novo methods only yield rather short contigs. The remaining challenge is to reconstruct full-length haplotypes together with their abundances from such contigs.
Results
We present Virus-VG as a de novo approach to viral haplotype reconstruction from preassembled contigs. Our method constructs a variation graph from the short input contigs without making use of a reference genome. Then, to obtain paths through the variation graph that reflect the original haplotypes, we solve a minimization problem that yields a selection of maximal-length paths that is optimal in terms of being compatible with the read coverages computed for the nodes of the variation graph. We output the resulting selection of maximal-length paths as the haplotypes, together with their abundances. Benchmarking experiments on challenging simulated and real datasets show significant improvements in assembly contiguity compared to the input contigs, while preserving low error rates compared to the state-of-the-art viral quasispecies assemblers.
Availability and implementation
Virus-VG is freely available at https://bitbucket.org/jbaaijens/virus-vg.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jasmijn A Baaijens
- Life Sciences and Health Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands
- Johannes Köster
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Medical Oncology, Dana Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Leen Stougie
- Life Sciences and Health Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands
- Department of Econometrics and Operations Research, Vrije Universiteit, Amsterdam, Netherlands
- INRIA-Erable, Grenoble, France
- Alexander Schönhuth
- Life Sciences and Health Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands
- INRIA-Erable, Grenoble, France
- Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands

45
|
Liu CC, Ji H. PCR Amplification Strategies Towards Full-length HIV-1 Genome Sequencing. Curr HIV Res 2019; 16:98-105. [PMID: 29943704 DOI: 10.2174/1570162x16666180626152252] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/15/2018] [Revised: 05/05/2018] [Accepted: 06/20/2018] [Indexed: 11/22/2022]
Abstract
The advent of next-generation sequencing has enabled greater resolution of viral diversity and improved feasibility of full viral genome sequencing allowing routine HIV-1 full genome sequencing in both research and diagnostic settings. Regardless of the sequencing platform selected, successful PCR amplification of the HIV-1 genome is essential for sequencing template preparation. As such, full HIV-1 genome amplification is a crucial step in dictating the successful and reliable sequencing downstream. Here we reviewed existing PCR protocols leading to HIV-1 full genome sequencing. In addition to the discussion on basic considerations on relevant PCR design, the advantages as well as the pitfalls of the published protocols were reviewed.
Affiliation(s)
- Chao Chun Liu
- National Microbiology Laboratory at JC Wilt Infectious Diseases Research Center, Public Health Agency of Canada, Winnipeg, Canada
- Hezhao Ji
- National Microbiology Laboratory at JC Wilt Infectious Diseases Research Center, Public Health Agency of Canada, Winnipeg, Canada; Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, Canada

46
|
Du N, Chen J, Sun Y. Improving the sensitivity of long read overlap detection using grouped short k-mer matches. BMC Genomics 2019; 20:190. [PMID: 30967123 PMCID: PMC6456931 DOI: 10.1186/s12864-019-5475-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/26/2022] Open
Abstract
Background: Single-molecule, real-time sequencing (SMRT) developed by Pacific BioSciences produces longer reads than second-generation sequencing technologies such as Illumina. The increased read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and characterize intra-species variations. It also holds the promise to decipher the community structure in complex microbial communities because long reads help metagenomic assembly. One key step in genome assembly using long reads is to quickly identify reads forming overlaps. Because PacBio data has a higher sequencing error rate and lower coverage than popular short-read sequencing technologies (such as Illumina), efficient detection of true overlaps requires specially designed algorithms. In particular, there is still a need to improve the sensitivity of detecting small overlaps or overlaps with high error rates in both reads. Addressing this need will enable better assembly for metagenomic data produced by third-generation sequencing technologies. Results: In this work, we designed and implemented an overlap detection program named GroupK for third-generation sequencing reads, based on grouped k-mer hits. While using k-mer hits for detecting read overlaps has been adopted by several existing programs, our method uses a group of short k-mer hits satisfying statistically derived distance constraints to increase the sensitivity of small overlap detection. Grouped k-mer hits were originally designed for homology search. We are the first to apply grouped hits to long read overlap detection. The experimental results of applying our pipeline to both simulated and real third-generation sequencing data showed that GroupK enables more sensitive overlap detection, especially for datasets of low sequencing coverage. Conclusions: GroupK is best used for detecting small overlaps in third-generation sequencing data. It provides a useful supplementary tool to existing ones for more sensitive and accurate overlap detection. The source code is freely available at https://github.com/Strideradu/GroupK.
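The idea of grouping short k-mer hits can be illustrated with a minimal Python sketch. It is not GroupK's algorithm: the statistically derived distance constraints from the paper are replaced here by a simple, assumed rule that bins hits by their diagonal (offset in read A minus offset in read B). The function names and the parameters `k`, `diag_bin`, and `min_group` are hypothetical.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map every k-mer in seq to the list of offsets where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def grouped_hits(read_a, read_b, k=4, diag_bin=3, min_group=3):
    """Return the largest group of k-mer hits lying on nearby diagonals.

    A hit is a pair (i, j) where read_a[i:i+k] == read_b[j:j+k].
    Hits are binned by diagonal (i - j) in bins of width diag_bin, a crude
    stand-in for a distance constraint that tolerates small indels.
    An empty list means no group reached min_group hits.
    """
    index_b = kmer_positions(read_b, k)
    by_diag = defaultdict(list)
    for i in range(len(read_a) - k + 1):
        for j in index_b.get(read_a[i:i + k], []):
            by_diag[(i - j) // diag_bin].append((i, j))
    best = max(by_diag.values(), key=len, default=[])
    return best if len(best) >= min_group else []


# The suffix of read_a overlaps the prefix of read_b by 9 bases.
read_a = "GGGGGACGTTCAGT"
read_b = "ACGTTCAGTCCCCC"
hits = grouped_hits(read_a, read_b)
```

Requiring several short hits on neighbouring diagonals, rather than one long exact match, is what lets this kind of approach tolerate the higher error rates of long reads.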
Affiliation(s)
- Nan Du
- Department of Computer Science and Engineering, Michigan State University, East Lansing, 48824, MI, USA
- Jiao Chen
- Department of Computer Science and Engineering, Michigan State University, East Lansing, 48824, MI, USA
- Yanni Sun
- Electronic Engineering Department, City University of Hong Kong, Hong Kong SAR, China

47
|
Mangul S, Martin LS, Hill BL, Lam AKM, Distler MG, Zelikovsky A, Eskin E, Flint J. Systematic benchmarking of omics computational tools. Nat Commun 2019; 10:1393. [PMID: 30918265 PMCID: PMC6437167 DOI: 10.1038/s41467-019-09406-4] [Citation(s) in RCA: 82] [Impact Index Per Article: 16.4] [Received: 07/23/2018] [Accepted: 03/06/2019] [Indexed: 01/11/2023] Open
Abstract
Computational omics methods packaged as software have become essential to modern biological research. The increasing dependence of scientists on these powerful software tools creates a need for systematic assessment of these methods, known as benchmarking. Adopting a standardized benchmarking practice could help researchers who use omics data to better leverage recent technological innovations. Our review summarizes benchmarking practices from 25 recent studies and discusses the challenges, advantages, and limitations of benchmarking across various domains of biology. We also propose principles that can make computational biology benchmarking studies more sustainable and reproducible, ultimately increasing the transparency of biomedical data and results. Benchmarking studies are important for comprehensively understanding and evaluating different computational omics methods. Here, the authors review practices from 25 recent studies and propose principles to improve the quality of benchmarking studies.
Affiliation(s)
- Serghei Mangul
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA, 90095, USA; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, 611 Charles E Young Drive East, Los Angeles, CA, 90095, USA
- Lana S Martin
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, 611 Charles E Young Drive East, Los Angeles, CA, 90095, USA
- Brian L Hill
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA, 90095, USA
- Angela Ka-Mei Lam
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA, 90095, USA
- Margaret G Distler
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, GA, 30303, USA; The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 119991, Russia
- Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA, 90095, USA; Department of Human Genetics, University of California Los Angeles, 695 Charles E. Young, Los Angeles, CA, USA
- Jonathan Flint
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
|
48
|
Abayasingam A, Leung P, Eltahla A, Bull RA, Luciani F, Grebely J, Dore GJ, Applegate T, Page K, Bruneau J, Cox AL, Kim AY, Schinkel J, Shoukry NH, Lauer GM, Maher L, Hellard M, Prins M, Lloyd A, Rodrigo C. Genomic characterization of hepatitis C virus transmitted founder variants with deep sequencing. Infect Genet Evol 2019; 71:36-41. [PMID: 30853512 DOI: 10.1016/j.meegid.2019.02.032] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/02/2018] [Revised: 02/26/2019] [Accepted: 02/28/2019] [Indexed: 12/30/2022]
Abstract
Transfer of hepatitis C virus (HCV) infection from a donor to a new recipient is associated with a bottleneck of genetic diversity in the transmitted viral variants. Existing data suggests that one, or very few, variants emerge from this bottleneck to establish the infection (transmitted founder [T/F] variants). In HCV, very few T/F variants have been characterized due to the challenges of obtaining early infection samples and of high throughput viral genome sequencing. This study used a large, acute HCV, deep-sequenced dataset from first viremia samples collected in nine prospective cohorts across four countries, to estimate the prevalence of single T/F viruses, and to identify host and virus-related factors associated with infections initiated by a single T/F variant. The short reads generated by Illumina sequencing were used to reconstruct viral haplotypes with two haplotype reconstruction algorithms. The haplotypes were examined for random mutations (Poisson distribution) and a star-like phylogeny to identify T/F viruses. The findings were cross-validated by haplotype reconstructions across three regions of the genome (Core-E2, NS3, NS5A) to minimize the possibility of spurious overestimation of single T/F variants. Of 190 acute infection samples examined, 54 were very early acute infections (HCV antibody negative, RNA positive), and single transmitted founders were identified in 14 (26%, 95% CI: 16-39%) after cross validation across multiple regions of the genome with two haplotype reconstruction algorithms. The presence of a single T/F virus was not associated with any host or virus-related factors, notably viral genotype or spontaneous clearance. In conclusion, approximately one in four new HCV infections originates from a single T/F virus. Resolution of genomic sequences of single T/F variants is the first step in exploring unique properties of these variants in the infection of host hepatocytes.
Affiliation(s)
- Auda Eltahla
- School of Medical Sciences, Faculty of Medicine, UNSW, Sydney, NSW, Australia
- Rowena A Bull
- School of Medical Sciences, Faculty of Medicine, UNSW, Sydney, NSW, Australia
- Fabio Luciani
- School of Medical Sciences, Faculty of Medicine, UNSW, Sydney, NSW, Australia
- Kimberly Page
- Division of Epidemiology, Biostatistics and Preventive Medicine, University of New Mexico, Albuquerque, NM, USA
- Julie Bruneau
- CRCHUM, Université de Montréal, Montreal, QC, Canada
- Andrea L Cox
- Department of Medicine, Johns Hopkins Medical Institutions, Baltimore, MD, USA
- Janke Schinkel
- Department of Internal Medicine, Division of Infectious Diseases, Tropical Medicine and AIDS, Center for Infection and Immunity Amsterdam, Academic Medical Center, Meibergdreef 9, Amsterdam, The Netherlands
- Lisa Maher
- The Kirby Institute, UNSW, Sydney, NSW, Australia
- Margaret Hellard
- Burnet Institute, Melbourne, VIC, Australia; Monash University, Melbourne, Australia; Alfred Hospital, Melbourne, Australia; Doherty Institute and Melbourne School of Population and Global Health, University of Melbourne, Australia
- Maria Prins
- Department of Internal Medicine, Division of Infectious Diseases, Tropical Medicine and AIDS, Center for Infection and Immunity Amsterdam, Academic Medical Center, Meibergdreef 9, Amsterdam, The Netherlands; GGD Public Health Service of Amsterdam, Amsterdam, The Netherlands
- Andrew Lloyd
- The Kirby Institute, UNSW, Sydney, NSW, Australia
- Chaturaka Rodrigo
- School of Medical Sciences, Faculty of Medicine, UNSW, Sydney, NSW, Australia
|
49
|
Barik S, Das S, Vikalo H. QSdpR: Viral quasispecies reconstruction via correlation clustering. Genomics 2018; 110:375-381. [DOI: 10.1016/j.ygeno.2017.12.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/05/2017] [Revised: 12/03/2017] [Accepted: 12/13/2017] [Indexed: 02/05/2023]
|
50
|
Bons E, Bertels F, Regoes RR. Estimating the mutational fitness effects distribution during early HIV infection. Virus Evol 2018; 4:vey029. [PMID: 30310682 PMCID: PMC6172364 DOI: 10.1093/ve/vey029] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/25/2022] Open
Abstract
The evolution of HIV during acute infection is often considered a neutral process. Recent analysis of sequencing data from this stage of infection, however, showed high levels of shared mutations between independent viral populations. This suggests that selection might play a role in the early stages of HIV infection. We adapted an existing model for random evolution during acute HIV-infection to include selection. Simulations of this model were used to fit a global mutational fitness effects distribution to previously published sequencing data of the env gene of individuals with acute HIV infection. Measures of sharing between viral populations were used as summary statistics to compare the data to the simulations. We confirm that evolution during acute infection is significantly different from neutral. The distribution of mutational fitness effects is best fit by a distribution with a low, but significant fraction of beneficial mutations and a high fraction of deleterious mutations. While most mutations are neutral or deleterious in this model, about 5% of mutations are beneficial. These beneficial mutations will, on average, result in a small but significant increase in fitness. When assuming no epistasis, this indicates that, at the moment of transmission, HIV is near, but not on the fitness peak for early infection.
Affiliation(s)
- Eva Bons
- Department of Environmental Systems Science, Institute of Integrative Biology, ETH Zurich, Universitätstrasse 16, Zurich, Switzerland
- Frederic Bertels
- Department of Environmental Systems Science, Institute of Integrative Biology, ETH Zurich, Universitätstrasse 16, Zurich, Switzerland; Department for Evolutionary Theory, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, Plön, Germany
- Roland R Regoes
- Department of Environmental Systems Science, Institute of Integrative Biology, ETH Zurich, Universitätstrasse 16, Zurich, Switzerland
|