1
Jiao X, Imamichi H, Sherman BT, Nahar R, Dewar RL, Lane HC, Imamichi T, Chang W. QuasiSeq: profiling viral quasispecies via self-tuning spectral clustering with PacBio long sequencing reads. Bioinformatics 2022; 38:3192-3199. [PMID: 35532087 PMCID: PMC9890302 DOI: 10.1093/bioinformatics/btac313] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/14/2021] [Revised: 04/27/2022] [Accepted: 05/04/2022] [Indexed: 02/04/2023]
Abstract
MOTIVATION: The existence of quasispecies in the viral population causes difficulties for disease prevention and treatment. High-throughput sequencing provides an opportunity to determine rare quasispecies, and long sequencing reads covering full genomes reduce quasispecies determination to a clustering problem. The challenge is the high similarity of quasispecies and the high error rate of long sequencing reads.
RESULTS: We developed QuasiSeq using a novel signature-based self-tuning clustering method, SigClust, to profile viral mixtures with high accuracy and sensitivity. QuasiSeq can correctly identify quasispecies even using low-quality sequencing reads (accuracy <80%) and produce quasispecies sequences with high accuracy (≥99.55%). Using high-quality circular consensus sequencing reads, QuasiSeq can produce quasispecies sequences with 100% accuracy. QuasiSeq has higher sensitivity and specificity than similar published software. Moreover, the computational resource requirement can be controlled by the size of the signature, which makes it possible to handle big sequencing data for rare quasispecies discovery. Furthermore, parallel computation is implemented to process the clusters and further reduce the runtime. Finally, we developed a web interface for the QuasiSeq workflow with simple parameter settings based on the quality of sequencing data, making it easy to use for users without advanced data science skills.
AVAILABILITY AND IMPLEMENTATION: QuasiSeq is open source and freely available at https://github.com/LHRI-Bioinformatics/QuasiSeq. The current release (v1.0.0) is archived and available at https://zenodo.org/badge/latestdoi/340494542.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Robin L Dewar
  - Virus Isolation and Serology Laboratory, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- H Clifford Lane
  - Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892, USA
- Tomozumi Imamichi
  - Laboratory of Human Retrovirology and Immunoinformatics, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
2
Hoang MTV, Irinyi L, Hu Y, Schwessinger B, Meyer W. Long-Reads-Based Metagenomics in Clinical Diagnosis With a Special Focus on Fungal Infections. Front Microbiol 2022; 12:708550. [PMID: 35069461 PMCID: PMC8770865 DOI: 10.3389/fmicb.2021.708550] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/12/2021] [Accepted: 12/03/2021] [Indexed: 12/12/2022]
Abstract
Identification of the causative infectious agent is essential in the management of infectious diseases, with the ideal diagnostic method being rapid, accurate, and informative, while remaining cost-effective. Traditional diagnostic techniques rely on culturing and cell propagation to isolate and identify the causative pathogen. These techniques are limited by the ability and the time required to grow or propagate an agent in vitro and by the fact that identification based on morphological traits is non-specific, insensitive, and reliant on technical expertise. The evolution of next-generation sequencing has revolutionized genomic studies by generating more data at a lower cost. Next-generation sequencing technologies are divided into short- and long-read technologies, depending on the length of reads generated during sequencing runs. Long-read sequencing, also called third-generation sequencing, emerged commercially through the instruments released by Pacific Biosciences and Oxford Nanopore Technologies. Although the two platforms rely on different sequencing chemistries, with the former being more accurate, both can generate ultra-long sequence reads. Long-read sequencing is capable of entirely spanning previously established genomic identification regions or potentially small whole genomes, drastically improving the accuracy of the identification of pathogens directly from clinical samples. Long-read sequencing may also provide additional important clinical information, such as antimicrobial resistance profiles and epidemiological data, from a single sequencing run. While initial applications of long-read sequencing in clinical diagnosis showed that it could be a promising diagnostic technique, they also highlighted the need for further optimization. In this review, we show the potential long-read sequencing has in clinical diagnosis of fungal infections and discuss the pros and cons of its implementation.
Affiliation(s)
- Minh Thuy Vi Hoang
  - Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, Australia
  - Westmead Institute for Medical Research, Westmead, NSW, Australia
- Laszlo Irinyi
  - Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, Australia
  - Westmead Institute for Medical Research, Westmead, NSW, Australia
  - Sydney Infectious Disease Institute, The University of Sydney, Sydney, NSW, Australia
- Yiheng Hu
  - Research School of Biology, Australian National University, Canberra, ACT, Australia
- Wieland Meyer
  - Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, Australia
  - Westmead Institute for Medical Research, Westmead, NSW, Australia
  - Sydney Infectious Disease Institute, The University of Sydney, Sydney, NSW, Australia
  - Westmead Hospital (Research and Education Network), Westmead, NSW, Australia
3
Zhang Y, Ma L. Application of high-throughput sequencing technology in HIV drug resistance detection. Biosafety and Health 2021. [DOI: 10.1016/j.bsheal.2021.06.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
4
Tedersoo L, Albertsen M, Anslan S, Callahan B. Perspectives and Benefits of High-Throughput Long-Read Sequencing in Microbial Ecology. Appl Environ Microbiol 2021; 87:e0062621. [PMID: 34132589 PMCID: PMC8357291 DOI: 10.1128/aem.00626-21] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Indexed: 12/11/2022]
Abstract
Short-read, high-throughput sequencing (HTS) methods have yielded numerous important insights into microbial ecology and function. Yet, in many instances short-read HTS techniques are suboptimal, for example, by providing insufficient phylogenetic resolution or low integrity of assembled genomes. Single-molecule and synthetic long-read (SLR) HTS methods have successfully ameliorated these limitations. In addition, nanopore sequencing has generated a number of unique analysis opportunities, such as rapid molecular diagnostics and direct RNA sequencing, and both Pacific Biosciences (PacBio) and nanopore sequencing support detection of epigenetic modifications. Although initially suffering from relatively low sequence quality, recent advances have greatly improved the accuracy of long-read sequencing technologies. In spite of great technological progress in recent years, the long-read HTS methods (PacBio and nanopore sequencing) are still relatively costly, require large amounts of high-quality starting material, and commonly need specific solutions in various analysis steps. Despite these challenges, long-read sequencing technologies offer high-quality, cutting-edge alternatives for testing hypotheses about microbiome structure and functioning as well as assembly of eukaryote genomes from complex environmental DNA samples.
Affiliation(s)
- Leho Tedersoo
  - Mycology and Microbiology Center, University of Tartu, Tartu, Estonia
- Mads Albertsen
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Sten Anslan
  - Mycology and Microbiology Center, University of Tartu, Tartu, Estonia
  - Braunschweig University of Technology, Zoological Institute, Braunschweig, Germany
- Benjamin Callahan
  - Department of Population Health and Pathobiology, College of Veterinary Medicine and Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
5
Gallardo CM, Wang S, Montiel-Garcia DJ, Little SJ, Smith DM, Routh AL, Torbett BE. MrHAMER yields highly accurate single molecule viral sequences enabling analysis of intra-host evolution. Nucleic Acids Res 2021; 49:e70. [PMID: 33849057 PMCID: PMC8266615 DOI: 10.1093/nar/gkab231] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/09/2021] [Revised: 03/12/2021] [Accepted: 03/31/2021] [Indexed: 12/31/2022]
Abstract
Technical challenges remain in the sequencing of RNA viruses due to their high intra-host diversity. This bottleneck is particularly pronounced when interrogating long-range co-evolved genetic interactions given the read-length limitations of next-generation sequencing platforms. This has hampered the direct observation of these genetic interactions that code for protein-protein interfaces with relevance in both drug and vaccine development. Here we overcome these technical limitations by developing a nanopore-based long-range viral sequencing pipeline that yields accurate single molecule sequences of circulating virions from clinical samples. We demonstrate its utility in observing the evolution of individual HIV Gag-Pol genomes in response to antiviral pressure. Our pipeline, called Multi-read Hairpin Mediated Error-correction Reaction (MrHAMER), yields >1000s of viral genomes per sample at 99.9% accuracy, maintains the original proportion of sequenced virions present in a complex mixture, and allows the detection of rare viral genomes with their associated mutations present at <1% frequency. This method facilitates scalable investigation of genetic correlates of resistance to both antiviral therapy and immune pressure and enables the identification of novel host-viral and viral-viral interfaces that can be modulated for therapeutic benefit.
Affiliation(s)
- Christian M Gallardo
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA
- Shiyi Wang
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA
- Daniel J Montiel-Garcia
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Susan J Little
  - Division of Infectious Diseases and Global Public Health, University of California, San Diego, La Jolla, CA, USA
- Davey M Smith
  - Division of Infectious Diseases and Global Public Health, University of California, San Diego, La Jolla, CA, USA
  - Veterans Affairs San Diego Healthcare System, San Diego, CA, USA
- Andrew L Routh
  - Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
  - Sealy Center for Structural Biology, University of Texas Medical Branch, Galveston, TX, USA
- Bruce E Torbett
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA
  - Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
6
Shao W, Boltz VF, Hattori J, Bale MJ, Maldarelli F, Coffin JM, Kearney MF. Short Communication: HIV-DRLink: A Tool for Reporting Linked HIV-1 Drug Resistance Mutations in Large Single-Genome Data Sets Using the Stanford HIV Database. AIDS Res Hum Retroviruses 2020; 36:942-947. [PMID: 32683881 DOI: 10.1089/aid.2020.0109] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
Abstract
The prevalence of HIV-1 drug resistance is increasing worldwide and monitoring its emergence is important for the successful management of populations receiving combination antiretroviral therapy. It is likely that pre-existing drug resistance mutations linked on the same viral genomes are predictive of treatment failure. Because of the large number of sequences generated by ultrasensitive single-genome sequencing (uSGS) and other similar next-generation sequencing methods, it is difficult to assess each sequence individually for linked drug resistance mutations. Several software/programs exist to report the frequencies of individual mutations in large data sets, but they provide no information on linkage of resistance mutations. In this study, we report the HIV-DRLink program, a research tool that provides resistance mutation frequencies as well as their genetic linkage by parsing and summarizing the Sierra output from the Stanford HIV Database. The HIV-DRLink program should only be used on data sets generated by methods that eliminate artifacts due to polymerase chain reaction recombination, for example, standard single-genome sequencing or uSGS. HIV-DRLink is exclusively a research tool and is not intended to inform clinical decisions.
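The difference between per-mutation frequencies and linkage can be illustrated with a minimal Python sketch. This is not the HIV-DRLink implementation and it does not parse Sierra output; the function name `summarize_linkage`, the input layout (one set of mutation names per single-genome sequence), and the example mutations are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def summarize_linkage(genomes):
    """Return (single_freq, linked_freq) for a list of per-genome mutation sets.

    Hypothetical input layout: each element is the set of resistance
    mutations called on ONE single-genome sequence.
    """
    n = len(genomes)
    single = Counter()
    linked = Counter()
    for muts in genomes:
        # Frequency of each mutation on its own.
        single.update(muts)
        # Two mutations are "linked" when they occur on the same genome.
        linked.update(combinations(sorted(muts), 2))
    single_freq = {m: c / n for m, c in single.items()}
    linked_freq = {pair: c / n for pair, c in linked.items()}
    return single_freq, linked_freq


# Illustrative data only: four genomes, two of which carry both mutations.
genomes = [{"K103N", "M184V"}, {"K103N"}, {"K103N", "M184V"}, set()]
single_freq, linked_freq = summarize_linkage(genomes)
print(single_freq)
print(linked_freq)
```

In this toy data set, a per-mutation report alone would not show whether the two mutations sit on the same genomes; the pair count makes that explicit.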
Affiliation(s)
- Wei Shao
  - Advanced Biomedical Computing Science, Frederick National Laboratory for Cancer Research (FNLCR) sponsored by the National Cancer Institute, Frederick, Maryland, USA
- Valerie F. Boltz
  - HIV Dynamics and Replication Program, CCR, National Cancer Institute, Frederick, Maryland, USA
- Junko Hattori
  - HIV Dynamics and Replication Program, CCR, National Cancer Institute, Frederick, Maryland, USA
- Michael J. Bale
  - HIV Dynamics and Replication Program, CCR, National Cancer Institute, Frederick, Maryland, USA
- Frank Maldarelli
  - HIV Dynamics and Replication Program, CCR, National Cancer Institute, Frederick, Maryland, USA
- John M. Coffin
  - Department of Molecular Biology and Microbiology, Tufts University, Boston, Massachusetts, USA
- Mary F. Kearney
  - HIV Dynamics and Replication Program, CCR, National Cancer Institute, Frederick, Maryland, USA
7
O’Donnell ST, Ross RP, Stanton C. The Progress of Multi-Omics Technologies: Determining Function in Lactic Acid Bacteria Using a Systems Level Approach. Front Microbiol 2020; 10:3084. [PMID: 32047482 PMCID: PMC6997344 DOI: 10.3389/fmicb.2019.03084] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 07/08/2019] [Accepted: 12/20/2019] [Indexed: 12/12/2022]
Abstract
Lactic Acid Bacteria (LAB) have long been recognized as having a significant impact ranging from commercial to health domains. A vast amount of research has been carried out on these microbes, deciphering many of the pathways and components responsible for these desirable effects. However, a large proportion of this functional information has been derived from a reductionist approach working with pure culture strains. This provides limited insight into understanding the impact of LAB within intricate systems such as the gut microbiome or multi strain starter cultures. Whole genome sequencing of strains and shotgun metagenomics of entire systems are powerful techniques that are currently widely used to decipher function in microbes, but they also have their limitations. An available genome or metagenome can provide an image of what a strain or microbiome, respectively, is potentially capable of and the functions that they may carry out. A top-down, multi-omics approach has the power to resolve the functional potential of an ecosystem into an image of what is being expressed, translated and produced. With this image, it is possible to see the real functions that members of a system are performing and allow more accurate and impactful predictions of the effects of these microorganisms. This review will discuss how technological advances have the potential to increase the yield of information from genomics, transcriptomics, proteomics and metabolomics. The potential for integrated omics to resolve the role of LAB in complex systems will also be assessed. Finally, the current software approaches for managing these omics data sets will be discussed.
Affiliation(s)
- Shane Thomas O’Donnell
  - Teagasc Food Research Centre, Moorepark, Fermoy, Ireland
  - Department of Microbiology, University College Cork – National University of Ireland, Cork, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- R. Paul Ross
  - Teagasc Food Research Centre, Moorepark, Fermoy, Ireland
  - Department of Microbiology, University College Cork – National University of Ireland, Cork, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- Catherine Stanton
  - Teagasc Food Research Centre, Moorepark, Fermoy, Ireland
  - APC Microbiome Ireland, Cork, Ireland
8
Takeda H, Yamashita T, Ueda Y, Sekine A. Exploring the hepatitis C virus genome using single molecule real-time sequencing. World J Gastroenterol 2019; 25:4661-4672. [PMID: 31528092 PMCID: PMC6718035 DOI: 10.3748/wjg.v25.i32.4661] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/16/2019] [Revised: 07/04/2019] [Accepted: 07/19/2019] [Indexed: 02/06/2023]
Abstract
Single-molecule real-time (SMRT) sequencing, also called third-generation sequencing, is a novel sequencing technique capable of generating extremely long contiguous sequence reads. While conventional short-read sequencing cannot evaluate the linkage of nucleotide substitutions distant from one another, SMRT sequencing can directly demonstrate linkage of nucleotide changes over a span of more than 20 kbp, and thus can be applied to directly examine the haplotypes of viruses or bacteria whose genome structures are changing in real time. In addition, an error correction method (circular consensus sequencing) has been established, and repeated sequencing of a single-molecule DNA template can result in extremely high accuracy. The advantages of long-read sequencing enable accurate determination of the haplotypes of individual viral clones. SMRT sequencing has been applied in various studies of viral genomes, including determination of the full-length contiguous genome sequence of hepatitis C virus (HCV), targeted deep sequencing of the HCV NS5A gene, and assessment of heterogeneity among viral populations. Recently, the emergence of multidrug-resistant HCV has become a significant clinical issue and has also been demonstrated using SMRT sequencing. In this review, we introduce the novel third-generation PacBio RSII/Sequel systems, compare them with conventional next-generation sequencers, and summarize previous studies in which SMRT sequencing technology has been applied for HCV genome analysis. We also refer to another long-read sequencing platform, nanopore sequencing technology, and discuss the advantages, limitations and future perspectives in using these third-generation sequencers for HCV genome analysis.
Affiliation(s)
- Haruhiko Takeda
  - Department of Omics-based Medicine, Center for Preventive Medical Science, Chiba University, Chiba 260-0856, Japan
  - Department of Gastroenterology and Hepatology, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Taiki Yamashita
  - Department of Omics-based Medicine, Center for Preventive Medical Science, Chiba University, Chiba 260-0856, Japan
- Yoshihide Ueda
  - Department of Gastroenterology and Hepatology, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Akihiro Sekine
  - Department of Omics-based Medicine, Center for Preventive Medical Science, Chiba University, Chiba 260-0856, Japan
9
Single-Molecule Sequencing: Towards Clinical Applications. Trends Biotechnol 2019; 37:72-85. [DOI: 10.1016/j.tibtech.2018.07.013] [Citation(s) in RCA: 112] [Impact Index Per Article: 22.4] [Received: 05/04/2018] [Revised: 07/16/2018] [Accepted: 07/18/2018] [Indexed: 12/31/2022]
10
Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons. PLoS Comput Biol 2018; 14:e1006498. [PMID: 30543621 PMCID: PMC6314628 DOI: 10.1371/journal.pcbi.1006498] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/04/2018] [Revised: 01/02/2019] [Accepted: 09/10/2018] [Indexed: 01/07/2023]
Abstract
Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster), and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. 
The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.

Viral populations constantly evolve and diversify. In this article we introduce a method, FLEA, for reconstructing and visualizing the details of evolutionary changes. FLEA specifically processes data from sequencing platforms that generate reads that are long, but error-prone. To study the evolutionary dynamics of entire genes during viral infection, data is collected via long-read sequencing at discrete time points, allowing us to understand how the virus changes over time. However, the experimental and sequencing process is imperfect, so the resulting data contain not only real evolutionary changes, but also mutations and other genetic artifacts caused by sequencing errors. Our method corrects most of these errors by combining thousands of erroneous sequences into a much smaller number of unique consensus sequences that represent biologically meaningful variation. The resulting high-quality sequences are used for further analysis, such as building an evolutionary tree that tracks and interprets the genetic changes in the viral population over time. FLEA is open source, and is freely available online.
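The idea of combining many error-prone reads into a consensus sequence can be sketched with a simple per-column majority vote. This is only an illustration of why a consensus suppresses random sequencing errors; it is not FLEA's actual procedure for building HQCSs. The function name `majority_consensus` is hypothetical, and the sketch assumes the reads are already aligned to equal length, with `-` marking gaps.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Majority-vote consensus over pre-aligned, equal-length reads.

    Assumption: '-' marks an alignment gap. Columns whose most common
    symbol is a gap are dropped from the consensus.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        # most_common(1) returns the most frequent symbol in this column;
        # ties are resolved in favor of the symbol seen first.
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


# Toy example: two reads each carry one isolated error.
reads = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC"]
print(majority_consensus(reads))  # ACGTAC
```

Because the two errors fall in different columns, each is outvoted by the other reads. Real long-read data would first need clustering and alignment, which this sketch leaves out.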
11
Lambert C, Braxton C, Charlebois RL, Deyati A, Duncan P, La Neve F, Malicki HD, Ribrioux S, Rozelle DK, Michaels B, Sun W, Yang Z, Khan AS. Considerations for Optimization of High-Throughput Sequencing Bioinformatics Pipelines for Virus Detection. Viruses 2018; 10:E528. [PMID: 30262776 PMCID: PMC6213042 DOI: 10.3390/v10100528] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 08/23/2018] [Revised: 09/19/2018] [Accepted: 09/25/2018] [Indexed: 02/07/2023]
Abstract
High-throughput sequencing (HTS) has demonstrated capabilities for broad virus detection based upon discovery of known and novel viruses in a variety of samples, including clinical, environmental, and biological. An important goal for HTS applications in biologics is to establish parameter settings that can afford adequate sensitivity at an acceptable computational cost (computation time, computer memory, storage, expense or/and efficiency), at critical steps in the bioinformatics pipeline, including initial data quality assessment, trimming/cleaning, and assembly (to reduce data volume and increase likelihood of appropriate sequence identification). Additionally, the quality and reliability of the results depend on the availability of a complete and curated viral database for obtaining accurate results; selection of sequence alignment programs and their configuration, that retains specificity for broad virus detection with reduced false-positive signals; removal of host sequences without loss of endogenous viral sequences of interest; and use of a meaningful reporting format, which can retain critical information of the analysis for presentation of readily interpretable data and actionable results. Furthermore, after alignment, both automated and manual evaluation may be needed to verify the results and help assign a potential risk level to residual, unmapped reads. We hope that the collective considerations discussed in this paper aid toward optimization of data analysis pipelines for virus detection by HTS.
Affiliation(s)
- Robert L Charlebois
  - Analytical Research and Development, Sanofi Pasteur, Toronto, ON M2R 3T4, Canada
- Paul Duncan
  - Merck & Co. Inc., West Point, PA 19486, USA
- Brandye Michaels
  - Analytical Research and Development: Microbiology, Pfizer Inc., Andover, MA 01810, USA
- Zhihui Yang
  - Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, MD 20708, USA
- Arifa S Khan
  - Office of Vaccines Research and Review, Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD 20993, USA
12
Huang C, Sam V, Du S, Le T, Fletcher A, Lau W, Meyer K, Asaki E, Huang DW, Johnson C. Towards Personalized Medicine: An Improved De Novo Assembly Procedure for Early Detection of Drug Resistant HIV Minor Quasispecies in Patient Samples. Bioinformation 2018; 14:449-454. [PMID: 30310253 PMCID: PMC6166399 DOI: 10.6026/97320630014449] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/30/2018] [Revised: 09/17/2018] [Accepted: 09/17/2018] [Indexed: 12/31/2022]
Abstract
The third-generation sequencing technology, PacBio, has shown an ability to sequence HIV amplicons in their full length. The long reads of PacBio offer a distinct advantage for comprehensively understanding the complexity of virus evolution at the quasispecies level (i.e. maintaining linkage information of variants) compared with the short reads from Illumina shotgun sequencing. However, due to the high-noise nature of the PacBio reads, it is still a challenge to build accurate contigs at high sensitivity. Most previously developed NGS assembly tools work with the assumption that the input reads are fairly accurate, which is largely true for the data derived from Sanger or Illumina technologies. When these tools are applied to PacBio high-noise reads, they are largely driven by noise rather than true signal, eventually leading to poor results in most cases. In this study, we propose a de novo assembly procedure, which comprises a positive-focused strategy and linkage-frequency noise reduction, so that it is more suitable for PacBio high-noise reads. We further tested the de novo assembly procedure on HIV PacBio benchmark data and clinical samples, where it accurately assembled dominant and minor populations of HIV quasispecies as expected. The improved de novo assembly procedure shows potential to promote PacBio technology in the field of HIV drug-resistance clinical detection, as well as in broad HIV phylogenetic studies.
Affiliation(s)
- Cindy Huang
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
  - Thomas Wootton High School, Rockville, Maryland 20850
- Vichetra Sam
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
  - CSRA, Falls Church, VA 22042
- Sophie Du
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
  - CSRA, Falls Church, VA 22042
- Tuan Le
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
- Anthony Fletcher
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
- William Lau
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
- Kathleen Meyer
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
  - CSRA, Falls Church, VA 22042
- Esther Asaki
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
  - CSRA, Falls Church, VA 22042
- Da Wei Huang
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
- Calvin Johnson
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland 10891
13
Brese RL, Gonzalez-Perez MP, Koch M, O'Connell O, Luzuriaga K, Somasundaran M, Clapham PR, Dollar JJ, Nolan DJ, Rose R, Lamers SL. Ultradeep single-molecule real-time sequencing of HIV envelope reveals complete compartmentalization of highly macrophage-tropic R5 proviral variants in brain and CXCR4-using variants in immune and peripheral tissues. J Neurovirol 2018; 24:439-453. [PMID: 29687407 PMCID: PMC7281851 DOI: 10.1007/s13365-018-0633-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 12/11/2017] [Revised: 02/28/2018] [Accepted: 03/19/2018] [Indexed: 01/07/2023]
Abstract
Despite combined antiretroviral therapy (cART), HIV+ patients still develop neurological disorders, which may be due to persistent HIV infection and selective evolution in brain tissues. Single-molecule real-time (SMRT) sequencing technology offers an improved opportunity to study the relationship among HIV isolates in the brain and lymphoid tissues because it is capable of generating thousands of long sequence reads in a single run. Here, we used SMRT sequencing to generate ~50,000 high-quality full-length HIV envelope sequences (>2200 bp) from seven autopsy tissues from an HIV+/cART+ subject, including three brain and four non-brain sites. Sanger sequencing was used for comparison with SMRT data and to clone functional pseudoviruses for in vitro tropism assays. Phylogenetic analysis demonstrated that brain-derived HIV was compartmentalized from HIV outside the brain and that the variants from each of the three brain tissues grouped independently. Variants from all peripheral tissues were intermixed on the tree but independent of the brain clades. Due to the large number of sequences, a clustering analysis at three similarity thresholds (99, 99.5, and 99.9%) was also performed. All brain sequences clustered exclusive of any non-brain sequences at all thresholds; however, frontal lobe sequences clustered independently of occipital and parietal lobes. Translated sequences revealed potentially functional differences between brain and non-brain sequences in the location of putative N-linked glycosylation sites (N-sites), V1 length, V3 charge, and the number of V4 N-sites. All brain sequences were predicted to use the CCR5 co-receptor, while most non-brain sequences were predicted to use CXCR4 co-receptor. Tropism results were confirmed by in vitro infection assays. The study is the first to use a SMRT sequencing approach to study HIV compartmentalization in tissues and supports other reports of limited trafficking between brain and non-brain sequences during cART. Due to the long sequence length, we could observe changes along the entire envelope gene, likely caused by differential selective pressure in the brain that may contribute to neurological disease.
Affiliation(s)
- Robin L Brese
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Maria Paz Gonzalez-Perez
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Matthew Koch
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Olivia O'Connell
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Katherine Luzuriaga
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Mohan Somasundaran
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Paul R Clapham
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | | | - David J Nolan
- Bioinfoexperts, LLC, 718 Bayou Ln, Thibodaux, LA, 70301, USA
| | - Rebecca Rose
- Bioinfoexperts, LLC, 718 Bayou Ln, Thibodaux, LA, 70301, USA
| | | |
|
14
|
Sauvage V, Boizeau L, Candotti D, Vandenbogaert M, Servant-Delmas A, Caro V, Laperche S. Early MinION™ nanopore single-molecule sequencing technology enables the characterization of hepatitis B virus genetic complexity in clinical samples. PLoS One 2018; 13:e0194366. [PMID: 29566006 PMCID: PMC5864009 DOI: 10.1371/journal.pone.0194366] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 11/09/2017] [Accepted: 03/01/2018] [Indexed: 12/14/2022] Open
Abstract
Until recently, the method of choice to characterize viral diversity consisted of cloning PCR amplicons of full-length viral genomes and Sanger sequencing of multiple clones. However, this is extremely laborious, time-consuming, and low-throughput. Next-generation short-read sequencing also appears limited by its inability to directly sequence full-length viral genomes. The MinION™ device recently developed by Oxford Nanopore Technologies can be a promising alternative by applying long-read single-molecule sequencing directly to the overall amplified products generated in a PCR reaction. This new technology was evaluated by using hepatitis B virus (HBV) as a model. Several previously characterized HBV-infected clinical samples were investigated, including recombinant virus, variants that harbored deletions, and mixed populations. The original MinION device was able to generate individual complete 3,200-nt HBV genome sequences and to identify recombinant variants. MinION was particularly efficient in detecting HBV genomes with multiple large in-frame deletions and spliced variants concomitantly with non-deleted parental genomes. However, an average sequencing error rate of 12% per individual read, together with low throughput, challenged single-nucleotide resolution, polymorphism calling, and phasing of mutations directly from the sequencing reads. Despite this high error rate, the pairwise identity of the MinION HBV consensus genome was consistent with the Sanger sequencing method. Because MinION is under continuous development, further studies are needed to evaluate its potential use for viral infection characterization.
Affiliation(s)
- Virginie Sauvage
- Institut National de la Transfusion Sanguine (INTS), Département D’études des Agents Transmissibles par le Sang, Centre National de Référence Risques Infectieux Transfusionnels, Paris, France
| | - Laure Boizeau
- Institut National de la Transfusion Sanguine (INTS), Département D’études des Agents Transmissibles par le Sang, Centre National de Référence Risques Infectieux Transfusionnels, Paris, France
| | - Daniel Candotti
- Institut National de la Transfusion Sanguine (INTS), Département D’études des Agents Transmissibles par le Sang, Centre National de Référence Risques Infectieux Transfusionnels, Paris, France
| | - Mathias Vandenbogaert
- Institut Pasteur, Genotyping of Pathogens Pole, Laboratory for Urgent Response to Biological Threats, Environment and Infectious Risks, Paris, France
| | - Annabelle Servant-Delmas
- Institut National de la Transfusion Sanguine (INTS), Département D’études des Agents Transmissibles par le Sang, Centre National de Référence Risques Infectieux Transfusionnels, Paris, France
| | - Valérie Caro
- Institut Pasteur, Genotyping of Pathogens Pole, Laboratory for Urgent Response to Biological Threats, Environment and Infectious Risks, Paris, France
| | - Syria Laperche
- Institut National de la Transfusion Sanguine (INTS), Département D’études des Agents Transmissibles par le Sang, Centre National de Référence Risques Infectieux Transfusionnels, Paris, France
| |
|
15
|
Cartwright JF, Anderson K, Longworth J, Lobb P, James DC. Highly sensitive detection of mutations in CHO cell recombinant DNA using multi-parallel single molecule real-time DNA sequencing. Biotechnol Bioeng 2018; 115:1485-1498. [DOI: 10.1002/bit.26561] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/03/2017] [Revised: 12/01/2017] [Accepted: 02/04/2018] [Indexed: 12/13/2022]
Affiliation(s)
- Joseph F. Cartwright
- Department of Chemical and Biological Engineering; University of Sheffield; Sheffield UK
| | - Karin Anderson
- Cell Line Development; BioTherapeutic Pharmaceutical Sciences; Pfizer Inc; Andover Massachusetts
| | - Joseph Longworth
- Department of Chemical and Biological Engineering; University of Sheffield; Sheffield UK
| | | | - David C. James
- Department of Chemical and Biological Engineering; University of Sheffield; Sheffield UK
| |
|
16
|
Wu CP, Wu P, Zhao HF, Liu WL, Li WP. Clinical Applications of and Challenges in Single-Cell Analysis of Circulating Tumor Cells. DNA Cell Biol 2018; 37:78-89. [PMID: 29265876 DOI: 10.1089/dna.2017.3981] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/15/2022] Open
Affiliation(s)
- Chang-peng Wu
- Department of Neurosurgery, Shenzhen Second People's Hospital, Clinical Medicine College of Anhui Medical University, Shenzhen, China
- Department of Neurosurgery, Shenzhen Key Laboratory of Neurosurgery, The First Affiliated Hospital of Shenzhen University, Shenzhen Second People's Hospital, Shenzhen, China
| | - Peng Wu
- The Affiliated Luohu Hospital of Shenzhen University, Shenzhen Luohu Hospital Group Department of Urology, Shenzhen, China
| | - Hua-fu Zhao
- Department of Neurosurgery, Shenzhen Key Laboratory of Neurosurgery, The First Affiliated Hospital of Shenzhen University, Shenzhen Second People's Hospital, Shenzhen, China
- Department of Neurosurgery/Neuro-oncology, State Key Laboratory of Oncology in South China, Sun Yat-sen University Cancer Center, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
| | - Wen-lan Liu
- Department of Neurosurgery, Shenzhen Key Laboratory of Neurosurgery, The First Affiliated Hospital of Shenzhen University, Shenzhen Second People's Hospital, Shenzhen, China
| | - Wei-ping Li
- Department of Neurosurgery, Shenzhen Key Laboratory of Neurosurgery, The First Affiliated Hospital of Shenzhen University, Shenzhen Second People's Hospital, Shenzhen, China
| |
|
17
|
A comparative study on the characterization of hepatitis B virus quasispecies by clone-based sequencing and third-generation sequencing. Emerg Microbes Infect 2017; 6:e100. [PMID: 29116219 PMCID: PMC5717089 DOI: 10.1038/emi.2017.88] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/03/2017] [Revised: 09/12/2017] [Accepted: 09/17/2017] [Indexed: 02/07/2023]
Abstract
Hepatitis B virus (HBV) has a high mutation rate due to its extremely high replication rate and the lack of proofreading during reverse transcription. The resulting genetically heterogeneous variants are described as viral quasispecies (QS). Clone-based sequencing (CBS) is thought to be the ‘gold standard’ for assessing QS complexity and diversity of HBV, but CBS is laborious and not cost-effective. In this study, we investigated the utility of third-generation sequencing (TGS) to characterize the genetic heterogeneity of HBV QS and assessed the possible contribution of TGS technology to HBV QS studies. Parallel experiments including 3 control samples, which consisted of HBV full gene genotype B and genotype C plasmids, and 10 patient samples were performed by using CBS and TGS to analyze HBV whole-genome QS. Characterization of QS heterogeneity was conducted by using comprehensive statistical analysis. The results showed that TGS had a high consistency with CBS when measuring the complexity and diversity of QS. In addition, TGS conferred strong advantages in detecting rare variants. In summary, TGS was considered to be practicable in HBV QS studies and it might have a relevant role in the clinical management of HBV infection in the future.
|
18
|
Bujakowska KM, Fernandez-Godino R, Place E, Consugar M, Navarro-Gomez D, White J, Bedoukian EC, Zhu X, Xie HM, Gai X, Leroy BP, Pierce EA. Copy-number variation is an important contributor to the genetic causality of inherited retinal degenerations. Genet Med 2017; 19:643-651. [PMID: 27735924 PMCID: PMC6377944 DOI: 10.1038/gim.2016.158] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 05/12/2016] [Accepted: 08/30/2016] [Indexed: 11/10/2022] Open
Abstract
PURPOSE Despite substantial progress in sequencing, current strategies can genetically solve only approximately 55-60% of inherited retinal degeneration (IRD) cases. This can be partially attributed to elusive mutations in the known IRD genes, which are not easily identified by the targeted next-generation sequencing (NGS) or Sanger sequencing approaches. We hypothesized that copy-number variations (CNVs) are a major contributor to the elusive genetic causality of IRDs. METHODS Twenty-eight cases previously unsolved with a targeted NGS were investigated with whole-genome single-nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) arrays. RESULTS Deletions in the IRD genes were detected in 5 of 28 families, including a de novo deletion. We suggest that the de novo deletion occurred through nonallelic homologous recombination (NAHR) and we constructed a genomic map of NAHR-prone regions with overlapping IRD genes. In this article, we also report an unusual case of recessive retinitis pigmentosa due to compound heterozygous mutations in SNRNP200, a gene that is typically associated with the dominant form of this disease. CONCLUSIONS CNV mapping substantially increased the genetic diagnostic rate of IRDs, detecting genetic causality in 18% of previously unsolved cases. Extending the search to other structural variations will probably demonstrate an even higher contribution to genetic causality of IRDs. Genet Med advance online publication 13 October 2016.
Affiliation(s)
- Kinga M Bujakowska
- Ocular Genomics Institute, Massachusetts Eye and Ear Infirmary, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, USA
| | - Rosario Fernandez-Godino
- Ocular Genomics Institute, Massachusetts Eye and Ear Infirmary, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, USA
| | - Emily Place
- Ocular Genomics Institute, Massachusetts Eye and Ear Infirmary, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, USA
| | - Mark Consugar
- Ocular Genomics Institute, Massachusetts Eye and Ear Infirmary, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, USA
| | - Daniel Navarro-Gomez
- Ocular Genomics Institute, Massachusetts Eye and Ear Infirmary, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, USA
| | - Joseph White
- Ocular Genomics Institute, Massachusetts Eye and Ear Infirmary, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, USA
| | - Emma C Bedoukian
- Ophthalmic Genetics & Visual Electrophysiology, Division of Ophthalmology, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - Xiaosong Zhu
- Ophthalmic Genetics & Visual Electrophysiology, Division of Ophthalmology, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - Hongbo M Xie
- Department of BioMedical Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - Xiaowu Gai
- Center for Personalized Medicine, Children's Hospital Los Angeles, Los Angeles, California, USA
| | - Bart P Leroy
- Ophthalmic Genetics & Visual Electrophysiology, Division of Ophthalmology, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Department of Ophthalmology & Center for Medical Genetics, Ghent University Hospital & Ghent University, Ghent, Belgium
| | - Eric A Pierce
- Ocular Genomics Institute, Massachusetts Eye and Ear Infirmary, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, USA
| |
|
19
|
Evolution of multi-drug resistant HCV clones from pre-existing resistant-associated variants during direct-acting antiviral therapy determined by third-generation sequencing. Sci Rep 2017; 7:45605. [PMID: 28361915 PMCID: PMC5374541 DOI: 10.1038/srep45605] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 01/27/2017] [Accepted: 02/28/2017] [Indexed: 02/07/2023] Open
Abstract
Resistance-associated variants (RAVs) are among the most significant clinical challenges in treating HCV-infected patients with direct-acting antivirals (DAAs). We investigated the viral dynamics in patients receiving DAAs using third-generation sequencing technology. Among 283 patients with genotype-1b HCV receiving daclatasvir + asunaprevir (DCV/ASV), 32 (11.3%) failed to achieve sustained virological response (SVR). Conventional ultra-deep sequencing of the HCV genome was performed in 104 patients (32 non-SVR, 72 SVR), and detected representative RAVs in all non-SVR patients at baseline, including Y93H in 28 (87.5%). Long contiguous sequences spanning the NS3 to NS5A regions of each viral clone in 12 sera from 6 representative non-SVR patients were determined by third-generation sequencing, and showed the concurrent presence of several synonymous mutations linked to resistance-associated substitutions in a subpopulation of pre-existing RAVs and dominant isolates at treatment failure. Phylogenetic analyses revealed close genetic distances between pre-existing RAVs and dominant RAVs at treatment failure. In addition, multiple drug-resistant mutations developed on pre-existing RAVs after DCV/ASV in all non-SVR cases. In conclusion, multi-drug resistant viral clones at treatment failure certainly originated from a subpopulation of pre-existing RAVs in HCV-infected patients. Those RAVs were selected for and became dominant with the acquisition of multiple resistance-associated substitutions under DAA treatment pressure.
|
20
|
Abstract
Before starting chronic hepatitis C treatment, the viral genotype/subtype has to be accurately determined and potentially coupled with drug resistance testing. Due to the high genetic variability of the hepatitis C virus, this can be a demanding task that can potentially be streamlined by viral whole-genome sequencing using next-generation sequencing as demonstrated by an article in this issue of the Journal of Clinical Microbiology by E. Thomson, C. L. C. Ip, A. Badhan, M. T. Christiansen, W. Adamson, et al. (J Clin Microbiol. 54:2455-2469, 2016, http://dx.doi.org/10.1128/JCM.00330-16).
|