1. Zhou S, Hill CS, Spielvogel E, Clark MU, Hudgens MG, Swanstrom R. Unique Molecular Identifiers and Multiplexing Amplicons Maximize the Utility of Deep Sequencing To Critically Assess Population Diversity in RNA Viruses. ACS Infect Dis 2022; 8:2505-2514. [PMID: 36326446 PMCID: PMC9742341 DOI: 10.1021/acsinfecdis.2c00319]
Abstract
Next generation sequencing (NGS)/deep sequencing has become an important tool in the study of viruses. The use of unique molecular identifiers (UMI) can overcome the limitations of PCR errors and PCR-mediated recombination and reveal the true sampling depth of a viral population being sequenced in an NGS experiment. This approach of enhanced sequence data represents an ideal tool to study both high and low abundance drug resistance mutations and more generally to explore the genetic structure of viral populations. Central to the use of the UMI/Primer ID approach is the creation of a template consensus sequence (TCS) for each genome sequenced. Here we describe a series of experiments to validate several aspects of the Multiplexed Primer ID (MPID) sequencing approach using the MiSeq platform. We have evaluated how multiplexing of cDNA synthesis and amplicons affects the sampling depth of the viral population for each individual cDNA and amplicon to understand the relationship between broader genome coverage versus maximal sequencing depth. We have validated reproducibility of the MPID assay in the detection of minority mutations in viral genomes. We have also examined the determinants that allow sequencing reads of PCR recombinants to contaminate the final TCS data set and show how such contamination can be limited. Finally, we provide several examples where we have applied MPID to analyze features of minority variants and describe limits on their detection in viral populations of HIV-1 and SARS-CoV-2 to demonstrate the generalizable utility of this approach with any RNA virus.
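The central step described above, collapsing all reads that share a UMI/Primer ID into one template consensus sequence (TCS), can be sketched as a per-family majority vote. This is a minimal illustration, not the MPID pipeline: it assumes reads are already trimmed to equal length, and the fixed `min_family_size` cutoff and the function name `build_tcs` are hypothetical simplifications.

```python
from collections import Counter, defaultdict

def build_tcs(reads, min_family_size=3):
    """Collapse (umi, sequence) read pairs into one consensus per UMI.

    Assumes reads in a family are already trimmed/aligned to the same length.
    min_family_size is a hypothetical fixed cutoff, not the MPID pipeline's rule.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to outvote PCR/sequencing errors
        if len({len(s) for s in seqs}) != 1:
            continue  # unequal lengths: skip rather than guess an alignment
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Require a strict majority at each position, else call "N".
            bases.append(base if 2 * count > len(column) else "N")
        consensus[umi] = "".join(bases)
    return consensus

reads = [
    ("AAAC", "ACGT"), ("AAAC", "ACGT"), ("AAAC", "ACTT"),  # one PCR error at position 3
    ("GGTA", "TTGA"), ("GGTA", "TTGA"),                    # family of only 2 reads
]
print(build_tcs(reads))  # {'AAAC': 'ACGT'}
```

The majority vote removes the single-read error in the `AAAC` family, while the two-read `GGTA` family is dropped because it cannot outvote an error. A PCR recombinant that happens to dominate its family would still pass, which is why the abstract treats recombinant contamination as a separate problem.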
Affiliation(s)
- Shuntai Zhou (corresponding author)
  - UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Collin S. Hill
  - UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Ean Spielvogel
  - UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Michael U. Clark
  - UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Michael G. Hudgens
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Ronald Swanstrom
  - UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
2. Hynst J, Navrkalova V, Pal K, Pospisilova S. Bioinformatic strategies for the analysis of genomic aberrations detected by targeted NGS panels with clinical application. PeerJ 2021; 9:e10897. [PMID: 33850640 PMCID: PMC8019320 DOI: 10.7717/peerj.10897] [Received: 09/09/2020] [Accepted: 01/13/2021]
Abstract
Molecular profiling of tumor samples has acquired importance in cancer research, but currently also plays an important role in the clinical management of cancer patients. Rapid identification of genomic aberrations improves diagnosis, prognosis and effective therapy selection. This can be attributed mainly to the development of next-generation sequencing (NGS) methods, especially targeted DNA panels. Such panels enable a relatively inexpensive and rapid analysis of various aberrations with clinical impact specific to particular diagnoses. In this review, we discuss the experimental approaches and bioinformatic strategies available for the development of an NGS panel for a reliable analysis of selected biomarkers. Compliance with defined analytical steps is crucial to ensure accurate and reproducible results. In addition, a careful validation procedure has to be performed before the application of NGS targeted assays in routine clinical practice. With more focus on bioinformatics, we emphasize the need for thorough pipeline validation and management in relation to the particular experimental setting as an integral part of the NGS method establishment. A robust and reproducible bioinformatic analysis running on powerful machines is essential for proper detection of genomic variants in clinical settings since distinguishing between experimental noise and real biological variants is fundamental. This review summarizes state-of-the-art bioinformatic solutions for careful detection of the SNV/Indels and CNVs for targeted sequencing resulting in translation of sequencing data into clinically relevant information. Finally, we share our experience with the development of a custom targeted NGS panel for an integrated analysis of biomarkers in lymphoproliferative disorders.
Affiliation(s)
- Jakub Hynst
  - Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - Department of Internal Medicine-Hematology and Oncology, Faculty of Medicine and University Hospital Brno, Masaryk University, Brno, Czech Republic
  - Department of Medical Genetics and Genomics, Faculty of Medicine and University Hospital Brno, Masaryk University, Brno, Czech Republic
- Veronika Navrkalova
  - Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - Department of Internal Medicine-Hematology and Oncology, Faculty of Medicine and University Hospital Brno, Masaryk University, Brno, Czech Republic
- Karol Pal
  - Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - Department of Hematology, University Hospital Schleswig-Holstein, Kiel, Germany
- Sarka Pospisilova
  - Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - Department of Internal Medicine-Hematology and Oncology, Faculty of Medicine and University Hospital Brno, Masaryk University, Brno, Czech Republic
  - Department of Medical Genetics and Genomics, Faculty of Medicine and University Hospital Brno, Masaryk University, Brno, Czech Republic
3. Chen S, He C, Li Y, Li Z, Melançon CE. A computational toolset for rapid identification of SARS-CoV-2, other viruses and microorganisms from sequencing data. Brief Bioinform 2021; 22:924-935. [PMID: 33003197 PMCID: PMC7543257 DOI: 10.1093/bib/bbaa231] [Received: 05/22/2020] [Revised: 08/03/2020] [Accepted: 08/26/2020]
Abstract
In this paper, we present a toolset and related resources for rapid identification of viruses and microorganisms from short-read or long-read sequencing data. We present fastv as an ultra-fast tool to detect microbial sequences present in sequencing data, identify target microorganisms and visualize coverage of microbial genomes. This tool is based on the k-mer mapping and extension method. K-mer sets are generated by UniqueKMER, another tool provided in this toolset. UniqueKMER can generate complete sets of unique k-mers for each genome within a large set of viral or microbial genomes. For convenience, unique k-mers for microorganisms and common viruses that afflict humans have been generated and are provided with the tools. As a lightweight tool, fastv accepts FASTQ data as input and directly outputs the results in both HTML and JSON formats. Prior to the k-mer analysis, fastv automatically performs adapter trimming, quality pruning, base correction and other preprocessing to ensure the accuracy of k-mer analysis. Specifically, fastv provides built-in support for rapid severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) identification and typing. Experimental results showed that fastv achieved 100% sensitivity and 100% specificity for detecting SARS-CoV-2 from sequencing data; and can distinguish SARS-CoV-2 from SARS, Middle East respiratory syndrome and other coronaviruses. This toolset is available at: https://github.com/OpenGene/fastv.
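The idea behind UniqueKMER, keeping only k-mers that occur in exactly one genome of the reference set so that a hit is diagnostic, can be sketched in a few lines. This is an illustrative toy, not fastv or UniqueKMER: it ignores reverse complements, quality trimming, and the k-mer extension step, and the function names and toy genomes are hypothetical.

```python
from collections import defaultdict

def kmers(seq, k):
    """All distinct k-mers of seq (forward strand only in this sketch)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def unique_kmers(genomes, k):
    """Map each genome name to the k-mers found in no other genome of the set."""
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    result = {name: set() for name in genomes}
    for km, names in owners.items():
        if len(names) == 1:  # shared k-mers are not diagnostic, so drop them
            result[next(iter(names))].add(km)
    return result

def classify_read(read, unique, k):
    """Count how many genome-specific k-mers of each genome the read contains."""
    read_kmers = kmers(read, k)
    return {name: len(read_kmers & ks) for name, ks in unique.items()}

genomes = {"virusA": "ACGTACGGT", "virusB": "ACGTTTGCA"}
u = unique_kmers(genomes, 4)
print(classify_read("TACGGT", u, 4))  # {'virusA': 3, 'virusB': 0}
```

Because `ACGT` occurs in both toy genomes it is excluded from both unique sets, which is what lets closely related genomes (for example SARS-CoV-2 versus SARS) be told apart by the remaining k-mers.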
Affiliation(s)
- Shifu Chen
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. He also serves as chief technology officer of HaploX Biotechnology, is the initiator of the OpenGene projects, and contributes to many open-source tools.
- Changshou He
  - Department of Bioinformatics, HaploX Biotechnology
- Yingqiang Li
  - Department of Bioinformatics, HaploX Biotechnology
- Zhicheng Li
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research interests lie mainly in imaging genomics.
- Charles E Melançon
  - Department of Research and Development, HaploX Biotechnology. His research interests lie mainly in next-generation sequencing and bioinformatics.
4. Gao C, Zhang M, Chen L. The Comparison of Two Single-cell Sequencing Platforms: BD Rhapsody and 10x Genomics Chromium. Curr Genomics 2020; 21:602-609. [PMID: 33414681 PMCID: PMC7770630 DOI: 10.2174/1389202921999200625220812] [Received: 03/15/2020] [Revised: 05/07/2020] [Accepted: 06/01/2020]
Abstract
The cell is the unit of life for all organisms, and not all cells are the same, so technologies that generate transcriptional or genomic DNA profiles from single cells are crucial. Since its establishment in 2009, single-cell RNA sequencing (scRNA-seq) has emerged as a major driver of progress in biomedical research. During the last three years, several new single-cell sequencing platforms have emerged, yet there are only a few systematic comparisons of the advantages and limitations of these commonly used platforms. Here we compare two single-cell sequencing platforms, BD Rhapsody and 10x Genomics Chromium, including their different mechanisms and some scRNA-seq results obtained with them.
Affiliation(s)
- Caixia Gao
  - Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Mingnan Zhang
  - Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Lei Chen
  - Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
5. Shao W, Boltz VF, Hattori J, Bale MJ, Maldarelli F, Coffin JM, Kearney MF. Short Communication: HIV-DRLink: A Tool for Reporting Linked HIV-1 Drug Resistance Mutations in Large Single-Genome Data Sets Using the Stanford HIV Database. AIDS Res Hum Retroviruses 2020; 36:942-947. [PMID: 32683881 DOI: 10.1089/aid.2020.0109]
Abstract
The prevalence of HIV-1 drug resistance is increasing worldwide and monitoring its emergence is important for the successful management of populations receiving combination antiretroviral therapy. It is likely that pre-existing drug resistance mutations linked on the same viral genomes are predictive of treatment failure. Because of the large number of sequences generated by ultrasensitive single-genome sequencing (uSGS) and other similar next-generation sequencing methods, it is difficult to assess each sequence individually for linked drug resistance mutations. Several software/programs exist to report the frequencies of individual mutations in large data sets, but they provide no information on linkage of resistance mutations. In this study, we report the HIV-DRLink program, a research tool that provides resistance mutation frequencies as well as their genetic linkage by parsing and summarizing the Sierra output from the Stanford HIV Database. The HIV-DRLink program should only be used on data sets generated by methods that eliminate artifacts due to polymerase chain reaction recombination, for example, standard single-genome sequencing or uSGS. HIV-DRLink is exclusively a research tool and is not intended to inform clinical decisions.
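The difference between reporting individual mutation frequencies and reporting linkage can be made concrete with a small tally over per-genome mutation lists. This is a hypothetical sketch: it does not parse Sierra output, and the input format (one list of mutation names per single-genome sequence) and the function name `summarize_linkage` are assumptions made for illustration.

```python
from collections import Counter

def summarize_linkage(sequences):
    """Tally individual mutations and linked mutation patterns.

    sequences: one list of resistance mutations per single-genome sequence,
    e.g. [["K103N", "M184V"], []]. Input format is hypothetical.
    Returns (single-mutation frequencies, linked-pattern frequencies).
    """
    n = len(sequences)
    single = Counter()
    patterns = Counter()
    for muts in sequences:
        unique_muts = tuple(sorted(set(muts)))  # order-independent pattern key
        single.update(unique_muts)
        patterns[unique_muts] += 1
    single_freq = {m: c / n for m, c in single.items()}
    pattern_freq = {p: c / n for p, c in patterns.items()}
    return single_freq, pattern_freq

genomes = [["K103N", "M184V"], ["K103N"], ["M184V", "K103N"], []]
single_freq, pattern_freq = summarize_linkage(genomes)
print(single_freq["K103N"])                 # 0.75
print(pattern_freq[("K103N", "M184V")])     # 0.5
```

The single-mutation view reports K103N at 75% and M184V at 50%, but only the pattern view shows that every M184V genome also carries K103N. That linkage is only meaningful when each sequence derives from one template, which is why the abstract restricts the tool to recombination-free data sets.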
Affiliation(s)
- Wei Shao
  - Advanced Biomedical Computing Science, Frederick National Laboratory for Cancer Research (FNLCR) sponsored by the National Cancer Institute, Frederick, Maryland, USA
- Valerie F. Boltz
  - HIV Dynamics and Replication Program, CCR, National Cancer Institute, Frederick, Maryland, USA
- Junko Hattori
  - HIV Dynamics and Replication Program, CCR, National Cancer Institute, Frederick, Maryland, USA
- Michael J. Bale
  - HIV Dynamics and Replication Program, CCR, National Cancer Institute, Frederick, Maryland, USA
- Frank Maldarelli
  - HIV Dynamics and Replication Program, CCR, National Cancer Institute, Frederick, Maryland, USA
- John M. Coffin
  - Department of Molecular Biology and Microbiology, Tufts University, Boston, Massachusetts, USA
- Mary F. Kearney
  - HIV Dynamics and Replication Program, CCR, National Cancer Institute, Frederick, Maryland, USA
6. Next-Generation Sequencing in High-Sensitive Detection of Mutations in Tumors: Challenges, Advances, and Applications. J Mol Diagn 2020; 22:994-1007. [PMID: 32480002 DOI: 10.1016/j.jmoldx.2020.04.213] [Received: 01/20/2020] [Revised: 03/17/2020] [Accepted: 04/23/2020]
Abstract
Next-generation sequencing (NGS) technologies have come of age as preferred technologies for screening of genomic variants of pathologic and therapeutic potential. Because of their capability for high-throughput and massively parallel sequencing, they can screen for a variety of genomic changes in multiple samples simultaneously. This has made them platforms of choice for clinical testing of solid tumors and hematological malignancies. Consequently, they are increasingly replacing conventional technologies, such as Sanger sequencing and pyrosequencing, expression arrays, real-time PCR, and fluorescence in situ hybridization methods, for routine molecular testing of tumors. However, one limitation of routinely used NGS technologies is the inability to detect low-level genomic variants with high accuracy. This can be attributed to the frequent occurrence of low-level sequencing errors and artifacts in NGS workflow that need specialized approaches to be identified and eliminated. This review focuses on the origins and nature of these artifacts and recent improvements in the NGS technologies to overcome them to facilitate accurate high-sensitive detection of low-level mutations. Potential applications of high-sensitive NGS in oncology and comparisons with non-NGS technologies of similar capabilities are also summarized.
7. Xu C, Gu X, Padmanabhan R, Wu Z, Peng Q, DiCarlo J, Wang Y. smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers. Bioinformatics 2019; 35:1299-1309. [PMID: 30192920 PMCID: PMC6477992 DOI: 10.1093/bioinformatics/bty790] [Received: 03/13/2018] [Revised: 08/03/2018] [Accepted: 09/05/2018]
Abstract
MOTIVATION: Low-frequency DNA mutations are often confounded with technical artifacts from sample preparation and sequencing. With unique molecular identifiers (UMIs), most sequencing errors can be corrected. However, errors introduced before UMI tagging, such as DNA polymerase errors during end repair and the first PCR cycle, cannot be corrected with single-strand UMIs and impose fundamental limits on UMI-based variant calling.
RESULTS: We developed smCounter2, a UMI-based variant caller for targeted sequencing data and an upgrade from the current version of smCounter. Compared with smCounter, smCounter2 features a lower detection limit (0.5%, down from 1%), better overall accuracy (particularly in non-coding regions), a consistent threshold that can be applied to both deep and shallow sequencing runs, and easier use via a Docker image and code for read pre-processing. We benchmarked smCounter2 against several state-of-the-art UMI-based variant calling methods using multiple datasets and demonstrated smCounter2's superior performance in detecting somatic variants. At the core of smCounter2 is a statistical test to determine whether the allele frequency of the putative variant is significantly above the background error rate, which was carefully modeled using an independent dataset. The improved accuracy in non-coding regions was mainly achieved using novel repetitive region filters that were specifically designed for UMI data.
AVAILABILITY AND IMPLEMENTATION: The entire pipeline is available at https://github.com/qiaseq/qiaseq-dna under the MIT license.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
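The core question smCounter2 asks, whether the observed allele frequency is significantly above the background error rate, can be illustrated with a one-sided binomial test. This is a swapped-in simplification, not smCounter2's model: it assumes a single fixed per-site background error rate and treats each UMI family as one independent trial, whereas the abstract says smCounter2 models the background from an independent dataset. The function names and `alpha=0.01` are hypothetical.

```python
from math import exp, lgamma, log, log1p

def binom_log_pmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return sum(exp(binom_log_pmf(i, n, p)) for i in range(k, n + 1))

def call_variant(alt_families, total_families, background_error, alpha=0.01):
    """Flag a site if this many alt-supporting UMI families is improbable
    under a fixed per-site background error rate (hypothetical model)."""
    p_value = upper_tail(alt_families, total_families, background_error)
    return p_value < alpha, p_value

called, p = call_variant(10, 2000, 0.001)
print(called)   # True: 10 families vs. ~2 expected from background
called, p = call_variant(3, 2000, 0.001)
print(called)   # False: 3 families is consistent with background
```

Working in log space with `lgamma` keeps the binomial coefficients from overflowing at realistic depths. Because the threshold is a p-value rather than a raw allele-frequency cutoff, the same `alpha` adapts to deep and shallow runs, which is the kind of consistency the abstract describes.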
Affiliation(s)
- Chang Xu
  - Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
- Xiujing Gu
  - Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
- Zhong Wu
  - Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
- Quan Peng
  - Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
- John DiCarlo
  - Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
- Yexun Wang
  - Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
8. Balagopal V, Hantel A, Kadri S, Steinhardt G, Zhen CJ, Kang W, Wanjari P, Ritterhouse LL, Stock W, Segal JP. Measurable residual disease monitoring for patients with acute myeloid leukemia following hematopoietic cell transplantation using error corrected hybrid capture next generation sequencing. PLoS One 2019; 14:e0224097. [PMID: 31658273 PMCID: PMC6816574 DOI: 10.1371/journal.pone.0224097] [Received: 06/03/2019] [Accepted: 10/05/2019]
Abstract
Improved systems for detection of measurable residual disease (MRD) in acute myeloid leukemia (AML) are urgently needed; however, attempts to utilize broad-scale next-generation sequencing (NGS) panels to perform multi-gene surveillance in AML post-induction have been stymied by persistent premalignant mutation-bearing clones. We hypothesized that this technology may be more suitable for evaluation of fully engrafted patients following hematopoietic cell transplantation (HCT). To address this question, we developed a hybrid-capture NGS panel utilizing unique molecular identifiers (UMIs) to detect variants at 0.1% VAF or below across 22 genes frequently mutated in myeloid disorders and applied it to a retrospective sample set of blood and bone marrow DNA samples previously evaluated as negative for disease via standard-of-care short tandem repeat (STR)-based engraftment testing and hematopathology analysis in our laboratory. Of 30 patients who demonstrated trackable mutations in the 22 genes at eventual relapse by standard NGS analysis, we were able to definitively detect relapse-associated mutations in 18/30 (60%) at previously disease-negative timepoints collected 20-100 days prior to relapse date. MRD was detected in both bone marrow (15/28, 53.6%) and peripheral blood samples (9/18, 50%), while showing excellent technical specificity in our sample set. We also confirmed the disappearance of all MRD signal with increasing time prior to relapse (>100 days), indicating true clinical specificity, even using genes commonly associated with clonal hematopoiesis of indeterminate potential (CHIP). This study highlights the efficacy of a highly sensitive, NGS panel-based approach to early detection of relapse in AML and supports the clinical validity of extending MRD analysis across many genes in the post-transplant setting.
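Detecting variants at 0.1% VAF, as described above, depends on sampling enough unique molecules for the variant to appear at all. The arithmetic can be sketched with a binomial sampling model. This is a hypothetical back-of-the-envelope calculation, not the authors' validation: it assumes each unique molecule (UMI family) is sampled independently and read without error, and `min_support` and `confidence` are illustrative parameters.

```python
from math import comb

def prob_at_least(k, n, vaf):
    """P(at least k of n sampled molecules carry the variant), binomial model."""
    return 1.0 - sum(comb(n, i) * vaf**i * (1 - vaf) ** (n - i) for i in range(k))

def min_molecules(vaf, min_support=1, confidence=0.95):
    """Smallest number of unique molecules (UMI families) for which at least
    min_support variant molecules are observed with the given probability."""
    n = min_support
    while prob_at_least(min_support, n, vaf) < confidence:
        n += 1
    return n

print(min_molecules(0.001))                 # 2995
print(min_molecules(0.001, min_support=3))  # considerably more
```

Even in this error-free idealization, roughly 3,000 unique input molecules are needed just to see one variant molecule at 0.1% VAF with 95% probability, and requiring several supporting molecules (a common guard against residual errors) raises that number further.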
Affiliation(s)
- Vidya Balagopal
  - Department of Pathology, Division of Genomic and Molecular Pathology, The University of Chicago, Chicago, Illinois, United States of America
- Andrew Hantel
  - Department of Medicine, Section of Hematology/Oncology, The University of Chicago, Chicago, Illinois, United States of America
- Sabah Kadri
  - Department of Pathology, Division of Genomic and Molecular Pathology, The University of Chicago, Chicago, Illinois, United States of America
- George Steinhardt
  - Department of Pathology, Division of Genomic and Molecular Pathology, The University of Chicago, Chicago, Illinois, United States of America
- Chao Jie Zhen
  - Department of Pathology, Division of Genomic and Molecular Pathology, The University of Chicago, Chicago, Illinois, United States of America
- Wenjun Kang
  - Center for Research Informatics, The University of Chicago, Chicago, Illinois, United States of America
- Pankhuri Wanjari
  - Department of Pathology, Division of Genomic and Molecular Pathology, The University of Chicago, Chicago, Illinois, United States of America
- Lauren L. Ritterhouse
  - Department of Pathology, Division of Genomic and Molecular Pathology, The University of Chicago, Chicago, Illinois, United States of America
- Wendy Stock
  - Department of Medicine, Section of Hematology/Oncology, The University of Chicago, Chicago, Illinois, United States of America
- Jeremy P. Segal
  - Department of Pathology, Division of Genomic and Molecular Pathology, The University of Chicago, Chicago, Illinois, United States of America
9. Liu CC, Ji H. PCR Amplification Strategies Towards Full-length HIV-1 Genome Sequencing. Curr HIV Res 2018; 16:98-105. [PMID: 29943704 DOI: 10.2174/1570162x16666180626152252] [Received: 01/15/2018] [Revised: 05/05/2018] [Accepted: 06/20/2018]
Abstract
The advent of next-generation sequencing has enabled greater resolution of viral diversity and improved feasibility of full viral genome sequencing allowing routine HIV-1 full genome sequencing in both research and diagnostic settings. Regardless of the sequencing platform selected, successful PCR amplification of the HIV-1 genome is essential for sequencing template preparation. As such, full HIV-1 genome amplification is a crucial step in dictating the successful and reliable sequencing downstream. Here we reviewed existing PCR protocols leading to HIV-1 full genome sequencing. In addition to the discussion on basic considerations on relevant PCR design, the advantages as well as the pitfalls of the published protocols were reviewed.
Affiliation(s)
- Chao Chun Liu
  - National Microbiology Laboratory at JC Wilt Infectious Diseases Research Center, Public Health Agency of Canada, Winnipeg, Canada
- Hezhao Ji
  - National Microbiology Laboratory at JC Wilt Infectious Diseases Research Center, Public Health Agency of Canada, Winnipeg, Canada
  - Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, Canada
10. Yeom H, Lee Y, Ryu T, Noh J, Lee AC, Lee HB, Kang E, Song SW, Kwon S. Barcode-free next-generation sequencing error validation for ultra-rare variant detection. Nat Commun 2019; 10:977. [PMID: 30816127 PMCID: PMC6395625 DOI: 10.1038/s41467-019-08941-4] [Received: 10/26/2018] [Accepted: 01/30/2019]
Abstract
The advent of next-generation sequencing (NGS) has accelerated biomedical research by enabling the high-throughput analysis of DNA sequences at a very low cost. However, NGS has limitations in detecting rare-frequency variants (< 1%) because of high sequencing error rates (roughly 0.1–1%). NGS errors can be filtered out using molecular barcodes, by comparing read replicates that share the same barcode. Accordingly, these barcoding methods require redundant reads of non-target sequences, resulting in high sequencing cost. Here, we present a cost-effective NGS error validation method in a barcode-free manner. By physically extracting and individually amplifying the DNA clones of erroneous reads, we distinguish true variants of frequency > 0.003% from the systematic NGS error and selectively validate NGS error after NGS. We achieve a PCR-induced error rate of 2.5×10⁻⁶ per base per doubling event, using ten times fewer sequencing reads than previous studies.
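To see what an error rate quoted "per base per doubling event" implies for a sample, it can be converted into an expected mutant fraction per base and per amplicon. This sketch assumes the common convention error rate = mutation frequency / (bases × doublings) and independent errors across sites; the 20 doublings and 150-bp amplicon length are hypothetical illustration values, not numbers from the paper.

```python
def per_base_mutant_fraction(error_rate, doublings):
    """Expected fraction of product copies mutated at one base, using the
    common definition error_rate = mutation frequency / (bases x doublings)."""
    return error_rate * doublings

def fraction_amplicons_with_error(error_rate, doublings, length):
    """Probability that an amplicon carries >= 1 PCR error, assuming
    independent sites (a simplification)."""
    per_base = per_base_mutant_fraction(error_rate, doublings)
    return 1.0 - (1.0 - per_base) ** length

rate = 2.5e-6  # per base per doubling, from the abstract above
print(per_base_mutant_fraction(rate, 20))             # about 5e-05 (0.005%)
print(fraction_amplicons_with_error(rate, 20, 150))   # about 0.0075
```

Under these assumptions, 20 doublings leave about 0.005% of copies mutated at any given base, the same order of magnitude as the 0.003% variant frequency quoted above, while roughly 0.75% of 150-bp amplicons carry at least one polymerase error somewhere.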
Affiliation(s)
- Huiran Yeom
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Yonghee Lee
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Taehoon Ryu
  - Department of Molecular and Genetical Engineering, Celemics Inc., 371-17, Gasan-dong, Geumcheon-gu, 08506, Seoul, Republic of Korea
- Jinsung Noh
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Amos Chungwon Lee
  - Interdisciplinary Program for Bioengineering, Seoul National University, 08826, Seoul, Republic of Korea
- Han-Byoel Lee
  - Department of Surgery, Seoul National University College of Medicine, Seoul National University Hospital Biomedical Research Institute, 03080, Seoul, Republic of Korea
- Eunji Kang
  - Cancer Research Institute, Seoul National University, 03080, Seoul, Republic of Korea
- Seo Woo Song
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Sunghoon Kwon
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Republic of Korea
  - Department of Molecular and Genetical Engineering, Celemics Inc., 371-17, Gasan-dong, Geumcheon-gu, 08506, Seoul, Republic of Korea
  - Interdisciplinary Program for Bioengineering, Seoul National University, 08826, Seoul, Republic of Korea
  - Bio-MAX institute, Seoul National University, 08826, Seoul, Republic of Korea
11.
Abstract
As a major biomarker of liquid biopsy, cell-free tumor DNA (ctDNA), which can be extracted from blood, urine, or other circulating liquids, can provide comprehensive genetic information about a tumor and overcomes the problem of tumor heterogeneity better than tissue biopsy. Developed in recent years, next-generation sequencing (NGS) is a widely used technology for analyzing ctDNA. Although the technologies for processing ctDNA samples are mature, detecting variants with low mutant allele frequency (MAF) in noisy sequencing data remains challenging. In this chapter, the authors first explain the difficulties of analyzing ctDNA sequencing data, then review related technologies, and finally present novel bioinformatics methods that improve the analysis of ctDNA NGS data.
12. Tebaldi M, Salvi S. From cfDNA to Sequencing: Workflows and Potentials. Methods Mol Biol 2019; 1909:119-125. [PMID: 30580427 DOI: 10.1007/978-1-4939-8973-7_9]
Abstract
Cell-free DNA (cfDNA) is gaining importance in oncologic clinical practice, mostly because of its role in predicting the onset of therapy resistance by tracking changes in patients' mutation status. In this field, high-sensitivity methods such as next-generation sequencing (NGS) can help to accurately detect somatic mutations at low frequency. Here, we report some advantages and limitations of NGS approaches for cfDNA mutation analyses, with the aim of choosing the most suitable approach in terms of sensitivity, specificity, data output, cost, and working time.
Affiliation(s)
- Michela Tebaldi
  - Biosciences Laboratory, Istituto Scientifico Romagnolo per lo Studio e la Cura dei Tumori (IRST) IRCCS, Meldola, Italy
- Samanta Salvi
  - Biosciences Laboratory, Istituto Scientifico Romagnolo per lo Studio e la Cura dei Tumori (IRST) IRCCS, Meldola, Italy
13. Salk JJ, Schmitt MW, Loeb LA. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat Rev Genet 2018; 19:269-285. [PMID: 29576615 PMCID: PMC6485430 DOI: 10.1038/nrg.2017.117]
Abstract
Mutations, the fuel of evolution, are first manifested as rare DNA changes within a population of cells. Although next-generation sequencing (NGS) technologies have revolutionized the study of genomic variation between species and individual organisms, most have limited ability to accurately detect and quantify rare variants among the different genome copies in heterogeneous mixtures of cells or molecules. We describe the technical challenges in characterizing subclonal variants using conventional NGS protocols and the recent development of error correction strategies, both computational and experimental, including consensus sequencing of single DNA molecules. We also highlight major applications for low-frequency mutation detection in science and medicine, describe emerging methodologies and provide our vision for the future of DNA sequencing.
Affiliation(s)
- Jesse J Salk
  - Department of Pathology, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Medicine, Divisions of Hematology and Medical Oncology, University of Washington School of Medicine, Seattle, WA, USA
  - Fred Hutchinson Cancer Research Center, Clinical Research Division, Seattle, WA, USA
- Michael W Schmitt
  - Department of Pathology, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Medicine, Divisions of Hematology and Medical Oncology, University of Washington School of Medicine, Seattle, WA, USA
  - Fred Hutchinson Cancer Research Center, Clinical Research Division, Seattle, WA, USA
- Lawrence A Loeb
  - Department of Pathology, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Biochemistry, University of Washington School of Medicine, Seattle, WA, USA
| |
Collapse
|
14
|
Ogawa T, Kryukov K, Imanishi T, Shiroguchi K. The efficacy and further functional advantages of random-base molecular barcodes for absolute and digital quantification of nucleic acid molecules. Sci Rep 2017; 7:13576. [PMID: 29051542 PMCID: PMC5648891 DOI: 10.1038/s41598-017-13529-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 06/27/2017] [Accepted: 09/25/2017] [Indexed: 01/18/2023]
Abstract
Accurate quantification of biomolecules in system-wide measurements is in high demand, especially for systems with limited sample amounts such as single cells. Because of this, digital quantification of nucleic acid molecules using molecular barcodes has been developed, making, e.g., transcriptome analysis highly reproducible and quantitative. This counting scheme was shown to work using sequence-restricted barcodes, and non-sequence-restricted (random-base) barcodes that may provide a much higher dynamic range at significantly lower cost have been widely used. However, the efficacy of random-base barcodes is significantly affected by base changes due to amplification and/or sequencing errors and has not been investigated experimentally or quantitatively. Here, we show experimentally that random-base barcodes enable absolute and digital quantification of DNA molecules with high dynamic range (from one to more than 10⁴, potentially up to 10¹⁵ molecules) conditional on our barcode design and variety, a certain range of sequencing depths, and computational analyses. Moreover, we quantitatively show further functional advantages of the molecular barcodes: the molecular barcodes enable one to find contaminants and misidentifications of target sequences. Our scheme here may be generally used to confirm that the digital quantification works in each platform.
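The abstract notes that random-base barcodes are vulnerable to base changes introduced during amplification or sequencing. One common generic remedy, sketched below under stated assumptions, folds a low-count barcode into a more abundant barcode that differs from it by a single substitution. This is not the authors' pipeline; the function names and the abundance `ratio` of 2 are illustrative assumptions.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def merge_error_barcodes(counts: dict[str, int], ratio: float = 2.0) -> dict[str, int]:
    """Fold low-count barcodes into a more abundant barcode one substitution away.

    Hypothetical helper. Assumption: a barcode observed at most 1/ratio as often
    as a neighbour at Hamming distance 1 is treated as an error copy of it.
    """
    merged: dict[str, int] = {}
    # Visit barcodes from most to least abundant so candidate parents come first.
    for bc, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        parent = next(
            (p for p in merged
             if len(p) == len(bc) and hamming(p, bc) == 1 and counts[p] >= ratio * n),
            None,
        )
        if parent is None:
            merged[bc] = n       # keep as a distinct original molecule
        else:
            merged[parent] += n  # absorb the reads into the parent barcode
    return merged
```

For example, with counts `{"AAAA": 100, "AAAT": 3, "CCCC": 50, "CCCA": 40}`, `AAAT` is absorbed into `AAAA`, while `CCCA` stays separate because 50 reads are not at least twice 40.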
Affiliation(s)
- Taisaku Ogawa
- Laboratory for Integrative Omics, RIKEN Quantitative Biology Center (QBiC), 6-2-3 Furuedai Suita, Osaka, 565-0874, Japan
- Kirill Kryukov
- Biomedical Informatics Laboratory, Department of Molecular Life Science, Tokai University School of Medicine, 143 Shimokasuya, Isehara, Kanagawa, 259-1193, Japan
- Tadashi Imanishi
- Biomedical Informatics Laboratory, Department of Molecular Life Science, Tokai University School of Medicine, 143 Shimokasuya, Isehara, Kanagawa, 259-1193, Japan
- Katsuyuki Shiroguchi
- Laboratory for Integrative Omics, RIKEN Quantitative Biology Center (QBiC), 6-2-3 Furuedai Suita, Osaka, 565-0874, Japan.
- Laboratory for Immunogenetics, RIKEN Center for Integrative Medical Sciences (IMS), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan.
- JST PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama, 332-0012, Japan.
15
Roloff GW, Lai C, Hourigan CS, Dillon LW. Technical Advances in the Measurement of Residual Disease in Acute Myeloid Leukemia. J Clin Med 2017; 6:jcm6090087. [PMID: 28925935 PMCID: PMC5615280 DOI: 10.3390/jcm6090087] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 08/22/2017] [Revised: 09/09/2017] [Accepted: 09/13/2017] [Indexed: 12/31/2022]
Abstract
Outcomes for those diagnosed with acute myeloid leukemia (AML) remain poor. It has been widely established that persistent residual leukemic burden, often referred to as measurable or minimal residual disease (MRD), after induction therapy or at the time of hematopoietic stem cell transplant (HSCT) is highly predictive for adverse clinical outcomes and can be used to identify patients likely to experience clinically evident relapse. As a result of inherent genetic and molecular heterogeneity in AML, there is no uniform method or protocol for MRD measurement to encompass all cases. Several techniques focusing on identifying recurrent molecular and cytogenetic aberrations or leukemia-associated immunophenotypes have been described, each with their own strengths and weaknesses. Modern technologies enabling the digital quantification and tracking of individual DNA or RNA molecules, next-generation sequencing (NGS) platforms, and high-resolution imaging capabilities are among several new avenues under development to supplement or replace the current standard of flow cytometry. In this review, we outline emerging modalities positioned to enhance MRD detection and discuss factors surrounding their integration into clinical practice.
Affiliation(s)
- Gregory W Roloff
- Myeloid Malignancies Section, Hematology Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA.
- Catherine Lai
- Myeloid Malignancies Section, Hematology Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA.
- Christopher S Hourigan
- Myeloid Malignancies Section, Hematology Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA.
- Laura W Dillon
- Myeloid Malignancies Section, Hematology Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA.
16
Canzoniero JV, Cravero K, Park BH. The Impact of Collisions on the Ability to Detect Rare Mutant Alleles Using Barcode-Type Next-Generation Sequencing Techniques. Cancer Inform 2017. [DOI: 10.1177/1176935117719236] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
Abstract
Barcoding techniques are used to reduce error from next-generation sequencing, with applications ranging from understanding tumor subclone populations to detecting circulating tumor DNA. Collisions occur when more than one sample molecule is tagged by the same unique identifier (UID) and can result in failure to detect very-low-frequency mutations and error in estimating mutation frequency. Here, we created computer models of the barcoding technique, with and without amplification bias introduced by the UID, and analyzed the effect of collisions for a range of mutant allele frequencies (1e−6 to 0.2), numbers of sample molecules (10 000 to 1e7), and numbers of UIDs (4¹⁰ to 4¹⁴). Inability to detect rare mutant alleles occurred in 0% to 100% of simulations, depending on collisions and number of mutant molecules. Collisions also introduced error in estimating mutant allele frequency, resulting in underestimation of minor allele frequency. Incorporating an understanding of the effect of collisions into experimental design can allow for optimization of the number of sample molecules and number of UIDs to minimize the negative impact on rare mutant detection and mutant frequency estimation.
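The collision problem can be made concrete with the standard occupancy (birthday-problem) calculation. The sketch below assumes that every molecule draws one of 4^k random-base UIDs uniformly and independently; it ignores the UID-induced amplification bias that the authors also model, and the function names are hypothetical.

```python
import math


def expected_distinct_uids(n_molecules: int, uid_length: int) -> float:
    """Expected number of distinct UIDs when each molecule draws a UID uniformly at random."""
    m = 4 ** uid_length  # number of possible random-base UIDs
    # P(a given UID is unused) = (1 - 1/m)^n; log1p/expm1 keep this accurate for large m.
    return m * -math.expm1(n_molecules * math.log1p(-1.0 / m))


def fraction_in_collision(n_molecules: int, uid_length: int) -> float:
    """Probability that a given molecule shares its UID with at least one other molecule."""
    m = 4 ** uid_length
    # 1 - (1 - 1/m)^(n - 1): the other n - 1 molecules all avoid this molecule's UID.
    return -math.expm1((n_molecules - 1) * math.log1p(-1.0 / m))
```

Under this simplified model, with 10^7 molecules, `fraction_in_collision(10**7, 10)` is essentially 1, whereas `fraction_in_collision(10**7, 14)` falls to a few percent, which illustrates why the number of UIDs should be matched to the number of sample molecules.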
Affiliation(s)
- Karen Cravero
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins Medicine, Baltimore, MD, USA
- Ben Ho Park
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins Medicine, Baltimore, MD, USA
17
Kirches E. MtDNA As a Cancer Marker: A Finally Closed Chapter? Curr Genomics 2017; 18:255-267. [PMID: 28659721 PMCID: PMC5476953 DOI: 10.2174/1389202918666170105093635] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 07/01/2016] [Revised: 11/10/2016] [Accepted: 12/13/2016] [Indexed: 12/03/2022]
Abstract
Sequence alterations of the mitochondrial DNA (mtDNA) have been identified in many tumor types. Their nature is not entirely clear. Somatic mutation or shifts of heteroplasmic mtDNA variants may play a role. These sequence alterations exhibit a sufficient frequency in all tumor types investigated thus far to justify their use as a tumor marker. This statement is supported by the high copy number of mtDNA, which facilitates the detection of aberrant tumor-derived DNA in bodily fluids. This will be of special interest in tumors that release a relatively high number of cells into easily accessible bodily fluids, most strikingly urinary bladder carcinoma. Due to the wide distribution of the observed base substitutions, deletions, or insertions within the mitochondrial genome, whole mtDNA sequencing (16.5 kb) from bodily fluids would require considerable effort if the method were intended for initial tumor screening. However, the use of mtDNA for sensitive surveillance of known tumor diseases is a meaningful option, which may allow improved non-invasive follow-up of urinary bladder carcinoma compared with the currently existing cytological or molecular methods. Following a short general introduction to mtDNA, this review demonstrates that the scenario of sensitive cancer follow-up by mtDNA analysis deserves more attention. It would be most important to investigate precisely, in the most relevant tumor types, whether sequencing approaches combined with simple PCR assays for deletions/insertions in homopolymeric tracts have sufficient sensitivity to find most tumor-derived mtDNAs in bodily fluids.
18
Boltz VF, Rausch J, Shao W, Hattori J, Luke B, Maldarelli F, Mellors JW, Kearney MF, Coffin JM. Ultrasensitive single-genome sequencing: accurate, targeted, next generation sequencing of HIV-1 RNA. Retrovirology 2016; 13:87. [PMID: 27998286 PMCID: PMC5175307 DOI: 10.1186/s12977-016-0321-6] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 03/09/2016] [Accepted: 11/29/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND Although next generation sequencing (NGS) offers the potential for studying virus populations in unprecedented depth, PCR error, amplification bias and recombination during library construction have limited its use to population sequencing and measurements of unlinked allele frequencies. Here we report a method, termed ultrasensitive Single-Genome Sequencing (uSGS), for NGS library construction and analysis that eliminates PCR errors and recombinants, and generates single-genome sequences of the same quality as the "gold-standard" HIV-1 single-genome sequencing assay but with more than 100-fold greater depth. RESULTS Primer ID tagged cDNA was synthesized from mixtures of cloned BH10 wild-type and mutant HIV-1 transcripts containing ten drug resistance mutations. First, the resultant cDNA was divided and NGS libraries were generated in parallel using two methods: uSGS and a method applying long PCR primers to attach the NGS adaptors (LP-PCR-1). Second, cDNA was divided and NGS libraries were generated in parallel comparing 3 methods: uSGS and 2 methods adapted from more recent reports using variations of the long PCR primers to attach the adaptors (LP-PCR-2 and LP-PCR-3). Consistently, the uSGS method amplified a greater proportion of cDNAs, averaging 30% compared to 13% for LP-PCR-1, 21% for LP-PCR-2 and 14% for LP-PCR-3. Most importantly, when the uSGS sequences were binned according to their primer IDs, 94% of the bins did not contain PCR recombinant sequences versus only 55, 75 and 65% for LP-PCR-1, 2 and 3, respectively. Finally, when uSGS was applied to plasma samples from HIV-1 infected donors, both frequent and rare variants were detected in each sample and neighbor-joining trees revealed clusters of genomes driven by the linkage of these mutations, showing the lack of PCR recombinants in the datasets.
CONCLUSIONS The uSGS assay can be used for accurate detection of rare variants and for identifying linkage of rare alleles associated with HIV-1 drug resistance. In addition, the method allows accurate in-depth analyses of the complex genetic relationships of viral populations in vivo.
Affiliation(s)
- Valerie F Boltz
- HIV Dynamics and Replication Program, CCR, National Cancer Institute, NIH, Translational Research Unit, 105 Boyles Street, Building 535 Room 111, Frederick, MD, 21702-1201, USA.
- Jason Rausch
- HIV Dynamics and Replication Program, CCR, National Cancer Institute, NIH, Translational Research Unit, 105 Boyles Street, Building 535 Room 111, Frederick, MD, 21702-1201, USA
- Wei Shao
- Frederick National Laboratory for Cancer Research, Advanced Biomedical Computing Center, Leidos Biomedical Research, Inc, Frederick, MD, USA
- Junko Hattori
- HIV Dynamics and Replication Program, CCR, National Cancer Institute, NIH, Translational Research Unit, 105 Boyles Street, Building 535 Room 111, Frederick, MD, 21702-1201, USA
- Brian Luke
- Frederick National Laboratory for Cancer Research, Advanced Biomedical Computing Center, Leidos Biomedical Research, Inc, Frederick, MD, USA
- Frank Maldarelli
- HIV Dynamics and Replication Program, CCR, National Cancer Institute, NIH, Translational Research Unit, 105 Boyles Street, Building 535 Room 111, Frederick, MD, 21702-1201, USA
- John W Mellors
- Division of Infectious Disease, University of Pittsburgh, Pittsburgh, PA, USA
- Mary F Kearney
- HIV Dynamics and Replication Program, CCR, National Cancer Institute, NIH, Translational Research Unit, 105 Boyles Street, Building 535 Room 111, Frederick, MD, 21702-1201, USA
- John M Coffin
- Department of Molecular Biology and Microbiology, Tufts University, Boston, MA, USA
19
Brumme CJ, Poon AFY. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping. Virus Res 2016; 239:97-105. [PMID: 27993623 DOI: 10.1016/j.virusres.2016.12.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/20/2016] [Revised: 12/15/2016] [Accepted: 12/15/2016] [Indexed: 12/13/2022]
Abstract
Genetic sequencing ("genotyping") plays a critical role in the modern clinical management of HIV infection. This virus evolves rapidly within patients because of its error-prone reverse transcriptase and short generation time. Consequently, HIV variants with mutations that confer resistance to one or more antiretroviral drugs can emerge during sub-optimal treatment. There are now multiple HIV drug resistance interpretation algorithms that take the region of the HIV genome encoding the major drug targets as inputs; expert use of these algorithms can significantly improve clinical outcomes in HIV treatment. Next-generation sequencing has the potential to revolutionize HIV resistance genotyping by lowering the threshold at which rare but clinically significant HIV variants can be detected reproducibly, and by conferring improved cost-effectiveness in high-throughput scenarios. In this review, we discuss the relative merits and challenges of deploying the Illumina MiSeq instrument for clinical HIV genotyping.
Affiliation(s)
- Chanson J Brumme
- BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- Art F Y Poon
- Department of Pathology & Laboratory Medicine, Western University, London, Ontario, Canada.
20
Hughes P, Deng W, Olson SC, Coombs RW, Chung MH, Frenkel LM. Short Communication: Analysis of Minor Populations of Human Immunodeficiency Virus by Primer Identification and Insertion-Deletion and Carry Forward Correction Pipelines. AIDS Res Hum Retroviruses 2016; 32:296-302. [PMID: 26537573 DOI: 10.1089/aid.2015.0202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Accurate analysis of minor populations of drug-resistant HIV requires analysis of a sufficient number of viral templates. We assessed the effect of experimental conditions on the analysis of HIV pol 454 pyrosequences generated from plasma using (1) the "Insertion-deletion (indel) and Carry Forward Correction" (ICC) pipeline, which clusters sequence reads using a nonsubstitution approach and can correct for indels and carry forward errors, and (2) the "Primer Identification (ID)" method, which facilitates construction of a consensus sequence to correct for sequencing errors and allelic skewing. The Primer ID and ICC methods produced similar estimates of viral diversity, but differed in the number of sequence variants generated. Sequence preparation for ICC was comparatively simple, but was limited by an inability to assess the number of templates analyzed and allelic skewing. The more costly Primer ID method corrected for allelic skewing and provided the number of viral templates analyzed, which revealed that amplifiable HIV templates varied across specimens and did not correlate with clinical viral load. This latter observation highlights the value of the Primer ID method, which by determining the number of templates amplified, enables more accurate assessment of minority species in the virus population, which may be relevant to prescribing effective antiretroviral therapy.
Affiliation(s)
- Paul Hughes
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, Washington
- Wenjie Deng
- Department of Microbiology, University of Washington, Seattle, Washington
- Scott C. Olson
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, Washington
- Department of Pediatrics, University of Washington, Seattle, Washington
- Robert W. Coombs
- Department of Laboratory Medicine, University of Washington, Seattle, Washington
- Department of Medicine, University of Washington, Seattle, Washington
- Michael H. Chung
- Department of Laboratory Medicine, University of Washington, Seattle, Washington
- Department of Medicine, University of Washington, Seattle, Washington
- Lisa M. Frenkel
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, Washington
- Department of Pediatrics, University of Washington, Seattle, Washington
- Department of Laboratory Medicine, University of Washington, Seattle, Washington
- Department of Global Health, University of Washington, Seattle, Washington
21
Kou R, Lam H, Duan H, Ye L, Jongkam N, Chen W, Zhang S, Li S. Benefits and Challenges with Applying Unique Molecular Identifiers in Next Generation Sequencing to Detect Low Frequency Mutations. PLoS One 2016; 11:e0146638. [PMID: 26752634 PMCID: PMC4709065 DOI: 10.1371/journal.pone.0146638] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 09/16/2015] [Accepted: 12/21/2015] [Indexed: 11/18/2022]
Abstract
Indexing individual template molecules with a unique identifier (UID) before PCR and deep sequencing is promising for detecting low-frequency mutations, as true mutations could be distinguished from PCR or sequencing errors based on consensus among reads sharing the same index. In an effort to develop a robust assay to detect, in urine, low-abundance bladder cancer cells carrying well-documented mutations, we have first tested the idea on a set of mock templates, with wild-type and known mutants mixed at defined ratios. We have measured the combined error rate for PCR and Illumina sequencing at each nucleotide position of three exons, and demonstrated the power of a UID in distinguishing and correcting errors. In addition, we have demonstrated that PCR sampling bias, rather than PCR errors, challenges the UID-deep sequencing method in faithfully detecting low-frequency mutations.
Affiliation(s)
- Ruqin Kou
- Department of Development, GENEWIZ LLC, 115 Corporate Blvd., South Plainfield, NJ, 07080, United States of America
- Ham Lam
- Department of Development, GENEWIZ LLC, 115 Corporate Blvd., South Plainfield, NJ, 07080, United States of America
- Hairong Duan
- Department of Bioinformatics, GENEWIZ CN, 218 Xinghu Street, Suzhou, Jiangsu, 215123, China
- Li Ye
- Department of Bioinformatics, GENEWIZ CN, 218 Xinghu Street, Suzhou, Jiangsu, 215123, China
- Narisra Jongkam
- Department of Development, GENEWIZ LLC, 115 Corporate Blvd., South Plainfield, NJ, 07080, United States of America
- Weizhi Chen
- Department of Bioinformatics, GENEWIZ CN, 218 Xinghu Street, Suzhou, Jiangsu, 215123, China
- Shifang Zhang
- Department of Development, GENEWIZ LLC, 115 Corporate Blvd., South Plainfield, NJ, 07080, United States of America
- Shihong Li
- Department of Development, GENEWIZ LLC, 115 Corporate Blvd., South Plainfield, NJ, 07080, United States of America
22
A Comprehensive Analysis of Primer IDs to Study Heterogeneous HIV-1 Populations. J Mol Biol 2015; 428:238-250. [PMID: 26711506 DOI: 10.1016/j.jmb.2015.12.012] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/07/2015] [Revised: 11/25/2015] [Accepted: 12/16/2015] [Indexed: 01/01/2023]
Abstract
Determining the composition of viral populations is becoming increasingly important in the field of medical virology. While recently developed computational tools for viral haplotype analysis allow for correcting sequencing errors, they do not always allow for the removal of errors occurring in the upstream experimental protocol, such as PCR errors. Primer IDs (pIDs) are one method to address this problem by harnessing redundant template resampling for error correction. By using a reference mixture of five HIV-1 strains, we show how pIDs can be useful for estimating key experimental parameters, such as the substitution rate of the PCR process and the reverse transcription (RT) error rate. In addition, we introduce a hidden Markov model for determining the recombination rate of the RT PCR process. We found no strong sequence-specific bias in pID abundances (the same RT efficiencies as compared to commonly used short, specific RT primers) and no effects of pIDs on the estimated distribution of the reference viruses.
23
Yaari G, Kleinstein SH. Practical guidelines for B-cell receptor repertoire sequencing analysis. Genome Med 2015; 7:121. [PMID: 26589402 PMCID: PMC4654805 DOI: 10.1186/s13073-015-0243-2] [Citation(s) in RCA: 151] [Impact Index Per Article: 16.8] [Indexed: 12/22/2022]
Abstract
High-throughput sequencing of B-cell immunoglobulin repertoires is increasingly being applied to gain insights into the adaptive immune response in healthy individuals and in those with a wide range of diseases. Recent applications include the study of autoimmunity, infection, allergy, cancer and aging. As sequencing technologies continue to improve, these repertoire sequencing experiments are producing ever larger datasets, with tens- to hundreds-of-millions of sequences. These data require specialized bioinformatics pipelines to be analyzed effectively. Numerous methods and tools have been developed to handle different steps of the analysis, and integrated software suites have recently been made available. However, the field has yet to converge on a standard pipeline for data processing and analysis. Common file formats for data sharing are also lacking. Here we provide a set of practical guidelines for B-cell receptor repertoire sequencing analysis, starting from raw sequencing reads and proceeding through pre-processing, determination of population structure, and analysis of repertoire properties. These include methods for unique molecular identifiers and sequencing error correction, V(D)J assignment and detection of novel alleles, clonal assignment, lineage tree construction, somatic hypermutation modeling, selection analysis, and analysis of stereotyped or convergent responses. The guidelines presented here highlight the major steps involved in the analysis of B-cell repertoire sequencing data, along with recommendations on how to avoid common pitfalls.
Affiliation(s)
- Gur Yaari
- Bioengineering Program, Faculty of Engineering, Bar-Ilan University, 5290002, Ramat Gan, Israel.
- Steven H Kleinstein
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA.
- Departments of Pathology and Immunobiology, Yale University School of Medicine, New Haven, CT, 06520, USA.
24
Primer ID Validates Template Sampling Depth and Greatly Reduces the Error Rate of Next-Generation Sequencing of HIV-1 Genomic RNA Populations. J Virol 2015; 89:8540-55. [PMID: 26041299 DOI: 10.1128/jvi.00522-15] [Citation(s) in RCA: 94] [Impact Index Per Article: 10.4] [Received: 02/25/2015] [Accepted: 05/30/2015] [Indexed: 12/29/2022]
Abstract
Validating the sampling depth and reducing sequencing errors are critical for studies of viral populations using next-generation sequencing (NGS). We previously described the use of Primer ID to tag each viral RNA template with a block of degenerate nucleotides in the cDNA primer. We now show that low-abundance Primer IDs (offspring Primer IDs) are generated due to PCR/sequencing errors. These artifactual Primer IDs can be removed using a cutoff model for the number of reads required to make a template consensus sequence. We have modeled the fraction of sequences lost due to Primer ID resampling. For a typical sequencing run, less than 10% of the raw reads are lost to offspring Primer ID filtering and resampling. The remaining raw reads are used to correct for PCR resampling and sequencing errors. We also demonstrate that Primer ID reveals bias intrinsic to PCR, especially at low template input or utilization. cDNA synthesis and PCR convert ca. 20% of RNA templates into recoverable sequences, and 30-fold sequence coverage recovers most of these template sequences. We have directly measured the residual error rate to be around 1 in 10,000 nucleotides. We use this error rate and the Poisson distribution to define the cutoff to identify preexisting drug resistance mutations at low abundance in an HIV-infected subject. Collectively, these studies show that >90% of the raw sequence reads can be used to validate template sampling depth and to dramatically reduce the error rate in assessing a genetically diverse viral population using NGS.
IMPORTANCE Although next-generation sequencing (NGS) has revolutionized sequencing strategies, it suffers from serious limitations in defining sequence heterogeneity in a genetically diverse population, such as HIV-1, due to PCR resampling and PCR/sequencing errors. The Primer ID approach reveals the true sampling depth and greatly reduces errors.
Knowing the sampling depth allows the construction of a model of how to maximize the recovery of sequences from input templates and to reduce resampling of the Primer ID so that appropriate multiplexing can be included in the experimental design. With the defined sampling depth and measured error rate, we are able to assign cutoffs for the accurate detection of minority variants in viral populations. This approach allows the power of NGS to be realized without having to guess about sampling depth or to ignore the problem of PCR resampling, while also being able to correct most of the errors in the data set.
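The Poisson-based cutoff mentioned in this abstract can be illustrated with a simple background model. The sketch below takes the reported residual error rate of about 1 in 10,000 nucleotides, assumes errors at a given position occur independently across template consensus sequences (TCS), and uses a significance level `alpha` of 0.05 chosen only for illustration; it is not the exact cutoff definition from the paper, and the helper names are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def min_mutant_tcs(n_tcs: int, error_rate: float = 1e-4, alpha: float = 0.05) -> int:
    """Smallest number of TCS carrying a variant that background error alone
    would produce with probability below alpha (hypothetical helper)."""
    lam = n_tcs * error_rate  # expected number of error-bearing TCS at one position
    k = 1
    while poisson_sf(k, lam) >= alpha:
        k += 1
    return k
```

Under these assumptions, with 1,000 TCS at a position a variant seen in two or more TCS clears the threshold, whereas with 10,000 TCS four are needed, so the detection cutoff scales with the measured sampling depth.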
25
Brodin J, Hedskog C, Heddini A, Benard E, Neher RA, Mild M, Albert J. Challenges with using primer IDs to improve accuracy of next generation sequencing. PLoS One 2015; 10:e0119123. [PMID: 25741706 PMCID: PMC4351057 DOI: 10.1371/journal.pone.0119123] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 06/29/2014] [Accepted: 12/23/2014] [Indexed: 01/09/2023]
Abstract
Next generation sequencing technologies, like ultra-deep pyrosequencing (UDPS), allow detailed investigation of complex populations, like RNA viruses, but their utility is limited by errors introduced during sample preparation and sequencing. By tagging each individual cDNA molecule with barcodes, referred to as Primer IDs, before PCR and sequencing, these errors could theoretically be removed. Here we evaluated the Primer ID methodology on 257,846 UDPS reads generated from an HIV-1 SG3Δenv plasmid clone and plasma samples from three HIV-infected patients. The Primer ID consisted of 11 randomized nucleotides (4,194,304 combinations) in the primer for cDNA synthesis, which introduced a unique sequence tag into each cDNA molecule. Consensus template sequences were constructed for reads with Primer IDs that were observed three or more times. Despite high numbers of input template molecules, the number of consensus template sequences was low. With 10,000 input molecules for the clone, as few as 97 consensus template sequences were obtained due to a highly skewed frequency of resampling. Furthermore, the number of sequenced templates was overestimated due to PCR errors in the Primer IDs. Finally, some consensus template sequences were erroneous due to hotspots for UDPS errors. The Primer ID methodology has the potential to provide highly accurate deep sequencing. However, it is important to be aware that there are remaining challenges with the methodology. In particular, it is important to find ways to obtain a more even frequency of resampling of template molecules as well as to identify and remove artefactual consensus template sequences that have been generated by PCR errors in the Primer IDs.
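The consensus step described in this abstract can be sketched as follows. The sketch assumes reads are already trimmed and aligned to equal length, uses simple majority voting per position, and calls `N` when the two most frequent bases tie; the minimum of three reads per Primer ID follows the abstract, while the function name and tie rule are illustrative assumptions. It does not address the Primer ID errors or sequencing hotspots the authors describe. (An 11-nucleotide random Primer ID gives 4^11 = 4,194,304 possible tags, matching the figure above.)

```python
from collections import Counter, defaultdict


def build_tcs(tagged_reads, min_reads=3):
    """Collapse (primer_id, sequence) pairs into one consensus sequence per Primer ID."""
    groups = defaultdict(list)
    for pid, seq in tagged_reads:
        groups[pid].append(seq)

    tcs = {}
    for pid, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to build a reliable consensus
        if len({len(s) for s in seqs}) != 1:
            continue  # assumption: skip groups whose reads are not aligned to equal length
        consensus = []
        for column in zip(*seqs):
            (base, n), *rest = Counter(column).most_common(2)
            # Mark the position ambiguous if the top two bases are tied.
            consensus.append("N" if rest and rest[0][1] == n else base)
        tcs[pid] = "".join(consensus)
    return tcs
```

Reads whose Primer ID was seen only once or twice are dropped rather than corrected, which is one reason the number of consensus sequences can be much lower than the number of input templates.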
Affiliation(s)
- Johanna Brodin
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Charlotte Hedskog
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Alexander Heddini
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Emmanuel Benard
- Max Planck Institute for Developmental Biology, Tuebingen, Germany
- Richard A. Neher
- Max Planck Institute for Developmental Biology, Tuebingen, Germany
- Mattias Mild
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Unit for Support, Swedish Institute for Communicable Disease Control, Stockholm, Sweden
- Jan Albert
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Microbiology, Karolinska University Hospital, Stockholm, Sweden
26
Yourstone SM, Lundberg DS, Dangl JL, Jones CD. MT-Toolbox: improved amplicon sequencing using molecule tags. BMC Bioinformatics 2014; 15:284. [PMID: 25149069 PMCID: PMC4153912 DOI: 10.1186/1471-2105-15-284] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 02/04/2014] [Accepted: 08/12/2014] [Indexed: 01/18/2023]
Abstract
Background: Short oligonucleotides can be used as markers to tag and track DNA sequences. For example, barcoding techniques (i.e. Multiplex Identifiers or Indexing) use short oligonucleotides to distinguish between reads from different DNA samples pooled for high-throughput sequencing. A similar technique called molecule tagging uses the same principles but is applied to individual DNA template molecules. Each template molecule is tagged with a unique oligonucleotide prior to polymerase chain reaction. The resulting amplicon sequences can be traced back to their original templates by their oligonucleotide tag. Consensus building from sequences sharing the same tag enables inference of the original template molecules, thereby reducing the effects of sequencing error and polymerase chain reaction bias. Several independent groups have developed similar protocols for molecule tagging; however, user-friendly software for building consensus sequences from molecule-tagged reads is either not readily available or is highly specific to a particular protocol.
Results: MT-Toolbox recognizes oligonucleotide tags in amplicons and infers the correct template sequence. On a set of molecule-tagged test reads, MT-Toolbox generates sequences having on average 0.00047 errors per base. MT-Toolbox includes a graphical user interface, a command line interface, and options for maximizing speed or accuracy. It can be run serially on a standard personal computer or in parallel on a Load Sharing Facility-based cluster system. An optional plugin provides features for common 16S metagenome profiling analyses such as chimera filtering, building operational taxonomic units, contaminant removal, and taxonomy assignment.
Conclusions: MT-Toolbox provides an accessible, user-friendly environment for the analysis of molecule-tagged reads, thereby reducing technical errors and polymerase chain reaction bias. These improvements reduce noise and allow for greater precision in single amplicon sequencing experiments.
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-284) contains supplementary material, which is available to authorized users.
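MT-Toolbox's own tag-recognition and consensus code is not reproduced here. The Python below is a hedged sketch of two ideas this abstract relies on: locating a random tag at a fixed offset before a known primer, and scoring output as errors per base against a known reference. For scale, 0.00047 errors per base is roughly one error every 2,100 bases. The sketch makes several assumptions. The read layout [tag][primer][insert] is hypothetical, at most one primer mismatch is allowed, and indels are ignored. `split_tagged_read` and `errors_per_base` are illustrative names and are not MT-Toolbox functions.

```python
def split_tagged_read(read, tag_len, primer, max_mismatches=1):
    """Split a read into (tag, insert), or return None if the primer is not found.

    Assumes a hypothetical layout: [tag_len random bases][primer][amplicon insert].
    The primer is checked only at the fixed offset tag_len (no indels).
    """
    window = read[tag_len:tag_len + len(primer)]
    if len(window) < len(primer):
        return None  # read too short to contain tag + primer
    mismatches = sum(a != b for a, b in zip(window, primer))
    if mismatches > max_mismatches:
        return None  # primer not recognised; the read would be discarded
    return read[:tag_len], read[tag_len + len(primer):]


def errors_per_base(sequences, reference):
    """Mismatches per base against a known reference (substitutions only)."""
    total_bases = 0
    total_errors = 0
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("sketch assumes sequences already aligned to reference length")
        total_bases += len(seq)
        total_errors += sum(a != b for a, b in zip(seq, reference))
    return total_errors / total_bases if total_bases else 0.0


if __name__ == "__main__":
    read = "ACGTACGT" + "GGATCC" + "TTTAAA"  # tag + primer + insert
    print(split_tagged_read(read, tag_len=8, primer="GGATCC"))  # ('ACGTACGT', 'TTTAAA')
    print(errors_per_base(["ACGT", "ACTT"], "ACGT"))  # 0.125
```

Reads split this way could then be grouped by tag and collapsed to a consensus. Allowing a small number of primer mismatches trades a little specificity for not discarding reads that carry a sequencing error in the primer region.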
Affiliation(s)
- Scott M Yourstone
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill 27599, USA.