1. Connor R, Shakya M, Yarmosh DA, Maier W, Martin R, Bradford R, Brister JR, Chain PSG, Copeland CA, di Iulio J, Hu B, Ebert P, Gunti J, Jin Y, Katz KS, Kochergin A, LaRosa T, Li J, Li PE, Lo CC, Rashid S, Maiorova ES, Xiao C, Zalunin V, Purcell L, Pruitt KD. Recommendations for Uniform Variant Calling of SARS-CoV-2 Genome Sequence across Bioinformatic Workflows. Viruses 2024; 16:430. [PMID: 38543795 PMCID: PMC10975397 DOI: 10.3390/v16030430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 02/12/2024] [Accepted: 02/16/2024] [Indexed: 04/01/2024]
Abstract
Genomic sequencing of clinical samples to identify emerging variants of SARS-CoV-2 has been a key public health tool for curbing the spread of the virus. As a result, an unprecedented number of SARS-CoV-2 genomes were sequenced during the COVID-19 pandemic, which allowed for rapid identification of genetic variants, enabling the timely design and testing of therapies and deployment of new vaccine formulations to combat the new variants. However, despite the technological advances of deep sequencing, the analysis of the raw sequence data generated globally is neither standardized nor consistent, leading to vastly disparate sequences that may impact identification of variants. Here, we show that for both Illumina and Oxford Nanopore sequencing platforms, downstream bioinformatic protocols used by industry, government, and academic groups resulted in different virus sequences from the same sample. These bioinformatic workflows produced consensus genomes that differed in single nucleotide polymorphisms and in the inclusion or exclusion of insertions and/or deletions, despite using the same raw sequence datasets as input. We compare and characterize these discrepancies and propose a specific suite of parameters and protocols that should be adopted across the field. Consistent results from bioinformatic workflows are fundamental to SARS-CoV-2 and future pathogen surveillance efforts, including pandemic preparation, to allow for a data-driven and timely public health response.
Affiliation(s)
- Ryan Connor
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Migun Shakya
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; (M.S.); (P.S.G.C.); (B.H.); (P.-E.L.); (C.-C.L.)
- David A. Yarmosh
  - American Type Culture Collection, Manassas, VA 20110, USA; (D.A.Y.); (R.B.); (S.R.)
  - BEI Resources, Manassas, VA 20110, USA
- Wolfgang Maier
  - Galaxy Europe Team, University of Freiburg, 79085 Freiburg, Germany
- Ross Martin
  - Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA; (R.M.); (J.L.); (E.S.M.)
- Rebecca Bradford
  - American Type Culture Collection, Manassas, VA 20110, USA; (D.A.Y.); (R.B.); (S.R.)
  - BEI Resources, Manassas, VA 20110, USA
- J. Rodney Brister
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Patrick S. G. Chain
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; (M.S.); (P.S.G.C.); (B.H.); (P.-E.L.); (C.-C.L.)
- Julia di Iulio
  - Vir Biotechnology Inc., San Francisco, CA 94158, USA; (J.d.I.); (L.P.)
- Bin Hu
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; (M.S.); (P.S.G.C.); (B.H.); (P.-E.L.); (C.-C.L.)
- Philip Ebert
  - Eli Lilly and Company, Indianapolis, IN 46225, USA
- Jonathan Gunti
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Yumi Jin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Kenneth S. Katz
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Andrey Kochergin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Tré LaRosa
  - Deloitte Consulting LLP, Rosslyn, VA 22209, USA; (C.A.C.); (T.L.)
- Jiani Li
  - Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA; (R.M.); (J.L.); (E.S.M.)
- Po-E Li
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; (M.S.); (P.S.G.C.); (B.H.); (P.-E.L.); (C.-C.L.)
- Chien-Chi Lo
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; (M.S.); (P.S.G.C.); (B.H.); (P.-E.L.); (C.-C.L.)
- Sujatha Rashid
  - American Type Culture Collection, Manassas, VA 20110, USA; (D.A.Y.); (R.B.); (S.R.)
- Evguenia S. Maiorova
  - Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA; (R.M.); (J.L.); (E.S.M.)
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Vadim Zalunin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Lisa Purcell
  - Vir Biotechnology Inc., San Francisco, CA 94158, USA; (J.d.I.); (L.P.)
- Kim D. Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
2. Camp JV, Puchhammer-Stöckl E, Aberle SW, Buchta C. Virus sequencing performance during the SARS-CoV-2 pandemic: a retrospective analysis of data from multiple rounds of external quality assessment in Austria. Front Mol Biosci 2024; 11:1327699. [PMID: 38375507 PMCID: PMC10875003 DOI: 10.3389/fmolb.2024.1327699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 01/03/2024] [Indexed: 02/21/2024]
Abstract
Introduction: A notable feature of the 2019 coronavirus disease (COVID-19) pandemic was the widespread use of whole genome sequencing (WGS) to monitor severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections. Countries around the world relied on sequencing and other forms of variant detection to perform contact tracing and monitor changes in the virus genome, in the hope that epidemic waves caused by variants would be detected and managed earlier. As sequencing was encouraged and rewarded by the government in Austria, but represented a new technique for many laboratories, we designed an external quality assessment (EQA) scheme to monitor the accuracy of WGS and assist laboratories in validating their methods. Methods: We implemented SARS-CoV-2 WGS EQAs in Austria and report the results from 7 participants over 5 rounds from February 2021 until June 2023. The participants received sample material, sequenced genomes with routine methods, and provided the sequences as well as information about mutations and lineages. Participants were evaluated on the completeness and accuracy of the submitted sequence and the ability to analyze and interpret sequencing data. Results: The results indicate that performance was excellent with few exceptions, and these exceptions showed improvement over time. We extend our findings to infer that most publicly available sequences are accurate within ≤1 nucleotide, somewhat randomly distributed through the genome. Conclusion: WGS continues to be used for SARS-CoV-2 surveillance, and will likely be instrumental in future outbreak scenarios. We identified hurdles in building next-generation sequencing capacity in diagnostic laboratories. EQAs will help individual laboratories maintain high quality next-generation sequencing output, and strengthen variant monitoring and molecular epidemiology efforts.
Affiliation(s)
- Jeremy V Camp
  - Center for Virology, Medical University of Vienna, Vienna, Austria
- Stephan W Aberle
  - Center for Virology, Medical University of Vienna, Vienna, Austria
- Christoph Buchta
  - Austrian Association for Quality Assurance and Standardization of Medical and Diagnostic Tests (ÖQUASTA), Vienna, Austria
3. Crooks KR, Farwell Hagman KD, Mandelker D, Santani A, Schmidt RJ, Temple-Smolkin RL, Lincoln SE. Recommendations for Next-Generation Sequencing Germline Variant Confirmation: A Joint Report of the Association for Molecular Pathology and National Society of Genetic Counselors. J Mol Diagn 2023; 25:411-427. [PMID: 37207865 DOI: 10.1016/j.jmoldx.2023.03.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/23/2022] [Revised: 02/27/2023] [Accepted: 03/30/2023] [Indexed: 05/21/2023]
Abstract
Clinical laboratory implementation of next-generation sequencing (NGS)-based constitutional genetic testing has been rapid and widespread. In the absence of widely adopted comprehensive guidance, there remains substantial variability among laboratories in the practice of NGS. One issue of sustained discussion in the field is whether and to what extent orthogonal confirmation of genetic variants identified by NGS is necessary or helpful. The Association for Molecular Pathology Clinical Practice Committee convened the NGS Germline Variant Confirmation Working Group to assess current evidence regarding orthogonal confirmation and to establish recommendations for standardizing orthogonal confirmation practices to support quality patient care. On the basis of the results of a survey of the literature, a survey of laboratory practices, and subject matter expert consensus, eight recommendations are presented, providing a common framework for clinical laboratory professionals to develop or refine individualized laboratory policies and procedures regarding orthogonal confirmation of germline variants detected by NGS.
Affiliation(s)
- Kristy R Crooks
  - NGS Germline Variant Confirmation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, University of Colorado Anschutz Medical Campus, Aurora, Colorado
- Kelly D Farwell Hagman
  - NGS Germline Variant Confirmation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Clinical Diagnostics, Ambry Genetics, Aliso Viejo, California
- Diana Mandelker
  - NGS Germline Variant Confirmation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Avni Santani
  - NGS Germline Variant Confirmation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; LetsGetChecked, PrivaPath Diagnostics, Dublin, Ireland; Veritas Genetics, Danvers, Massachusetts
- Ryan J Schmidt
  - NGS Germline Variant Confirmation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California; Department of Pathology, Keck School of Medicine of the University of Southern California, Los Angeles, California
- Stephen E Lincoln
  - NGS Germline Variant Confirmation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; InVitae, Bethesda, Maryland
4. Performance evaluation of six popular short-read simulators. Heredity (Edinb) 2023; 130:55-63. [PMID: 36496447 PMCID: PMC9905089 DOI: 10.1038/s41437-022-00577-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/06/2022] [Revised: 11/10/2022] [Accepted: 11/11/2022] [Indexed: 12/14/2022]
Abstract
High-throughput sequencing data enable the comprehensive study of genomes and the variation therein. Essential for the interpretation of these genomic data is a thorough understanding of the computational methods used for processing and analysis. Whereas "gold-standard" empirical datasets exist for this purpose in humans, synthetic (i.e., simulated) sequencing data can offer important insights into the capabilities and limitations of computational pipelines for any arbitrary species and/or study design. Yet the ability of read simulator software to emulate genomic characteristics of empirical datasets remains poorly understood. Here we compare the performance of six popular short-read simulators (ART, DWGSIM, InSilicoSeq, Mason, NEAT, and wgsim) and discuss important considerations for selecting suitable models for benchmarking.
5. Connor R, Yarmosh DA, Maier W, Shakya M, Martin R, Bradford R, Brister JR, Chain PS, Copeland CA, di Iulio J, Hu B, Ebert P, Gunti J, Jin Y, Katz KS, Kochergin A, LaRosa T, Li J, Li PE, Lo CC, Rashid S, Maiorova ES, Xiao C, Zalunin V, Pruitt KD. Towards increased accuracy and reproducibility in SARS-CoV-2 next generation sequence analysis for public health surveillance. bioRxiv: The Preprint Server for Biology 2022:2022.11.03.515010. [PMID: 36380755 PMCID: PMC9645426 DOI: 10.1101/2022.11.03.515010] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/16/2023]
Abstract
During the COVID-19 pandemic, SARS-CoV-2 surveillance efforts integrated genome sequencing of clinical samples to identify emergent viral variants and to support rapid experimental examination of genome-informed vaccine and therapeutic designs. Given the broad range of methods applied to generate new viral genomes, it is critical that consensus and variant calling tools yield consistent results across disparate pipelines. Here we examine the impact of sequencing technologies (Illumina and Oxford Nanopore) and 7 different downstream bioinformatic protocols on SARS-CoV-2 variant calling as part of the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) Tracking Resistance and Coronavirus Evolution (TRACE) initiative, a public-private partnership established to address the COVID-19 outbreak. Our results indicate that bioinformatic workflows can yield consensus genomes with different single nucleotide polymorphisms, insertions, and/or deletions even when using the same raw sequence input datasets. We introduce the use of a specific suite of parameters and protocols that greatly improves the agreement among pipelines developed by diverse organizations. Such consistency among bioinformatic pipelines is fundamental to SARS-CoV-2 and future pathogen surveillance efforts. The application of analysis standards is necessary to more accurately document phylogenomic trends and support data-driven public health responses.
Affiliation(s)
- Ryan Connor
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- David A Yarmosh
  - American Type Culture Collection, 10807 University Blvd, Manassas, VA 20110, USA
  - BEI Resources
- Wolfgang Maier
  - Galaxy Europe Team, University of Freiburg, Freiburg, Germany
- Migun Shakya
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545 USA
- Ross Martin
  - Clinical Virology Department, Gilead Sciences, 333 Lakeside Dr, Foster City, CA 94404, USA
- Rebecca Bradford
  - American Type Culture Collection, 10807 University Blvd, Manassas, VA 20110, USA
  - BEI Resources
- J Rodney Brister
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Patrick SG Chain
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545 USA
- Courtney A Copeland
  - Deloitte Consulting LLP, 1919 North Lynn St, Suite 1500, Rosslyn, VA 22209 USA
- Bin Hu
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545 USA
- Jonathan Gunti
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Yumi Jin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Kenneth S Katz
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Andrey Kochergin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Tré LaRosa
  - Deloitte Consulting LLP, 1919 North Lynn St, Suite 1500, Rosslyn, VA 22209 USA
- Jiani Li
  - Clinical Virology Department, Gilead Sciences, 333 Lakeside Dr, Foster City, CA 94404, USA
- Po-E Li
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545 USA
- Chien-Chi Lo
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545 USA
- Sujatha Rashid
  - American Type Culture Collection, 10807 University Blvd, Manassas, VA 20110, USA
- Evguenia S Maiorova
  - Clinical Virology Department, Gilead Sciences, 333 Lakeside Dr, Foster City, CA 94404, USA
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Vadim Zalunin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Kim D Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
6. Malovitski K, Sarig O, Assaf S, Mohamad J, Malki L, Bergson S, Peled A, Eskin-Schwartz M, Gat A, Pavlovsky M, Sprecher E. Loss-of-function variants in KLF4 underlie autosomal dominant palmoplantar keratoderma. Genet Med 2022; 24:1085-1095. [PMID: 35168889 DOI: 10.1016/j.gim.2022.01.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/29/2021] [Revised: 01/09/2022] [Accepted: 01/14/2022] [Indexed: 12/11/2022]
Abstract
Purpose: Palmoplantar keratodermas (PPKs) form a group of disorders characterized by thickening of palm and sole skin. Over the past 2 decades, many types of inherited PPKs have been found to result from abnormal expression, processing, or function of adhesion proteins. Methods: We used exome and direct sequencing to detect causative pathogenic variants. Functional analysis of these variants was conducted using reverse transcription quantitative polymerase chain reaction, immunofluorescence confocal microscopy, immunoblotting, a promoter reporter assay, and chromatin immunoprecipitation. Results: We identified 2 heterozygous variants (c.1226A>G and c.633_634dupGT) in KLF4 in 3 individuals from 2 different unrelated families affected by a dominant form of PPK. Immunofluorescence staining for a number of functional markers revealed reduced epidermal DSG1 expression in patients harboring heterozygous KLF4 variants. Accordingly, human keratinocytes either transfected with constructs expressing these variants or downregulated for KLF4 displayed reduced DSG1 expression, which in turn has previously been found to be associated with PPK. A chromatin immunoprecipitation assay confirmed direct binding of KLF4 to the DSG1 promoter region. The ability of mutant KLF4 to transactivate the DSG1 promoter was significantly decreased when compared with wild-type KLF4. Conclusion: Loss-of-function variants in KLF4 cause a novel form of dominant PPK and show its importance in the regulation of epidermal differentiation.
Affiliation(s)
- Kiril Malovitski
  - Division of Dermatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel; Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Ofer Sarig
  - Division of Dermatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Sari Assaf
  - Division of Dermatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel; Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Janan Mohamad
  - Division of Dermatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel; Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Liron Malki
  - Division of Dermatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Shir Bergson
  - Division of Dermatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel; Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Alon Peled
  - Division of Dermatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel; Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Marina Eskin-Schwartz
  - Faculty of Health Sciences, Ben Gurion University of the Negev, Be'er Sheva, Israel; Genetic Institute, Soroka University Medical Center, Be'er Sheva, Israel
- Andrea Gat
  - Institute of Pathology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Mor Pavlovsky
  - Division of Dermatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Eli Sprecher
  - Division of Dermatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel; Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
7. Establishment of reference standards for multifaceted mosaic variant analysis. Sci Data 2022; 9:35. [PMID: 35115554 PMCID: PMC8813952 DOI: 10.1038/s41597-022-01133-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Accepted: 12/20/2021] [Indexed: 11/21/2022]
Abstract
Detection of somatic mosaicism in non-proliferative cells is a new challenge in genome research; however, the accuracy of current detection strategies remains uncertain due to the lack of a ground truth. Herein, we present a set of ultra-deep sequenced WES data based on reference standards generated by cell line mixtures, providing a total of 386,613 mosaic single-nucleotide variants (SNVs) and insertion-deletion mutations (INDELs) with variant allele frequencies (VAFs) ranging from 0.5% to 56%, as well as 35,113,417 non-variant and 19,936 germline variant sites as a negative control. The whole reference standard set mimics the cumulative aspect of mosaic variant acquisition, such as in the early developmental stage, owing to the progressive mixing of cell lines with established genotypes, ultimately unveiling 741 possible inter-sample relationships with respect to variant sharing and asymmetry in VAFs. We expect that our reference data will be essential for optimizing the current use of mosaic variant detection strategies and for developing algorithms to enable future improvements.
Measurement(s): genotype
Technology Type(s): DNA sequencing
Factor Type(s): genotyping
Sample Characteristic - Organism: Homo sapiens
Sample Characteristic - Environment: cell line
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.16970041
8. Eslami Rasekh M, Hernández Y, Drinan SD, Fuxman Bass J, Benson G. Genome-wide characterization of human minisatellite VNTRs: population-specific alleles and gene expression differences. Nucleic Acids Res 2021; 49:4308-4324. [PMID: 33849068 PMCID: PMC8096271 DOI: 10.1093/nar/gkab224] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/03/2020] [Revised: 03/06/2021] [Accepted: 03/18/2021] [Indexed: 11/12/2022]
Abstract
Variable Number Tandem Repeats (VNTRs) are tandem repeat (TR) loci that vary in copy number across a population. Using our program, VNTRseek, we analyzed human whole genome sequencing datasets from 2770 individuals in order to detect minisatellite VNTRs, i.e., those with pattern sizes ≥7 bp. We detected 35 638 VNTR loci and classified 5676 as commonly polymorphic (i.e., with non-reference alleles occurring in >5% of the population). Commonly polymorphic VNTR loci were found to be enriched in genomic regions with regulatory function, i.e., transcription start sites and enhancers. Investigation of the commonly polymorphic VNTRs in the context of population ancestry revealed that 1096 loci contained population-specific alleles and that those could be used to classify individuals into super-populations with near-perfect accuracy. A search for expression quantitative trait loci (eQTLs) among the VNTRs proximal to genes indicated that in 187 genes expression differences correlated with VNTR genotype. We validated our predictions in several ways, including experimentally, through the identification of predicted alleles in long reads, and by comparisons showing consistency between sequencing platforms. This study is the most comprehensive analysis of minisatellite VNTRs in the human population to date.
Affiliation(s)
- Yözen Hernández
  - Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Juan I Fuxman Bass
  - Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
  - Department of Biology, Boston University, Boston, MA 02215, USA
- Gary Benson
  - Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
  - Department of Biology, Boston University, Boston, MA 02215, USA
  - Department of Computer Science, Boston University, Boston, MA 02215, USA