1
Zhao S, Nakken S, Vodak D, Hovig E. FuSViz-visualization and interpretation of structural variation using cancer genomics and transcriptomics data. Nucleic Acids Res 2025; 53:gkaf078. [PMID: 39995037 PMCID: PMC11850231 DOI: 10.1093/nar/gkaf078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 01/05/2025] [Accepted: 01/29/2025] [Indexed: 02/26/2025] Open
Abstract
Structural variation (SV) is a frequent category of genetic alterations important for understanding cancer genome evolution and revealing key cancer driver events. With the development of high-throughput sequencing technologies, the ability to detect SVs of various sizes and types has improved, at both the DNA and RNA levels. However, SV calls are still prone to a considerable fraction of false positives, which necessitates visual inspection and manual curation as part of the quality control process. Identification of reliable and recurrent SVs in larger cohorts lends strength to revealing the driving roles of SVs in cancer development and to the discovery of potential diagnostic and prognostic biomarkers. Here, we present FuSViz, an application for visualization, interpretation, and prioritization of SVs. The tool provides multiple data view approaches in a user-friendly interface, allowing the investigation of prevalence and recurrence of SVs and relevant partner genes in a sample cohort. It integrates SV calls from DNA and RNA sequencing datasets to comprehensively illustrate the biological impact of SVs on the implicated genes and associated genomic regions. The functionality of FuSViz is intended for interrogation of both recurrent and private SVs, effectively assisting with pathogenicity evaluation and biomarker discovery in cancer sequencing projects.
Affiliation(s)
- Sen Zhao
  - Department of Pathology, Oslo University Hospital, 0424 Oslo, Norway
- Sigve Nakken
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
  - Centre for Cancer Cell Reprogramming, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, 0379 Oslo, Norway
  - Center for Bioinformatics, Department of Informatics, University of Oslo, 0316 Oslo, Norway
- Daniel Vodak
  - Department of Pathology, Oslo University Hospital, 0424 Oslo, Norway
- Eivind Hovig
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
  - Center for Bioinformatics, Department of Informatics, University of Oslo, 0316 Oslo, Norway
2
Linderman MD, Wallace J, van der Heyde A, Wieman E, Brey D, Shi Y, Hansen P, Shamsi Z, Liu J, Gelb BD, Bashir A. NPSV-deep: a deep learning method for genotyping structural variants in short read genome sequencing data. Bioinformatics 2024; 40:btae129. [PMID: 38444093 PMCID: PMC10955255 DOI: 10.1093/bioinformatics/btae129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 01/15/2024] [Accepted: 03/04/2024] [Indexed: 03/07/2024] Open
Abstract
MOTIVATION Structural variants (SVs) play a causal role in numerous diseases but can be difficult to detect and accurately genotype (determine zygosity) with short-read genome sequencing data (SRS). Improving SV genotyping accuracy in SRS data, particularly for the many SVs first detected with long-read sequencing, will improve our understanding of genetic variation. RESULTS NPSV-deep is a deep learning-based approach for genotyping previously reported insertion and deletion SVs that recasts this task as an image similarity problem. NPSV-deep predicts the SV genotype based on the similarity between pileup images generated from the actual SRS data and matching SRS simulations. We show that NPSV-deep consistently matches or improves upon the state-of-the-art for SV genotyping accuracy across different SV call sets, samples and variant types, including a 25% reduction in genotyping errors for the Genome-in-a-Bottle (GIAB) high-confidence SVs. NPSV-deep is not limited to the SVs as described; it improves deletion genotyping concordance a further 1.5 percentage points for GIAB SVs (92%) by automatically correcting imprecise/incorrectly described SVs. AVAILABILITY AND IMPLEMENTATION Python/C++ source code and pre-trained models freely available at https://github.com/mlinderm/npsv2.
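The idea of genotyping by similarity between an observed pileup and per-genotype simulations can be illustrated with a toy sketch. This is not the NPSV-deep model: the abstract describes a deep learning approach, whereas the sketch below swaps in a plain Euclidean distance over hypothetical feature vectors. The names `predict_genotype`, `distance`, and the example vectors are all assumptions made for illustration.

```python
import math

# Candidate genotypes for a biallelic SV (hom-ref, het, hom-alt).
GENOTYPES = ("0/0", "0/1", "1/1")


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_genotype(real, simulated):
    """Return the genotype whose simulated vector is closest to the real one.

    `real` stands in for features of the pileup from the actual data;
    `simulated` maps each genotype to features of a matching simulation.
    Both are hypothetical stand-ins for learned image embeddings.
    """
    return min(GENOTYPES, key=lambda g: distance(real, simulated[g]))


# Toy example: relative read depth across four windows spanning a deletion.
simulated = {
    "0/0": [1.0, 1.0, 1.0, 1.0],
    "0/1": [1.0, 0.5, 0.5, 1.0],
    "1/1": [1.0, 0.0, 0.0, 1.0],
}
real = [0.9, 0.55, 0.45, 1.1]
print(predict_genotype(real, simulated))
```

In this toy input the observed depth drops to roughly half inside the event, so the heterozygous simulation is the nearest match.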
Affiliation(s)
- Michael D Linderman
  - Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Jacob Wallace
  - Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Alderik van der Heyde
  - Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Eliza Wieman
  - Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Daniel Brey
  - Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Yiran Shi
  - Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Peter Hansen
  - Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Bruce D Gelb
  - Mindich Child Health and Development Institute and the Departments of Pediatrics and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Ali Bashir
  - Google, Mountain View, CA 94043, United States
3
Lecomte L, Árnyasi M, Ferchaud A, Kent M, Lien S, Stenløkk K, Sylvestre F, Bernatchez L, Mérot C. Investigating structural variant, indel and single nucleotide polymorphism differentiation between locally adapted Atlantic salmon populations. Evol Appl 2024; 17:e13653. [PMID: 38495945 PMCID: PMC10940791 DOI: 10.1111/eva.13653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 12/14/2023] [Accepted: 01/13/2024] [Indexed: 03/19/2024] Open
Abstract
Genomic structural variants (SVs) are now recognized as an integral component of intraspecific polymorphism and are known to contribute to evolutionary processes in various organisms. However, they are inherently difficult to detect and genotype from readily available short-read sequencing data, and therefore remain poorly documented in wild populations. Salmonid species displaying strong interpopulation variability in both life history traits and habitat characteristics, such as Atlantic salmon (Salmo salar), offer a prime context for studying adaptive polymorphism, but the contribution of SVs to fine-scale local adaptation has yet to be explored. Here, we performed a comparative analysis of SVs, single nucleotide polymorphisms (SNPs) and small indels (<50 bp) segregating in the Romaine and Puyjalon salmon, two putatively locally adapted populations inhabiting neighboring rivers (Québec, Canada) and showing pronounced variation in life history traits, namely growth, fecundity, and age at maturity and smoltification. We first catalogued polymorphism using a hybrid SV characterization approach pairing both short- (16X) and long-read sequencing (20X) for variant discovery with graph-based genotyping of SVs across 60 salmon genomes, along with characterization of SNPs and small indels from short reads. We thus identified 115,907 SVs, 8,777,832 SNPs and 1,089,321 short indels, with SVs covering 4.8 times more base pairs than SNPs. All three variant types revealed a highly congruent population structure and similar patterns of F_ST and density variation along the genome. Finally, we performed outlier detection and redundancy analysis (RDA) to identify variants of interest in the putative local adaptation of Romaine and Puyjalon salmon. Genes located near these variants were enriched for biological processes related to nervous system function, suggesting that observed variation in traits such as age at smoltification could arise from differences in neural development. This study therefore demonstrates the feasibility of large-scale SV characterization and highlights its relevance for salmonid population genomics.
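For readers unfamiliar with the differentiation statistic mentioned above, the following is a minimal sketch of a per-locus fixation index for two populations. It assumes a biallelic variant, equal population weights, and the simple heterozygosity-based definition (H_T - H_S) / H_T; the abstract does not state which estimator the authors used, and the function name `fst_two_pops` is hypothetical.

```python
def fst_two_pops(p1, p2):
    """Fixation index for one biallelic locus in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Uses F_ST = (H_T - H_S) / H_T, where H_S is the mean within-population
    expected heterozygosity and H_T is the expected heterozygosity of the
    pooled population. This is an illustrative definition only.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # Locus is fixed for the same allele in both populations.
        return 0.0
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


# Identical frequencies give no differentiation; divergent ones raise F_ST.
print(fst_two_pops(0.5, 0.5))
print(round(fst_two_pops(0.2, 0.8), 4))
```

The same function applies to SVs, SNPs, and indels alike once each variant is reduced to an allele frequency per population, which is what makes the three variant types comparable along the genome.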
Affiliation(s)
- Laurie Lecomte
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Département de Biologie, Université Laval, Québec, Canada
- Mariann Árnyasi
  - Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Anne‐Laure Ferchaud
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Département de Biologie, Université Laval, Québec, Canada
  - Present address: Parks Canada, Office of the Chief Ecosystem Scientist, Québec, QC, Canada
- Matthew Kent
  - Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Sigbjørn Lien
  - Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Kristina Stenløkk
  - Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Florent Sylvestre
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Département de Biologie, Université Laval, Québec, Canada
- Louis Bernatchez
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Département de Biologie, Université Laval, Québec, Canada
- Claire Mérot
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Département de Biologie, Université Laval, Québec, Canada
  - Present address: UMR 6553 Ecobio, OSUR, CNRS, Université de Rennes, Rennes, France
4
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 PMCID: PMC11208083 DOI: 10.1038/s41592-023-01932-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Affiliation(s)
- Mian Umair Ahsan
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Qian Liu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Jonathan Elliot Perdomo
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
5
Dolzhenko E, Weisburd B, Ibañez K, Rajan-Babu IS, Anyansi C, Bennett MF, Billingsley K, Carroll A, Clamons S, Danzi MC, Deshpande V, Ding J, Fazal S, Halman A, Jadhav B, Qiu Y, Richmond PA, Saunders CT, Scheffler K, van Vugt JJFA, Zwamborn RRAJ, Genomics England Research Consortium, Chong SS, Friedman JM, Tucci A, Rehm HL, Eberle MA. REViewer: haplotype-resolved visualization of read alignments in and around tandem repeats. Genome Med 2022; 14:84. [PMID: 35948990 PMCID: PMC9367089 DOI: 10.1186/s13073-022-01085-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 10/20/2021] [Accepted: 07/11/2022] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND Expansions of short tandem repeats are the cause of many neurogenetic disorders including familial amyotrophic lateral sclerosis, Huntington disease, and many others. Multiple methods have been recently developed that can identify repeat expansions in whole genome or exome sequencing data. Despite the widely recognized need for visual assessment of variant calls in clinical settings, current computational tools lack the ability to produce such visualizations for repeat expansions. Expanded repeats are difficult to visualize because they correspond to large insertions relative to the reference genome and involve many misaligning and ambiguously aligning reads. RESULTS We implemented REViewer, a computational method for visualization of sequencing data in genomic regions containing long repeat expansions and FlipBook, a companion image viewer designed for manual curation of large collections of REViewer images. To generate a read pileup, REViewer reconstructs local haplotype sequences and distributes reads to these haplotypes in a way that is most consistent with the fragment lengths and evenness of read coverage. To create appropriate training materials for onboarding new users, we performed a concordance study involving 12 scientists involved in short tandem repeat research. We used the results of this study to create a user guide that describes the basic principles of using REViewer as well as a guide to the typical features of read pileups that correspond to low confidence repeat genotype calls. Additionally, we demonstrated that REViewer can be used to annotate clinically relevant repeat interruptions by comparing visual assessment results of 44 FMR1 repeat alleles with the results of triplet repeat primed PCR. For 38 of these alleles, the results of visual assessment were consistent with triplet repeat primed PCR. 
CONCLUSIONS Read pileup plots generated by REViewer offer an intuitive way to visualize sequencing data in regions containing long repeat expansions. Laboratories can use REViewer and FlipBook to assess the quality of repeat genotype calls as well as to visually detect interruptions or other imperfections in the repeat sequence and the surrounding flanking regions. REViewer and FlipBook are available under open-source licenses at https://github.com/illumina/REViewer and https://github.com/broadinstitute/flipbook respectively.
Affiliation(s)
- Egor Dolzhenko
  - Illumina Inc., San Diego, CA 92122 USA
- Ben Weisburd
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, USA
- Kristina Ibañez
  - William Harvey Research Institute, Queen Mary University of London, London, EC1M 6BQ UK
- Indhu-Shree Rajan-Babu
  - Department of Medical Genetics, University of British Columbia and Children’s & Women’s Hospital, Vancouver, BC V6H3N1 Canada
  - Department of Medical and Molecular Genetics, King’s College London, Strand, London, WC2R 2LS UK
- Christine Anyansi
  - Illumina Inc., San Diego, CA 92122 USA
- Mark F. Bennett
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052 Australia
  - Department of Medical Biology, University of Melbourne, Parkville, VIC 3052 Australia
  - Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, VIC 3084 Australia
- Kimberley Billingsley
  - Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD USA
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD USA
- Ashley Carroll
  - Illumina Inc., San Diego, CA 92122 USA
- Samuel Clamons
  - Illumina Inc., San Diego, CA 92122 USA
- Matt C. Danzi
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL 33136 USA
- Viraj Deshpande
  - Illumina Inc., San Diego, CA 92122 USA
- Jinhui Ding
  - Computational Biology Group, Laboratory of Neurogenetics, National Institute on Aging, NIH, Bethesda, MD 20892 USA
- Sarah Fazal
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL 33136 USA
- Andreas Halman
  - Peter MacCallum Cancer Centre, Melbourne, VIC 3000 Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC 3010 Australia
- Bharati Jadhav
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Yunjiang Qiu
  - Illumina Inc., San Diego, CA 92122 USA
- Phillip A. Richmond
  - BC Children’s Hospital Research Institute, Vancouver, BC V5Z 4H4 Canada
- Konrad Scheffler
  - Illumina Inc., San Diego, CA 92122 USA
- Joke J. F. A. van Vugt
  - Department of Neurology, University Medical Center Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
- Ramona R. A. J. Zwamborn
  - Department of Neurology, University Medical Center Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
- Samuel S. Chong
  - Department of Pediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228 Singapore
  - Department of Obstetrics and Gynecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228 Singapore
  - Department of Laboratory Medicine, National University Hospital, Singapore, 119074 Singapore
- Jan M. Friedman
  - Department of Medical Genetics, University of British Columbia and Children’s & Women’s Hospital, Vancouver, BC V6H3N1 Canada
- Arianna Tucci
  - William Harvey Research Institute, Queen Mary University of London, London, EC1M 6BQ UK
- Heidi L. Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, USA
- Michael A. Eberle
  - Illumina Inc., San Diego, CA 92122 USA
6
Linderman MD, Paudyal C, Shakeel M, Kelley W, Bashir A, Gelb BD. NPSV: A simulation-driven approach to genotyping structural variants in whole-genome sequencing data. Gigascience 2021; 10:giab046. [PMID: 34195837 PMCID: PMC8246072 DOI: 10.1093/gigascience/giab046] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/21/2020] [Revised: 05/04/2021] [Accepted: 06/07/2021] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Structural variants (SVs) play a causal role in numerous diseases but are difficult to detect and accurately genotype (determine zygosity) in whole-genome next-generation sequencing data. SV genotypers that assume that the aligned sequencing data uniformly reflect the underlying SV or use existing SV call sets as training data can only partially account for variant and sample-specific biases. RESULTS We introduce NPSV, a machine learning-based approach for genotyping previously discovered SVs that uses next-generation sequencing simulation to model the combined effects of the genomic region, sequencer, and alignment pipeline on the observed SV evidence. We evaluate NPSV alongside existing SV genotypers on multiple benchmark call sets. We show that NPSV consistently achieves or exceeds state-of-the-art genotyping accuracy across SV call sets, samples, and variant types. NPSV can specifically identify putative de novo SVs in a trio context and is robust to offset SV breakpoints. CONCLUSIONS Growing SV databases and the increasing availability of SV calls from long-read sequencing make stand-alone genotyping of previously identified SVs an increasingly important component of genome analyses. By treating potential biases as a "black box" that can be simulated, NPSV provides a framework for accurately genotyping a broad range of SVs in both targeted and genome-scale applications.
Affiliation(s)
- Michael D Linderman
  - Department of Computer Science, Middlebury College, 14 Old Chapel Road, Middlebury, VT 05753, USA
- Crystal Paudyal
  - Department of Computer Science, Middlebury College, 14 Old Chapel Road, Middlebury, VT 05753, USA
- Musab Shakeel
  - Department of Computer Science, Middlebury College, 14 Old Chapel Road, Middlebury, VT 05753, USA
- William Kelley
  - Department of Computer Science, Middlebury College, 14 Old Chapel Road, Middlebury, VT 05753, USA
- Ali Bashir
  - Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA
- Bruce D Gelb
  - Mindich Child Health and Development Institute and the Departments of Pediatrics and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, One Gustave Levy Place, Box 1040, New York, NY 10029, USA
7
Belyeu JR, Chowdhury M, Brown J, Pedersen BS, Cormier MJ, Quinlan AR, Layer RM. Samplot: a platform for structural variant visual validation and automated filtering. Genome Biol 2021; 22:161. [PMID: 34034781 PMCID: PMC8145817 DOI: 10.1186/s13059-021-02380-5] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 10/01/2020] [Accepted: 05/10/2021] [Indexed: 12/15/2022] Open
Abstract
Visual validation is an important step to minimize false-positive predictions from structural variant (SV) detection. We present Samplot, a tool for creating images that display the read depth and sequence alignments necessary to adjudicate purported SVs across samples and sequencing technologies. These images can be rapidly reviewed to curate large SV call sets. Samplot is applicable to many biological problems such as SV prioritization in disease studies, analysis of inherited variation, or de novo SV review. Samplot includes a machine learning package that dramatically decreases the number of false positives without human review. Samplot is available at https://github.com/ryanlayer/samplot.
Affiliation(s)
- Jonathan R Belyeu
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
  - Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Murad Chowdhury
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Joseph Brown
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
  - Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Brent S Pedersen
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
  - Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Michael J Cormier
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
  - Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Aaron R Quinlan
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
  - Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Ryan M Layer
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
  - Department of Computer Science, University of Colorado, Boulder, CO, USA
8
Nattestad M, Aboukhalil R, Chin CS, Schatz MC. Ribbon: intuitive visualization for complex genomic variation. Bioinformatics 2021; 37:413-415. [PMID: 32766814 DOI: 10.1093/bioinformatics/btaa680] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 01/20/2020] [Revised: 06/15/2020] [Accepted: 07/21/2020] [Indexed: 01/08/2023] Open
Abstract
SUMMARY Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons. AVAILABILITY AND IMPLEMENTATION Ribbon is freely available online at http://genomeribbon.com/ and is open-source at https://github.com/marianattestad/ribbon. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Maria Nattestad
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Michael C Schatz
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
  - Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
9
Minoche AE, Lundie B, Peters GB, Ohnesorg T, Pinese M, Thomas DM, Zankl A, Roscioli T, Schonrock N, Kummerfeld S, Burnett L, Dinger ME, Cowley MJ. ClinSV: clinical grade structural and copy number variant detection from whole genome sequencing data. Genome Med 2021; 13:32. [PMID: 33632298 PMCID: PMC7908648 DOI: 10.1186/s13073-021-00841-x] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 06/30/2020] [Accepted: 02/02/2021] [Indexed: 01/09/2023] Open
Abstract
Whole genome sequencing (WGS) has the potential to outperform clinical microarrays for the detection of structural variants (SVs) including copy number variants (CNVs), but has been challenged by high false positive rates. Here we present ClinSV, a WGS-based SV integration, annotation, prioritization, and visualization framework, which identified 99.8% of simulated pathogenic ClinVar CNVs > 10 kb and 11/11 pathogenic variants from matched microarrays. The false positive rate was low (1.5-4.5%) and reproducibility high (95-99%). In clinical practice, ClinSV identified reportable variants in 22 of 485 patients (4.7%) of which 35-63% were not detectable by current clinical microarray designs. ClinSV is available at https://github.com/KCCG/ClinSV.
Collapse
Affiliation(s)
- Andre E Minoche
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia.
- St Vincent's Clinical School, UNSW, Sydney, NSW, Australia.
| | - Ben Lundie
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
| | - Greg B Peters
- Sydney Genome Diagnostics, The Children's Hospital at Westmead, Hawkesbury Road & Hainsworth Street, Westmead, NSW, Australia
| | - Thomas Ohnesorg
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- Genome.One, Darlinghurst, NSW, Australia
| | - Mark Pinese
- Children's Cancer Institute, University of New South Wales, Randwick, Sydney, NSW, Australia
- School of Women's and Children's Health, UNSW, Sydney, NSW, Australia
| | - David M Thomas
- St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
- The Kinghorn Cancer Centre and Cancer Division, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
| | - Andreas Zankl
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- Department of Clinical Genetics, The Children's Hospital at Westmead, Hawkesbury Road, Westmead, NSW, Australia
- Sydney Medical School, The University of Sydney, Camperdown, NSW, Australia
| | - Tony Roscioli
- NSW Health Pathology Randwick, Sydney, NSW, Australia
- Centre for Clinical Genetics, Sydney Children's Hospital, Randwick, NSW, Australia
- Prince of Wales Clinical School, University of New South Wales, Sydney, NSW, Australia
- Neuroscience Research Australia, University of New South Wales, Randwick, Sydney, NSW, Australia
| | - Nicole Schonrock
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- Genome.One, Darlinghurst, NSW, Australia
| | - Sarah Kummerfeld
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
| | - Leslie Burnett
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
- Genome.One, Darlinghurst, NSW, Australia
- Sydney Medical School, The University of Sydney, Camperdown, NSW, Australia
| | - Marcel E Dinger
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW, Sydney, NSW, Australia
| | - Mark J Cowley
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia.
- St Vincent's Clinical School, UNSW, Sydney, NSW, Australia.
- Children's Cancer Institute, University of New South Wales, Randwick, Sydney, NSW, Australia.
- School of Women's and Children's Health, UNSW, Sydney, NSW, Australia.
|
10
|
Zhou X, Zhang L, Weng Z, Dill DL, Sidow A. Aquila enables reference-assisted diploid personal genome assembly and comprehensive variant detection based on linked reads. Nat Commun 2021; 12:1077. [PMID: 33597536 PMCID: PMC7889865 DOI: 10.1038/s41467-021-21395-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/09/2020] [Accepted: 01/20/2021] [Indexed: 01/19/2023] Open
Abstract
We introduce Aquila, a new approach to variant discovery in personal genomes, which is critical for uncovering the genetic contributions to health and disease. Aquila uses a reference sequence and linked-read data to generate a high quality diploid genome assembly, from which it then comprehensively detects and phases personal genetic variation. The contigs of the assemblies from our libraries cover >95% of the human reference genome, with over 98% of that in a diploid state. Thus, the assemblies support detection and accurate genotyping of the most prevalent types of human genetic variation, including single nucleotide polymorphisms (SNPs), small insertions and deletions (small indels), and structural variants (SVs), in all but the most difficult regions. All heterozygous variants are phased in blocks that can approach arm-level length. The final output of Aquila is a diploid and phased personal genome sequence, and a phased Variant Call Format (VCF) file that also contains homozygous and a few unphased heterozygous variants. Aquila represents a cost-effective approach that can be applied to cohorts for variation discovery or association studies, or to single individuals with rare phenotypes that could be caused by SVs or compound heterozygosity.
Affiliation(s)
- Xin Zhou
- Department of Computer Science, Stanford University, Stanford, CA, USA.
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA.
| | - Lu Zhang
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
| | - Ziming Weng
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - David L Dill
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Arend Sidow
- Department of Pathology, Stanford University, Stanford, CA, USA.
- Department of Genetics, Stanford University, Stanford, CA, USA.
|
11
|
Chu S, Skidmore ZL, Kunisaki J, Walker JR, Griffith M, Griffith OL, Bryan JN. Unraveling the chaotic genomic landscape of primary and metastatic canine appendicular osteosarcoma with current sequencing technologies and bioinformatic approaches. PLoS One 2021; 16:e0246443. [PMID: 33556121 PMCID: PMC7870011 DOI: 10.1371/journal.pone.0246443] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/15/2020] [Accepted: 01/19/2021] [Indexed: 12/03/2022] Open
Abstract
Osteosarcoma is a rare disease in children but is one of the most common cancers in adult large breed dogs. The mutational landscape of both the primary and pulmonary metastatic tumors in two dogs with appendicular osteosarcoma (OSA) was comprehensively evaluated using an automated whole genome sequencing, exome and RNA-seq pipeline that was adapted for use in dogs in this study. Chromosomal lesions were the most common type of mutation. The mutational landscape varied substantially between dogs, but the lesions within the same patient were similar. Copy number neutral loss of heterozygosity in mutant TP53 was the most significant driver mutation and involved a large region in the middle of chromosome 5. Canine and human OSA are characterized by loss of cell cycle checkpoint integrity and DNA damage response pathways. Mutational profiling of individual patients with canine OSA is recommended prior to targeted therapy, given the heterogeneity seen in our study and previous studies.
Affiliation(s)
- Shirley Chu
- Department of Veterinary Medicine and Surgery, University of Missouri, Columbia, MO, United States of America
| | - Zachary L. Skidmore
- McDonnell Genome Institute, Washington University, St. Louis, MO, United States of America
| | - Jason Kunisaki
- McDonnell Genome Institute, Washington University, St. Louis, MO, United States of America
| | - Jason R. Walker
- McDonnell Genome Institute, Washington University, St. Louis, MO, United States of America
| | - Malachi Griffith
- McDonnell Genome Institute, Washington University, St. Louis, MO, United States of America
- Department of Medicine, Washington University, St. Louis, MO, United States of America
| | - Obi L. Griffith
- McDonnell Genome Institute, Washington University, St. Louis, MO, United States of America
- Department of Medicine, Washington University, St. Louis, MO, United States of America
| | - Jeffrey N. Bryan
- Department of Veterinary Medicine and Surgery, University of Missouri, Columbia, MO, United States of America
|
12
|
Cyrius: accurate CYP2D6 genotyping using whole-genome sequencing data. Pharmacogenomics J 2021; 21:251-261. [PMID: 33462347 PMCID: PMC7997805 DOI: 10.1038/s41397-020-00205-5] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 08/04/2020] [Revised: 11/13/2020] [Accepted: 12/04/2020] [Indexed: 12/03/2022]
Abstract
Responsible for the metabolism of ~21% of clinically used drugs, CYP2D6 is a critical component of personalized medicine initiatives. Genotyping CYP2D6 is challenging due to sequence similarity with its pseudogene paralog CYP2D7 and a high number and variety of common structural variants (SVs). Here we describe a novel bioinformatics method, Cyrius, that accurately genotypes CYP2D6 using whole-genome sequencing (WGS) data. We show that Cyrius has superior performance (96.5% concordance with truth genotypes) compared to existing methods (84–86.8%). After implementing the improvements identified from the comparison against the truth data, Cyrius’s accuracy has since been improved to 99.3%. Using Cyrius, we built a haplotype frequency database from 2504 ethnically diverse samples and estimate that SV-containing star alleles are more frequent than previously reported. Cyrius will be an important tool to incorporate pharmacogenomics in WGS-based precision medicine initiatives.
|
13
|
Zarate S, Carroll A, Mahmoud M, Krasheninina O, Jun G, Salerno WJ, Schatz MC, Boerwinkle E, Gibbs RA, Sedlazeck FJ. Parliament2: Accurate structural variant calling at scale. Gigascience 2020; 9:giaa145. [PMID: 33347570 PMCID: PMC7751401 DOI: 10.1093/gigascience/giaa145] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 04/03/2020] [Revised: 09/17/2020] [Accepted: 11/18/2020] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Structural variants (SVs) are critical contributors to genetic diversity and genomic disease. To predict the phenotypic impact of SVs, there is a need for better estimates of both the occurrence and frequency of SVs, preferably from large, ethnically diverse cohorts. The current standard approach relies on short paired-end reads, from which SVs remain challenging to detect, especially at the scale of hundreds to thousands of samples. FINDINGS We present Parliament2, a consensus SV framework that leverages multiple best-in-class methods to identify high-quality SVs from short-read DNA sequence data at scale. Parliament2 incorporates pre-installed SV callers that are optimized for efficient execution in parallel to reduce the overall runtime and costs. We demonstrate the accuracy of Parliament2 when applied to data from NovaSeq and HiSeq X platforms with the Genome in a Bottle (GIAB) SV call set across all size classes. The reported quality score per SV is calibrated across different SV types and size classes. Parliament2 has the highest F1 score (74.27%) measured across the independent gold standard from GIAB. We illustrate the compute performance by processing all 1000 Genomes samples (2,691 samples) in <1 day on GRCh38. Parliament2 improves the runtime performance of individual methods and is open source (https://github.com/slzarate/parliament2), and a Docker image, as well as a WDL implementation, is available. CONCLUSION Parliament2 provides a highly accurate single-sample SV call set from short-read DNA sequence data and enables cost-efficient application over cloud or cluster environments, processing thousands of samples.
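The idea of a consensus SV call set can be illustrated with a minimal Python sketch. This is not Parliament2's actual merging logic, which the abstract does not describe; the `SVCall` record, the 50% reciprocal-overlap threshold, and the two-caller support rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    """Hypothetical minimal SV record (not a real caller's output format)."""
    caller: str
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Smaller of the two overlap fractions; 0.0 if type or chromosome differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def consensus_calls(calls, min_overlap=0.5, min_callers=2):
    """Keep calls that are supported by at least `min_callers` distinct callers.

    Supported calls are returned unmerged, so one event may appear once per caller.
    """
    kept = []
    for call in calls:
        supporters = {call.caller}
        for other in calls:
            if (other.caller != call.caller
                    and reciprocal_overlap(call, other) >= min_overlap):
                supporters.add(other.caller)
        if len(supporters) >= min_callers:
            kept.append(call)
    return kept


calls = [
    SVCall("callerA", "chr1", 1000, 2000, "DEL"),
    SVCall("callerB", "chr1", 1100, 2050, "DEL"),
    SVCall("callerA", "chr2", 500, 900, "DUP"),  # seen by one caller only
]
supported = consensus_calls(calls)
```

In this toy input the two chr1 deletions support each other, while the chr2 duplication is dropped because only one caller reports it. A production framework would also merge supporting calls into a single record and assign a quality score.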
Affiliation(s)
- Samantha Zarate
- DNAnexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA
- Department of Computer Science, 3400 N. Charles St. Johns Hopkins University, Baltimore, MD 21218, USA
| | - Andrew Carroll
- DNAnexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Olga Krasheninina
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Goo Jun
- Human Genetics Center, 1200 Pressler Street, University of Texas Health Science Center at Houston, Houston, TX 77040, USA
| | - William J Salerno
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Michael C Schatz
- Department of Computer Science, 3400 N. Charles St. Johns Hopkins University, Baltimore, MD 21218, USA
| | - Eric Boerwinkle
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Human Genetics Center, 1200 Pressler Street, University of Texas Health Science Center at Houston, Houston, TX 77040, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
|
14
|
Chapman LM, Spies N, Pai P, Lim CS, Carroll A, Narzisi G, Watson CM, Proukakis C, Clarke WE, Nariai N, Dawson E, Jones G, Blankenberg D, Brueffer C, Xiao C, Kolora SRR, Alexander N, Wolujewicz P, Ahmed AE, Smith G, Shehreen S, Wenger AM, Salit M, Zook JM. A crowdsourced set of curated structural variants for the human genome. PLoS Comput Biol 2020; 16:e1007933. [PMID: 32559231 PMCID: PMC7329145 DOI: 10.1371/journal.pcbi.1007933] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/19/2019] [Revised: 07/01/2020] [Accepted: 05/07/2020] [Indexed: 11/19/2022] Open
Abstract
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, developing a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
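The concordance figures and the filtering step reported above can be made concrete with a small Python sketch. The function names, the label encoding, and the 0.8 agreement threshold are hypothetical; the abstract does not state how SVCurator computes concordance or which cutoff was applied.

```python
from collections import Counter


def concordance(labels_a: dict, labels_b: dict) -> float:
    """Fraction of events labeled by both curators on which the labels agree."""
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        raise ValueError("no events were labeled by both curators")
    agree = sum(labels_a[event] == labels_b[event] for event in shared)
    return agree / len(shared)


def high_confidence_labels(votes: dict, min_agreement: float = 0.8) -> dict:
    """Keep events whose most common label reaches `min_agreement` of the votes."""
    kept = {}
    for event, labels in votes.items():
        if not labels:
            continue  # nobody labeled this event
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            kept[event] = label
    return kept


# Toy labels for four putative SVs (illustrative values only).
expert = {"sv1": "DEL", "sv2": "INS", "sv3": "DEL", "sv4": "INS"}
curator = {"sv1": "DEL", "sv2": "INS", "sv3": "INS", "sv4": "INS"}
rate = concordance(expert, curator)

votes = {
    "sv1": ["DEL", "DEL", "DEL"],
    "sv2": ["DEL", "INS", "INS", "DEL"],  # curators split evenly
}
confident = high_confidence_labels(votes)
```

Here the curator agrees with the expert on three of four events, and only `sv1` survives the agreement filter because the votes for `sv2` are split.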
Affiliation(s)
- Lesley M. Chapman
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
| | - Noah Spies
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
- The Joint Initiative for Metrology in Biology, Stanford University, Stanford, California, United States of America
- Departments of Genetics and Pathology, Stanford University, Stanford, California, United States of America
| | - Patrick Pai
- University of Maryland - College Park, College Park, Maryland, United States of America
| | - Chun Shen Lim
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Andrew Carroll
- DNAnexus Inc, Mountain View, California, United States of America
| | - Giuseppe Narzisi
- New York Genome Center, New York, New York, United States of America
| | - Christopher M. Watson
- School of Medicine, University of Leeds, Saint James's University Hospital, Leeds, Leeds, United Kingdom
- Yorkshire Regional Genetics Service, The Leeds Teaching Hospitals NHS Trust, Saint James's University Hospital, Leeds, United Kingdom
| | - Christos Proukakis
- University College London, Institute of Neurology, London, United Kingdom
| | - Wayne E. Clarke
- New York Genome Center, New York, New York, United States of America
| | - Naoki Nariai
- Illumina, Inc. San Diego, California, United States of America
| | - Eric Dawson
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland, United States of America
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Garan Jones
- University of Exeter Medical School, Epidemiology and Public Health Group, Barrack Road, Exeter, Devon, United Kingdom
| | - Daniel Blankenberg
- Genomic Medicine Institute Lerner Research Institute Cleveland Clinic, Cleveland, Ohio, United States of America
| | - Christian Brueffer
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Sree Rohit Raj Kolora
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
- Molecular Evolution and Systematics of Animals, Institute of Biology, University of Leipzig, Leipzig, Germany
| | - Noah Alexander
- Molecular Biology Institute, University of California Los Angeles, Los Angeles, California, United States of America
| | - Paul Wolujewicz
- Weill Cornell, Belfer Research Building, New York, New York, United States of America
| | - Azza E. Ahmed
- Center for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum and Department of Electrical and Electronic Engineering, Faculty of Engineering, University of Khartoum, Khartoum, Sudan
| | - Graeme Smith
- Guy's Hospital and St Thomas's NHS Foundation Trust Great Maze Pond, London, United Kingdom
| | - Saadlee Shehreen
- Department of Genetic Engineering & Biotechnology, University of Dhaka, Bangladesh
| | - Aaron M. Wenger
- Pacific Biosciences, Menlo Park, California, United States of America
| | - Marc Salit
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
- The Joint Initiative for Metrology in Biology, Stanford University, Stanford, California, United States of America
| | - Justin M. Zook
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
|
15
|
Uguen K, Jubin C, Duffourd Y, Bardel C, Malan V, Dupont JM, El Khattabi L, Chatron N, Vitobello A, Rollat-Farnier PA, Baulard C, Lelorch M, Leduc A, Tisserant E, Tran Mau-Them F, Danjean V, Delepine M, Till M, Meyer V, Lyonnet S, Mosca-Boidron AL, Thevenon J, Faivre L, Thauvin-Robinet C, Schluth-Bolard C, Boland A, Olaso R, Callier P, Romana S, Deleuze JF, Sanlaville D. Genome sequencing in cytogenetics: Comparison of short-read and linked-read approaches for germline structural variant detection and characterization. Mol Genet Genomic Med 2020; 8:e1114. [PMID: 31985172 PMCID: PMC7057128 DOI: 10.1002/mgg3.1114] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/15/2019] [Accepted: 12/20/2019] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Structural variants (SVs) include copy number variants (CNVs) and apparently balanced chromosomal rearrangements (ABCRs). Genome sequencing (GS) enables SV detection at base-pair resolution, but the use of short-read sequencing is limited by repetitive sequences, and long-read approaches are not yet validated for diagnosis. Recently, 10X Genomics proposed Chromium, a technology that provides linked reads to reconstruct long DNA fragments and could represent a good alternative. No study has yet compared short-read and linked-read technologies for detecting SVs in a constitutional diagnostic setting. The aim of this work was to determine whether the 10X Genomics technology enables better detection and comprehension of SVs than short-read WGS. METHODS We included 13 patients carrying various SVs. Whole genome analyses were performed using paired-end HiSeq X sequencing with (linked-read strategy) or without (short-read strategy) Chromium library preparation. Two different bioinformatic pipelines were used: variants were called using BreakDancer for the short-read strategy and LongRanger for the linked-read strategy. Variant interpretations were first blinded. RESULTS The short-read strategy allowed diagnosis of known SVs in 10/13 patients. After unblinding, the linked-read strategy identified 10/13 SVs, including one (patient 7) missed by the short-read strategy. CONCLUSION In this study, the 10X Genomics solution did not improve the detection and characterization of SVs.
Affiliation(s)
- Kévin Uguen
- Service de Génétique Médicale, CHRU de Brest, Brest, France.,HCL, Service de Génétique, BRON Cedex, France
| | - Claire Jubin
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Yannis Duffourd
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France
| | - Claire Bardel
- HCL, Cellule bioinformatique de la plateforme NGS du CHU Lyon, BRON Cedex, France.,Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Villeurbanne, France
| | - Valérie Malan
- Service de Cytogénétique, Hôpital Necker-Enfants Malades, APHP, Paris, France
| | - Jean-Michel Dupont
- Institut Cochin, INSERM U1016, Université Paris Descartes, Faculté de Médecine, APHP, HUPC, site Cochin, Laboratoire de Cytogénétique, Paris, France
| | - Laila El Khattabi
- Institut Cochin, INSERM U1016, Université Paris Descartes, Faculté de Médecine, APHP, HUPC, site Cochin, Laboratoire de Cytogénétique, Paris, France
| | | | - Antonio Vitobello
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Unité Fonctionnelle d'Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | | | - Céline Baulard
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Marc Lelorch
- Service de Cytogénétique, Hôpital Necker-Enfants Malades, APHP, Paris, France
| | - Aurélie Leduc
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Emilie Tisserant
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France
| | - Frédéric Tran Mau-Them
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Unité Fonctionnelle d'Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | - Vincent Danjean
- Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
| | - Marc Delepine
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | | | - Vincent Meyer
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Stanislas Lyonnet
- Fédération de Génétique et Institut Imagine, UMR-1163, Université de Paris, Hôpital Necker-Enfants Malades, APHP Paris, France
| | - Anne-Laure Mosca-Boidron
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Laboratoire de génétique chromosomique et moléculaire, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | - Julien Thevenon
- Centre de génétique, Hôpital Couple-Enfant, CHU Grenoble Alpes, La Tronche, Grenoble, France
| | - Laurence Faivre
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Centre de génétique, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | - Christel Thauvin-Robinet
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Centre de génétique, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | | | - Anne Boland
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Robert Olaso
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Patrick Callier
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Laboratoire de génétique chromosomique et moléculaire, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | - Serge Romana
- Service de Cytogénétique, Hôpital Necker-Enfants Malades, APHP, Paris, France
| | - Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
|
16
|
Yokoyama TT, Kasahara M. Visualization tools for human structural variations identified by whole-genome sequencing. J Hum Genet 2020; 65:49-60. [PMID: 31666648 PMCID: PMC8075883 DOI: 10.1038/s10038-019-0687-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/05/2019] [Revised: 09/27/2019] [Accepted: 10/02/2019] [Indexed: 01/02/2023]
Abstract
Visualizing structural variations (SVs) is a critical step for finding associations between SVs and human traits or diseases. Given that there are many sequencing platforms used for SV identification and given that how best to visualize SVs together with other data, such as read alignments and annotations, depends on research goals, there are dozens of SV visualization tools designed for different research goals and sequencing platforms. Here, we provide a comprehensive survey of over 30 SV visualization tools to help users choose which tools to use. This review targets users who wish to visualize a set of SVs identified from the massively parallel sequencing reads of an individual human genome. We first categorize the ways in which SV visualization tools display SVs into ten major categories, which we denote as view modules. View modules allow readers to understand the features of each SV visualization tool quickly. Next, we introduce the features of individual SV visualization tools from several aspects, including whether SV views are integrated with annotations, whether long-read alignment is displayed, whether underlying data structures are graph-based, the type of SVs shown, whether auditing is possible, whether bird's eye view is available, sequencing platforms, and the number of samples. We hope that this review will serve as a guide for readers on the currently available SV visualization tools and lead to the development of new SV visualization tools in the near future.
Affiliation(s)
- Toshiyuki T Yokoyama
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Masahiro Kasahara
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan.
|
17
|
Zhang L, Zhou X, Weng Z, Sidow A. De novo diploid genome assembly for genome-wide structural variant detection. NAR Genom Bioinform 2019; 2:lqz018. [PMID: 33575568 PMCID: PMC7671403 DOI: 10.1093/nargab/lqz018] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/08/2019] [Revised: 10/09/2019] [Accepted: 12/02/2019] [Indexed: 12/30/2022] Open
Abstract
Detection of structural variants (SVs) on the basis of read alignment to a reference genome remains a difficult problem. De novo assembly, traditionally used to generate reference genomes, offers an alternative for SV detection. However, it has not been applied broadly to human genomes because of fundamental limitations of short-fragment approaches and high cost of long-read technologies. We here show that 10× linked-read sequencing supports accurate SV detection. We examined variants in six de novo 10× assemblies with diverse experimental parameters from two commonly used human cell lines: NA12878 and NA24385. The assemblies are effective for detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies’ contigs to the reference (hg38). Our study also shows that the base-pair level SV breakpoint accuracy is high, with a majority of SVs having precisely correct sizes and breakpoints. Setting the ancestral state of SV loci by comparing to ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation. In about half of cases, the mechanism is the opposite of the reference-based call. We uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimp. Overall, we show that de novo assembly of 10× linked-read data can achieve cost-effective SV detection for personal genomes.
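The polarization step described above, where the ancestral state decides whether an SV arose by insertion or deletion, can be sketched in a few lines of Python. The function name and the boolean encoding are hypothetical, and the sketch assumes that the ape ortholog faithfully represents the ancestral state at the locus.

```python
def infer_mechanism(reference_call: str, ancestral_has_sequence: bool):
    """Polarize a reference-based SV call using the ancestral state.

    reference_call: "insertion" or "deletion", relative to the reference genome.
    ancestral_has_sequence: True if the ape ortholog carries the variable sequence.

    Returns (mechanism, is_opposite), where is_opposite is True when the inferred
    mechanism contradicts the reference-based call.
    """
    if reference_call not in {"insertion", "deletion"}:
        raise ValueError(f"unsupported call type: {reference_call!r}")
    # If the ancestor carried the sequence, the allele lacking it arose by deletion;
    # if the ancestor lacked it, the allele carrying it arose by insertion.
    mechanism = "deletion" if ancestral_has_sequence else "insertion"
    return mechanism, mechanism != reference_call


# The reference carries extra sequence that the ape ortholog lacks, so the
# reference-based "deletion" is actually an insertion in the reference lineage.
flipped_case = infer_mechanism("deletion", ancestral_has_sequence=False)
```

A call is flipped whenever the reference genome carries the derived allele, which fits the abstract's observation that the mechanism is the opposite of the reference-based call in about half of cases.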
Affiliation(s)
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong.,Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA.,Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Xin Zhou
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Ziming Weng
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA
| | - Arend Sidow
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA.,Department of Genetics, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA
|
18
|
Yokoyama TT, Sakamoto Y, Seki M, Suzuki Y, Kasahara M. MoMI-G: modular multi-scale integrated genome graph browser. BMC Bioinformatics 2019; 20:548. [PMID: 31690272 PMCID: PMC6833150 DOI: 10.1186/s12859-019-3145-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 06/11/2019] [Accepted: 10/09/2019] [Indexed: 01/30/2023] Open
Abstract
Background A genome graph is an emerging approach for representing structural variants on genomes with branches. For example, representing structural variants of cancer genomes as a genome graph is more natural than representing such genomes as differences from the linear reference genome. While more and more structural variants are being identified by long-read sequencing, many of them are difficult to visualize using existing structural variant visualization tools. A visualization method for large genome graphs, such as human cancer genome graphs, is therefore needed. Results We developed the MOdular Multi-scale Integrated Genome graph browser, MoMI-G, a web-based genome graph browser that can visualize genome graphs with structural variants and supporting evidence such as read alignments, read depth, and annotations. This browser allows more intuitive recognition of large, nested, and potentially more complex structural variations. MoMI-G has view modules for different scales, which allow users to view the whole genome down to nucleotide-level alignments of long reads. Alignments spanning reference alleles and those spanning alternative alleles are shown in the same view. Users can customize the view if they are not satisfied with the preset views. In addition, MoMI-G has Interval Card Deck, a feature for rapid manual inspection of hundreds of structural variants. Herein, we describe the utility of MoMI-G by using representative examples of large and nested structural variations found in two cell lines, LC-2/ad and CHM1. Conclusions Users can inspect complex and large structural variations found by long-read analysis in large genomes such as human genomes more smoothly and more intuitively. In addition, users can easily filter out false positives by manually inspecting hundreds of identified structural variants with supporting long-read alignments and annotations in a short time.
Software availability MoMI-G is freely available at https://github.com/MoMI-G/MoMI-G under the MIT license.
Affiliation(s)
- Toshiyuki T Yokoyama
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Yoshitaka Sakamoto
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Masahide Seki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Yutaka Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Masahiro Kasahara
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan.
|
19
|
Li D, Kim W, Wang L, Yoon KA, Park B, Park C, Kong SY, Hwang Y, Baek D, Lee ES, Won S. Comparison of INDEL Calling Tools with Simulation Data and Real Short-Read Data. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1635-1644. [PMID: 30004886 DOI: 10.1109/tcbb.2018.2854793] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Insertions and deletions (INDELs) comprise a significant proportion of human genetic variation, and recent papers have revealed that many human diseases may be attributable to INDELs. With the development of next-generation sequencing (NGS) technology, many statistical/computational tools have been developed for calling INDELs. However, there are differences among those tools, and comparisons among them have been limited. In order to better understand these inter-tool differences, five popular and publicly available INDEL calling tools (GATK HaplotypeCaller, Platypus, VarScan2, Scalpel, and GotCloud) were evaluated using simulation data, 1000 Genomes Project data, and family-based sequencing data. The accuracy of INDEL calling by each tool was mainly evaluated by concordance rates. Family-based sequencing data, which consisted of 49 individuals from eight Korean families, were used to calculate Mendelian error rates. Our comparison results show that GATK HaplotypeCaller usually performs the best and that joint calling with Platypus can lead to additional improvements in accuracy. The results of this study provide important information regarding future directions for variant detection and algorithm development.
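The Mendelian error rate used above as an accuracy measure can be illustrated with a minimal Python sketch. It assumes biallelic genotypes coded as allele pairs (0 = reference, 1 = INDEL) for father, mother, and child; the function names and the toy trios are hypothetical and are not taken from the paper.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking
    one allele from each parent (allele order is ignored)."""
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def mendelian_error_rate(trios):
    """Fraction of fully genotyped trio sites that violate Mendelian inheritance."""
    called = [t for t in trios if all(g is not None for g in t)]
    if not called:
        return 0.0
    errors = sum(not is_mendelian_consistent(*t) for t in called)
    return errors / len(called)


# Toy data (assumed): (father, mother, child) genotypes at four INDEL sites.
toy_trios = [
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 0), (0, 0), (0, 1)),  # error: child allele absent from both parents
    ((1, 1), (0, 0), (0, 1)),  # consistent
    ((0, 1), None, (1, 1)),    # missing call, skipped
]
rate = mendelian_error_rate(toy_trios)
```

Sites with a missing genotype are skipped here rather than counted as errors; that is a design choice of this sketch, and a real evaluation may handle missing calls differently.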
20
Hanlon K, Thompson A, Pantano L, Hutchinson JN, Al-Obeidi A, Wang S, Bliss-Moreau M, Helble J, Alexe G, Stegmaier K, Bauer DE, Croker BA. Single-cell cloning of human T-cell lines reveals clonal variation in cell death responses to chemotherapeutics. Cancer Genet 2019; 237:69-77. [PMID: 31447068 DOI: 10.1016/j.cancergen.2019.06.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/09/2019] [Revised: 04/18/2019] [Accepted: 06/09/2019] [Indexed: 12/12/2022]
Abstract
Genetic modification of human leukemic cell lines using CRISPR-Cas9 has become a staple of gene-function studies. Single-cell cloning of modified cells is frequently used to facilitate studies of gene function. Inherent in this approach is an assumption that the genetic drift, amplified in some cell lines by mutations in DNA replication and repair machinery, as well as non-genetic factors will not introduce significant levels of experimental cellular heterogeneity in clones derived from parental populations. In this study, we characterize the variation in cell death of fifty clonal cell lines generated from human Jurkat and MOLT-4 T-cells edited by CRISPR-Cas9. We demonstrate a wide distribution of sensitivity to chemotherapeutics between non-edited clonal human leukemia T-cell lines, and also following CRISPR-Cas9 editing at the NLRP1 locus, or following transfection with non-targeting sgRNA controls. The cell death sensitivity profile of clonal cell lines was consistent across experiments and failed to revert to the non-clonal parental phenotype. Whole genome sequencing of two clonal cell lines edited by CRISPR-Cas9 revealed unique and shared genetic variants, which had minimal read support in the non-clonal parental population and were not suspected CRISPR-Cas9 off-target effects. These variants included genes related to cell death and drug metabolism. The variation in cell death phenotype of clonal populations of human T-cell lines may be a consequence of T-cell line genetic instability, and to a lesser extent clonal heterogeneity in the parental population or CRISPR-Cas9 off-target effects not predicted by current models. This work highlights the importance of genetic variation between clonal T-cell lines in the design, conduct, and analysis of experiments to investigate gene function after single-cell cloning.
Affiliation(s)
- Kathleen Hanlon
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, United States
- Alex Thompson
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, United States
- Lorena Pantano
- Department of Biostatistics, Harvard Chan School of Public Health, Boston, MA, United States
- John N Hutchinson
- Department of Biostatistics, Harvard Chan School of Public Health, Boston, MA, United States
- Arshed Al-Obeidi
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, United States
- Shu Wang
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, United States
- Meghan Bliss-Moreau
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, United States
- Jennifer Helble
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA, United States
- Gabriela Alexe
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Boston Children's Hospital, Boston, MA, United States
- Kimberly Stegmaier
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Boston Children's Hospital, Boston, MA, United States
- Daniel E Bauer
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, United States; Department of Pediatrics, Harvard Medical School, Boston, MA, United States
- Ben A Croker
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, United States; Department of Pediatrics, Harvard Medical School, Boston, MA, United States
21
Belyeu JR, Nicholas TJ, Pedersen BS, Sasani TA, Havrilla JM, Kravitz SN, Conway ME, Lohman BK, Quinlan AR, Layer RM. SV-plaudit: A cloud-based framework for manually curating thousands of structural variants. Gigascience 2018; 7:5026174. [PMID: 29860504 PMCID: PMC6030999 DOI: 10.1093/gigascience/giy064] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 03/20/2018] [Accepted: 05/25/2018] [Indexed: 01/21/2023] Open
Abstract
SV-plaudit is a framework for rapidly curating structural variant (SV) predictions. For each SV, we generate an image that visualizes the coverage and alignment signals from a set of samples. Images are uploaded to our cloud framework where users assess the quality of each image using a client-side web application. Reports can then be generated as a tab-delimited file or annotated Variant Call Format (VCF) file. As a proof of principle, nine researchers collaborated for 1 hour to evaluate 1,350 SVs each. We anticipate that SV-plaudit will become a standard step in variant calling pipelines and the crowd-sourced curation of other biological results. Code available at https://github.com/jbelyeu/SV-plaudit. Demonstration video available at https://www.youtube.com/watch?v=ono8kHMKxDs.
Affiliation(s)
- Jonathan R Belyeu
- Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Thomas J Nicholas
- Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Brent S Pedersen
- Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Thomas A Sasani
- Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- James M Havrilla
- Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Stephanie N Kravitz
- Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Megan E Conway
- Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA
- Brian K Lohman
- Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Ryan M Layer
- Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
22
Gross AM, Ajay SS, Rajan V, Brown C, Bluske K, Burns NJ, Chawla A, Coffey AJ, Malhotra A, Scocchia A, Thorpe E, Dzidic N, Hovanes K, Sahoo T, Dolzhenko E, Lajoie B, Khouzam A, Chowdhury S, Belmont J, Roller E, Ivakhno S, Tanner S, McEachern J, Hambuch T, Eberle M, Hagelstrom RT, Bentley DR, Perry DL, Taft RJ. Copy-number variants in clinical genome sequencing: deployment and interpretation for rare and undiagnosed disease. Genet Med 2018; 21:1121-1130. [PMID: 30293986 PMCID: PMC6752263 DOI: 10.1038/s41436-018-0295-y] [Citation(s) in RCA: 84] [Impact Index Per Article: 12.0] [Received: 02/14/2018] [Accepted: 08/28/2018] [Indexed: 11/17/2022] Open
Abstract
Purpose: Current diagnostic testing for genetic disorders involves serial use of specialized assays spanning multiple technologies. In principle, genome sequencing (GS) can detect all genomic pathogenic variant types on a single platform. Here we evaluate copy-number variant (CNV) calling as part of a clinically accredited GS test.
Methods: We performed analytical validation of CNV calling on 17 reference samples, compared the sensitivity of GS-based variants with those from a clinical microarray, and set a bound on precision using orthogonal technologies. We developed a protocol for family-based analysis of GS-based CNV calls, and deployed this across a clinical cohort of 79 rare and undiagnosed cases.
Results: We found that CNV calls from GS are at least as sensitive as those from microarrays, while only creating a modest increase in the number of variants interpreted (~10 CNVs per case). We identified clinically significant CNVs in 15% of the first 79 cases analyzed, all of which were confirmed by an orthogonal approach. The pipeline also enabled discovery of a uniparental disomy (UPD) and a 50% mosaic trisomy 14. Directed analysis of select CNVs enabled breakpoint level resolution of genomic rearrangements and phasing of de novo CNVs.
Conclusion: Robust identification of CNVs by GS is possible within a clinical testing environment.
Affiliation(s)
- Natasa Dzidic
- CombiMatrix Diagnostics (currently Invitae), Irvine, CA, USA
- Karine Hovanes
- CombiMatrix Diagnostics (currently Invitae), Irvine, CA, USA
- Trilochan Sahoo
- CombiMatrix Diagnostics (currently Invitae), Irvine, CA, USA
- Shimul Chowdhury
- Rady Children's Institute for Genomic Medicine and Rady Children's Hospital, Encinitas, CA, USA
23
Ahdesmäki MJ, Chapman BA, Cingolani P, Hofmann O, Sidoruk A, Lai Z, Zakharov G, Rodichenko M, Alperovich M, Jenkins D, Carr TH, Stetson D, Dougherty B, Barrett JC, Johnson JH. Prioritisation of structural variant calls in cancer genomes. PeerJ 2017; 5:e3166. [PMID: 28392986 PMCID: PMC5382922 DOI: 10.7717/peerj.3166] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 11/30/2016] [Accepted: 03/09/2017] [Indexed: 12/24/2022] Open
Abstract
Sensitivity of short read DNA-sequencing for gene fusion detection is improving, but is hampered by the significant amount of noise composed of uninteresting or false positive hits in the data. In this paper we describe a tiered prioritisation approach to extract high impact gene fusion events from existing structural variant calls. Using cell line and patient DNA sequence data we improve the annotation and interpretation of structural variant calls to best highlight likely cancer driving fusions. We also considerably improve on the automated visualisation of the high impact structural variants to highlight the effects of the variants on the resulting transcripts. The resulting framework greatly improves on readily detecting clinically actionable structural variants.
Affiliation(s)
- Miika J Ahdesmäki
- Innovative Medicines and Early Development, Oncology, AstraZeneca, Cambridge, United Kingdom
- Brad A Chapman
- Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, United States
- Oliver Hofmann
- Centre for Cancer Research, University of Melbourne, Melbourne, Australia
- Aleksandr Sidoruk
- EPAM Systems Inc., Newtown, PA, United States; Department of software engineering, St. Petersburg State University, St. Petersburg, Russia
- Zhongwu Lai
- Innovative Medicines and Early Development, Oncology, AstraZeneca, Waltham, MA, United States
- Gennadii Zakharov
- EPAM Systems Inc., Newtown, PA, United States; Pavlov Institute of Physiology, Russian Academy of Sciences, St. Petersburg, Russia
- T Hedley Carr
- Innovative Medicines and Early Development, Oncology, AstraZeneca, Cambridge, United Kingdom
- Daniel Stetson
- Innovative Medicines and Early Development, Oncology, AstraZeneca, Waltham, MA, United States
- Brian Dougherty
- Innovative Medicines and Early Development, Oncology, AstraZeneca, Waltham, MA, United States
- J Carl Barrett
- Innovative Medicines and Early Development, Oncology, AstraZeneca, Waltham, MA, United States
- Justin H Johnson
- Innovative Medicines and Early Development, Oncology, AstraZeneca, Waltham, MA, United States
24
Noll AC, Miller NA, Smith LD, Yoo B, Fiedler S, Cooley LD, Willig LK, Petrikin JE, Cakici J, Lesko J, Newton A, Detherage K, Thiffault I, Saunders CJ, Farrow EG, Kingsmore SF. Clinical detection of deletion structural variants in whole-genome sequences. NPJ Genom Med 2016; 1:16026. [PMID: 29263817 PMCID: PMC5685307 DOI: 10.1038/npjgenmed.2016.26] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 01/03/2016] [Revised: 06/22/2016] [Accepted: 06/22/2016] [Indexed: 12/13/2022] Open
Abstract
Optimal management of acutely ill infants with monogenetic diseases requires rapid identification of causative haplotypes. Whole-genome sequencing (WGS) has been shown to identify pathogenic nucleotide variants in such infants. Deletion structural variants (DSVs, >50 nt) are implicated in many genetic diseases, and tools have been designed to identify DSVs using short-read WGS. Optimisation and integration of these tools into a WGS pipeline could improve diagnostic sensitivity and specificity of WGS. In addition, it may improve turnaround time when compared with current CNV assays, enhancing utility in acute settings. Here we describe DSV detection methods for use in WGS for rapid diagnosis in acutely ill infants: SKALD (Screening Konsensus and Annotation of Large Deletions) combines calls from two tools (Breakdancer and GenomeStrip) with calibrated filters and clinical interpretation rules. In four WGS runs, the average analytic precision (positive predictive value) of SKALD was 78%, and recall (sensitivity) was 27%, when compared with validated reference DSV calls. When retrospectively applied to a cohort of 36 families with acutely ill infants SKALD identified causative DSVs in two. The first was heterozygous deletion of exons 1–3 of MMP21 in trans with a heterozygous frame-shift deletion in two siblings with transposition of the great arteries and heterotaxy. In a newborn female with dysmorphic features, ventricular septal defect and persistent pulmonary hypertension, SKALD identified the breakpoints of a heterozygous, de novo 1p36.32p36.13 deletion. In summary, consensus DSV calling, implemented in an 8-h computational pipeline with parameterised filtering, has the potential to increase the diagnostic yield of WGS in acutely ill neonates and discover novel disease genes.
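The idea of consensus deletion calling followed by a precision and recall check can be sketched in Python as below. This is an illustration only: the 50% reciprocal-overlap rule, the function names, and the toy coordinates are assumptions of the sketch and do not reproduce SKALD's calibrated filters or the published figures.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between intervals a and b.
    Intervals are (chrom, start, end) with half-open coordinates."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def consensus_calls(calls_a, calls_b, min_ro=0.5):
    """Keep calls from calls_a that are supported by at least one call in calls_b."""
    return [a for a in calls_a
            if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b)]


def precision_recall(calls, truth, min_ro=0.5):
    """Precision = matched calls / all calls; recall = matched truth / all truth."""
    tp_calls = sum(any(reciprocal_overlap(c, t) >= min_ro for t in truth) for c in calls)
    tp_truth = sum(any(reciprocal_overlap(t, c) >= min_ro for c in calls) for t in truth)
    precision = tp_calls / len(calls) if calls else 0.0
    recall = tp_truth / len(truth) if truth else 0.0
    return precision, recall


# Toy deletion calls (assumed) from two hypothetical callers and a reference set.
caller_one = [("chr1", 1000, 2000), ("chr1", 5000, 5600), ("chr2", 300, 900)]
caller_two = [("chr1", 1100, 2050), ("chr2", 320, 880), ("chr3", 10, 500)]
truth_set = [("chr1", 1000, 2000), ("chr2", 300, 900),
             ("chr4", 700, 1500), ("chr5", 1, 400)]

merged = consensus_calls(caller_one, caller_two)
precision, recall = precision_recall(merged, truth_set)
```

Requiring support from both callers tends to raise precision at the cost of recall, which is the same direction of trade-off the abstract reports.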
Affiliation(s)
- Aaron C Noll
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA; Heartland Institute for Clinical and Translational Research, University of Kansas Medical Center, Kansas City, KS, USA; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO, USA
- Neil A Miller
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA
- Laurie D Smith
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA; Heartland Institute for Clinical and Translational Research, University of Kansas Medical Center, Kansas City, KS, USA; Department of Pediatrics, University of Missouri-Kansas City, Kansas City, MO, USA
- Byunggil Yoo
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA
- Stephanie Fiedler
- Department of Pathology, Children's Mercy Kansas City, Kansas City, MO, USA
- Linda D Cooley
- Department of Pediatrics, University of Missouri-Kansas City, Kansas City, MO, USA; Department of Pathology, Children's Mercy Kansas City, Kansas City, MO, USA
- Laurel K Willig
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO, USA; Department of Pediatrics, University of Missouri-Kansas City, Kansas City, MO, USA
- Josh E Petrikin
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO, USA; Department of Pediatrics, University of Missouri-Kansas City, Kansas City, MO, USA
- Julie Cakici
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
- John Lesko
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA
- Angela Newton
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA
- Kali Detherage
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA
- Isabelle Thiffault
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA; Department of Pediatrics, University of Missouri-Kansas City, Kansas City, MO, USA; Department of Pathology, Children's Mercy Kansas City, Kansas City, MO, USA
- Carol J Saunders
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA; Department of Pediatrics, University of Missouri-Kansas City, Kansas City, MO, USA; Department of Pathology, Children's Mercy Kansas City, Kansas City, MO, USA
- Emily G Farrow
- Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA; Department of Pediatrics, Children's Mercy Kansas City, Kansas City, MO, USA; Department of Pediatrics, University of Missouri-Kansas City, Kansas City, MO, USA
- Stephen F Kingsmore
- Heartland Institute for Clinical and Translational Research, University of Kansas Medical Center, Kansas City, KS, USA; Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
25
Parikh H, Mohiyuddin M, Lam HYK, Iyer H, Chen D, Pratt M, Bartha G, Spies N, Losert W, Zook JM, Salit M. svclassify: a method to establish benchmark structural variant calls. BMC Genomics 2016; 17:64. [PMID: 26772178 PMCID: PMC4715349 DOI: 10.1186/s12864-016-2366-2] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 07/18/2015] [Accepted: 01/05/2016] [Indexed: 01/24/2023] Open
Abstract
Background: The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned bam files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives.
Results: We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz.
Conclusions: We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2366-2) contains supplementary material, which is available to authorized users.
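The one-class idea described above, training only on random non-SV regions and flagging candidates whose annotations look abnormal, can be sketched in Python as follows. This is a deliberately simple z-score stand-in, not svclassify's actual model; the two annotations (relative read depth, fraction of discordant read pairs), the threshold of 3, and all values are assumed for illustration.

```python
from statistics import mean, stdev


def fit_background(regions):
    """Estimate per-annotation mean and standard deviation from non-SV regions."""
    columns = list(zip(*regions))
    return [(mean(col), stdev(col)) for col in columns]


def abnormality_score(candidate, background):
    """Largest absolute z-score of the candidate across all annotations."""
    scores = []
    for value, (mu, sigma) in zip(candidate, background):
        # Guard against a zero spread in the training data (assumed floor).
        scores.append(abs(value - mu) / max(sigma, 1e-9))
    return max(scores)


def is_likely_sv(candidate, background, threshold=3.0):
    """Flag a candidate whose annotations differ strongly from the background."""
    return abnormality_score(candidate, background) > threshold


# Toy annotations (assumed): (relative read depth, fraction of discordant pairs).
non_sv_regions = [
    (1.00, 0.02), (0.95, 0.03), (1.05, 0.01), (1.02, 0.02), (0.98, 0.02),
]
background = fit_background(non_sv_regions)

deletion_like = (0.52, 0.30)   # half the depth, many discordant pairs
reference_like = (1.01, 0.02)  # indistinguishable from the background

flag_deletion = is_likely_sv(deletion_like, background)
flag_reference = is_likely_sv(reference_like, background)
```

Because only the background class is modeled, no labeled true SVs are needed for training, which is the property that makes a one-class approach attractive when benchmark SV calls are scarce.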
Affiliation(s)
- Hemang Parikh
- Genome-Scale Measurements Group, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8313, Gaithersburg, MD, 20899, USA; Dakota Consulting Inc., 1110 Bonifant Street, Suite 310, Silver Spring, MD, 20910, USA
- Hugo Y K Lam
- Bina Technologies, Roche Sequencing, Redwood City, CA, 94065, USA
- Hariharan Iyer
- Statistical Engineering Division, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
- Desu Chen
- Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, MD, 20742, USA
- Mark Pratt
- Personalis Inc., 1350 Willow Road, Suite 202, Menlo Park, CA, 94025, USA
- Gabor Bartha
- Personalis Inc., 1350 Willow Road, Suite 202, Menlo Park, CA, 94025, USA
- Noah Spies
- Genome-Scale Measurements Group, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8313, Gaithersburg, MD, 20899, USA; Department of Pathology, Stanford University, Stanford, CA, USA
- Wolfgang Losert
- Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, MD, 20742, USA
- Justin M Zook
- Genome-Scale Measurements Group, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8313, Gaithersburg, MD, 20899, USA
- Marc Salit
- Genome-Scale Measurements Group, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8313, Gaithersburg, MD, 20899, USA; Bioengineering Department, Stanford University, Stanford, CA, USA