1
Del Gobbo GF, Boycott KM. The additional diagnostic yield of long-read sequencing in undiagnosed rare diseases. Genome Res 2025; 35:559-571. [PMID: 39900460 PMCID: PMC12047273 DOI: 10.1101/gr.279970.124] [Indexed: 02/05/2025]
Abstract
Long-read sequencing (LRS) is a promising technology positioned to study the significant proportion of rare diseases (RDs) that remain undiagnosed as it addresses many of the limitations of short-read sequencing, detecting and clarifying additional disease-associated variants that may be missed by the current standard diagnostic workflow for RDs. Some key areas where additional diagnostic yields may be realized include: (1) detection and resolution of structural variants (SVs); (2) detection and characterization of tandem repeat expansions; (3) coverage of regions of high sequence similarity; (4) variant phasing; (5) the use of de novo genome assemblies for reference-based or graph genome variant detection; and (6) epigenetic and transcriptomic evaluations. Examples from over 50 studies support that the main areas of added diagnostic yield currently lie in SV detection and characterization, repeat expansion assessment, and phasing (with or without DNA methylation information). Several emerging studies applying LRS in cohorts of undiagnosed RDs also demonstrate that LRS can boost diagnostic yields following negative standard-of-care clinical testing and provide an added yield of 7%-17% following negative short-read genome sequencing. With this evidence of improved diagnostic yield, we discuss the incorporation of LRS into the diagnostic care pathway for undiagnosed RDs, including current challenges and considerations, with the ultimate goal of ending the diagnostic odyssey for countless individuals with RDs.
Affiliation(s)
- Giulia F Del Gobbo
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada K1H 5B2
- Kym M Boycott
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada K1H 5B2
- Department of Genetics, Children's Hospital of Eastern Ontario, Ottawa, Ontario, Canada K1H 8L1
2
Luo C, Peters BA, Zhou XM. Large indel detection in region-based phased diploid assemblies from linked-reads. BMC Genomics 2025; 26:263. [PMID: 40102722 PMCID: PMC11916464 DOI: 10.1186/s12864-025-11398-z] [Received: 07/13/2023] [Accepted: 02/21/2025] [Indexed: 03/20/2025]
Abstract
BACKGROUND: Linked-reads improve de novo assembly, haplotype phasing, structural variant (SV) detection, and other applications through highly multiplexed genome partitioning and barcoding. Whole-genome assembly and assembly-based variant detection from linked-reads are often computationally intensive and are not suitable for large population studies. Here we propose an efficient pipeline, RegionIndel, a region-based diploid assembly approach to characterize large indel SVs. The pipeline focuses only on target regions (50 kb by default), extracting the barcoded reads as input, and then integrates a haplotyping algorithm and local assembly to generate phased diploid contiguous sequences (contigs). Finally, it detects variants in the contigs through a pairwise contig-to-reference comparison.
RESULTS: We applied RegionIndel to two linked-read libraries of sample HG002, one using 10x and the other stLFR. HG002 is a well-studied sample, and the Genome in a Bottle (GIAB) community provides a gold-standard SV set for it. RegionIndel outperformed several assembly- and alignment-based SV callers in our benchmark experiments. After assembling all indel SVs, RegionIndel achieved an overall F1 score of 74.8% for deletions and 61.8% for insertions with 10x linked-reads, and 64.3% for deletions and 36.7% for insertions with stLFR linked-reads. Furthermore, it achieved an overall genotyping accuracy of 83.6% and 80.8% for 10x and stLFR linked-reads, respectively.
CONCLUSIONS: RegionIndel can achieve diploid assembly and detect indel SVs in each target region. The phased diploid contigs further allow us to investigate indel SVs together with nearby linked single nucleotide polymorphisms (SNPs) and small indels on the same haplotype.
Affiliation(s)
- Can Luo
- Department of Biomedical Engineering, Vanderbilt University, Nashville, 37235, TN, USA
- Brock A Peters
- Advanced Genomics Technology Lab, Complete Genomics Inc, 2904 Orchard Parkway, San Jose, 95134, CA, USA
- Xin Maizie Zhou
- Department of Biomedical Engineering, Vanderbilt University, Nashville, 37235, TN, USA
- Department of Computer Science, Vanderbilt University, Nashville, 37235, TN, USA
3
Naghinejad M, Parvizpour S, Khaniani MS, Mehri M, Derakhshan SM, Amirfiroozy A. The known structural variations in hearing loss and their diagnostic approaches: a comprehensive review. Mol Biol Rep 2025; 52:131. [PMID: 39821465 DOI: 10.1007/s11033-025-10231-w] [Received: 11/18/2024] [Accepted: 01/07/2025] [Indexed: 01/19/2025]
Abstract
Hearing loss (HL) is the most common sensory disorder, with a wide range of causes that include both environmental and genetic factors. While single-nucleotide variants (SNVs) and small insertions/deletions have been extensively studied, the role of structural variations (SVs) in hearing impairment has gained increasing recognition. This review provides a comprehensive overview of the importance of SVs in HL by exploring the SVs associated with HL and their underlying pathogenic mechanisms. Diagnostic methods for SVs are also briefly evaluated and compared. The three major mechanisms by which SVs can lead to HL are gene disruption, gene dosage imbalance, and position effect. To facilitate the detection of SVs in HL, the review presents a table of the key genes and genomic regions implicated in SVs, together with the diagnostic approaches used for HL patients. A second table compiles indications for the use of SV diagnostic techniques, to help experts choose the most appropriate technique. Overall, the review underscores the significant role of SVs in HL. Further research is required to fully elucidate the spectrum of SVs in HL and to optimize the clinical use of SV detection methods in routine diagnostic procedures.
Affiliation(s)
- Maryam Naghinejad
- Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
- Sepideh Parvizpour
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
- Mahmoud Shekari Khaniani
- Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
- Maghsood Mehri
- Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
- Sima Mansoori Derakhshan
- Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
- Akbar Amirfiroozy
- Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
4
Meleshko D, Yang R, Maharjan S, Danko DC, Korobeynikov A, Hajirasouliha I. Blackbird: structural variant detection using synthetic and low-coverage long-reads. bioRxiv 2024:2024.11.17.624011. [PMID: 39605582 PMCID: PMC11601376 DOI: 10.1101/2024.11.17.624011] [Indexed: 11/29/2024]
Abstract
MOTIVATION: Recent benchmarks of structural variant (SV) detection tools revealed that the majority of human genome SVs, especially medium-range (50-10,000 bp) SVs, cannot be resolved with short-read sequencing, whereas long-read SV callers achieve strong results on the same datasets. Despite improvements, high-coverage long-read sequencing is associated with higher costs and input DNA requirements. Lowering the sequence coverage reduces the cost, but current long-read SV callers perform poorly at coverage below 10×. Synthetic long-read (SLR) technologies hold great potential for SV detection, although utilizing their long-range information for events smaller than 50 kbp has been challenging.
RESULTS: In this work, we propose Blackbird, a novel hybrid algorithm based on integrated alignment and local assembly that uses SLR together with low-coverage long reads to improve SV detection and assembly. Without the need for a computationally expensive whole-genome assembly, Blackbird uses a sliding-window approach and barcode information encoded in SLR to accurately assemble small segments, and uses long reads for improved gap closing and contig assembly. We evaluated Blackbird on simulated and real human genome datasets. Using the HG002 GIAB benchmark set, we show that in hybrid mode Blackbird achieves results comparable to state-of-the-art long-read tools while using less long-read coverage. Blackbird requires only 5× coverage to achieve F1 scores (0.835 and 0.808 for deletions and insertions) similar to those of PBSV (0.856 and 0.812) and Sniffles2 (0.839 and 0.804) using 10× PacBio HiFi long-read coverage.
Affiliation(s)
- Dmitry Meleshko
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, 10021, New York, USA
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, 10021, New York, USA
- Rui Yang
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, 10021, New York, USA
- Salil Maharjan
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, 10021, New York, USA
- David C. Danko
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, 10021, New York, USA
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, 10021, New York, USA
- Iman Hajirasouliha
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, 10021, New York, USA
- Englander Institute for Precision Medicine, Weill Cornell Medicine of Cornell University, 10021, New York, USA
5
Jiang L, Quail MA, Fraser-Govil J, Wang H, Shi X, Oliver K, Mellado Gomez E, Yang F, Ning Z. The Bioinformatic Applications of Hi-C and Linked Reads. Genomics Proteomics Bioinformatics 2024; 22:qzae048. [PMID: 38905513 PMCID: PMC11580686 DOI: 10.1093/gpbjnl/qzae048] [Received: 12/05/2022] [Revised: 05/07/2024] [Accepted: 06/19/2024] [Indexed: 06/23/2024]
Abstract
Long-range sequencing grants insight into additional genetic information beyond what can be accessed by both short reads and modern long-read technology. Several new sequencing technologies, such as "Hi-C" and "Linked Reads", produce long-range datasets for high-throughput and high-resolution genome analyses, which are rapidly advancing the field of genome assembly, genome scaffolding, and more comprehensive variant identification. In this review, we focused on five major long-range sequencing technologies: high-throughput chromosome conformation capture (Hi-C), 10X Genomics Linked Reads, haplotagging, transposase enzyme linked long-read sequencing (TELL-seq), and single-tube long fragment read (stLFR). We detailed the mechanisms and data products of the five platforms and their important applications, evaluated the quality of sequencing data from different platforms, and discussed the currently available bioinformatics tools. This work will benefit the selection of appropriate long-range technology for specific biological studies.
Affiliation(s)
- Libo Jiang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo 255049, China
- Michael A Quail
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Jack Fraser-Govil
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Haipeng Wang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo 255049, China
- Xuequn Shi
- College of Food Science and Technology, Hainan University, Haikou 570228, China
- Karen Oliver
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Esther Mellado Gomez
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Fengtang Yang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo 255049, China
- Zemin Ning
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
6
Gunn JC, Christensen BM, Bueno EM, Cohen ZP, Kissonergis AS, Chen YH. Agricultural insect pests as models for studying stress-induced evolutionary processes. Insect Mol Biol 2024; 33:432-443. [PMID: 38655882 DOI: 10.1111/imb.12915] [Received: 12/15/2023] [Accepted: 04/14/2024] [Indexed: 04/26/2024]
Abstract
Agricultural insect pests (AIPs) are widely successful in adapting to natural and anthropogenic stressors, repeatedly overcoming population bottlenecks and acquiring resistance to intensive management practices. Although they have been largely overlooked in evolutionary studies, AIPs are ideal systems for understanding rapid adaptation under novel environmental conditions. Researchers have identified several genomic mechanisms that likely contribute to adaptive stress responses, including positive selection on de novo mutations, polygenic selection on standing allelic variation and phenotypic plasticity (e.g., hormesis). However, new theory suggests that stress itself may induce epigenetic modifications, which may confer heritable physiological changes (i.e., stress-resistant phenotypes). In this perspective, we discuss how environmental stress from agricultural management generates the epigenetic and genetic modifications that are associated with rapid adaptation in AIPs. We summarise existing evidence for stress-induced evolutionary processes in the context of insecticide resistance. Ultimately, we propose that studying AIPs offers new opportunities and resources for advancing our knowledge of stress-induced evolution.
Affiliation(s)
- Joe C Gunn
- Department of Plant and Soil Science, University of Vermont, Burlington, Vermont, USA
- Blair M Christensen
- Department of Plant and Soil Science, University of Vermont, Burlington, Vermont, USA
- Erika M Bueno
- Department of Plant and Soil Science, University of Vermont, Burlington, Vermont, USA
- Zachary P Cohen
- Insect Control and Cotton Disease Research, USDA ARS, College Station, Texas, USA
- Yolanda H Chen
- Department of Plant and Soil Science, University of Vermont, Burlington, Vermont, USA
7
Tsai CY, Hsu JSJ, Chen PL, Wu CC. Implementing next-generation sequencing for diagnosis and management of hereditary hearing impairment: a comprehensive review. Expert Rev Mol Diagn 2024; 24:753-765. [PMID: 39194060 DOI: 10.1080/14737159.2024.2396866] [Received: 06/14/2024] [Accepted: 08/22/2024] [Indexed: 08/29/2024]
Abstract
INTRODUCTION: Sensorineural hearing impairment (SNHI), a common childhood disorder with heterogeneous genetic causes, can lead to delayed language development and psychosocial problems. Next-generation sequencing (NGS) offers high-throughput screening and high-sensitivity detection of genetic etiologies of SNHI, enabling clinicians to make informed medical decisions, provide tailored treatments, and improve prognostic outcomes.
AREAS COVERED: This review covers the diverse etiologies of hereditary hearing impairment (HHI) and the utility of different NGS modalities (targeted sequencing and whole exome/genome sequencing), and includes HHI-related studies on newborn screening, genetic counseling, prognostic prediction, and personalized treatment. Challenges such as the trade-off between cost and diagnostic yield, detection of structural variants, and exploration of the non-coding genome are also highlighted.
EXPERT OPINION: In the current landscape of NGS-based diagnostics for HHI, there are both challenges (e.g. detection of structural variants and non-coding genome variants) and opportunities (e.g. the emergence of medical artificial intelligence tools). The authors advocate the use of technological advances such as long-read sequencing for structural variant detection, multi-omics analysis for non-coding variant exploration, and medical artificial intelligence for pathogenicity assessment and outcome prediction. By integrating these innovations into clinical practice, precision medicine in the diagnosis and management of HHI can be further improved.
Affiliation(s)
- Cheng-Yu Tsai
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
- Department of Otolaryngology, National Taiwan University Hospital, Taipei, Taiwan
- Jacob Shu-Jui Hsu
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
- Pei-Lung Chen
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Institute of Molecular Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Chen-Chi Wu
- Department of Otolaryngology, National Taiwan University Hospital, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Department of Medical Research, National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan
- Department of Otolaryngology, National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan
8
Foltz SM, Li Y, Yao L, Terekhanova NV, Weerasinghe A, Gao Q, Dong G, Schindler M, Cao S, Sun H, Jayasinghe RG, Fulton RS, Fronick CC, King J, Kohnen DR, Fiala MA, Chen K, DiPersio JF, Vij R, Ding L. Somatic mutation phasing and haplotype extension using linked-reads in multiple myeloma. bioRxiv 2024:2024.08.09.607342. [PMID: 39149342 PMCID: PMC11326269 DOI: 10.1101/2024.08.09.607342] [Indexed: 08/17/2024]
Abstract
Somatic mutation phasing informs our understanding of cancer-related events, like driver mutations. We generated linked-read whole genome sequencing data for 23 samples across disease stages from 14 multiple myeloma (MM) patients and systematically assigned somatic mutations to haplotypes using linked-reads. Here, we report the reconstructed cancer haplotypes and phase blocks from several MM samples and show how phase block length can be extended by integrating samples from the same individual. We also uncover phasing information in genes frequently mutated in MM, including DIS3, HIST1H1E, KRAS, NRAS, and TP53, phasing 79.4% of 20,705 high-confidence somatic mutations. In some cases, this enabled us to interpret clonal evolution models at higher resolution using pairs of phased somatic mutations. For example, our analysis of one patient suggested that two NRAS hotspot mutations occurred on the same haplotype but were independent events in different subclones. Given sufficient tumor purity and data quality, our framework illustrates how haplotype-aware analysis of somatic mutations in cancer can be beneficial for some cancer cases.
Affiliation(s)
- Steven M. Foltz
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Yize Li
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Lijun Yao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Nadezhda V. Terekhanova
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Amila Weerasinghe
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Qingsong Gao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Guanlan Dong
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Moses Schindler
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Song Cao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Hua Sun
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Reyka G. Jayasinghe
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Robert S. Fulton
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Catrina C. Fronick
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Justin King
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Daniel R. Kohnen
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Mark A. Fiala
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- John F. DiPersio
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Ravi Vij
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Li Ding
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Department of Genetics, Washington University in St. Louis, St. Louis, MO, 63110, USA
9
Wang D, Rastas P, Yi X, Löytynoja A, Kivikoski M, Feng X, Reid K, Merilä J. Improved assembly of the Pungitius pungitius reference genome. G3 (Bethesda) 2024; 14:jkae126. [PMID: 38861393 PMCID: PMC11304971 DOI: 10.1093/g3journal/jkae126] [Received: 04/24/2024] [Revised: 05/23/2024] [Accepted: 05/30/2024] [Indexed: 06/13/2024]
Abstract
The nine-spined stickleback (Pungitius pungitius) has been increasingly used as a model system in studies of local adaptation and sex chromosome evolution, but its current reference genome assembly is far from perfect, lacking distinct sex chromosomes. We generated an improved assembly of the nine-spined stickleback reference genome (98.3% BUSCO completeness) with the aid of linked-read mapping. While the new assembly (v8) was similar in size to the earlier version (v7), we were able to assign 4.4 times more contigs to the linkage groups and improve the contiguity of the genome. Moreover, the new assembly contains a ∼22.8 Mb Y-linked scaffold (LG22) consisting mainly of previously assigned X-contigs, putative Y-contigs, putative centromere contigs, and highly repetitive elements. The male individual showed an even mapping depth on LG12 (pseudo X chromosome) and LG22 (Y-linked scaffold) in the segregating sites, suggesting near-pure X and Y representation in the v8 assembly. A total of 26,803 genes were annotated, and about 33% of the assembly was found to consist of repetitive elements. The high proportion of repetitive elements in LG22 (53.10%) suggests it can be difficult to assemble the complete sequence of the species' Y chromosome. Nevertheless, the new assembly is a significant improvement over the previous version and should provide a valuable resource for genomic studies of stickleback fishes.
Affiliation(s)
- Dandan Wang
- Area of Ecology and Biodiversity, School of Biological Sciences, The University of Hong Kong, 999077, Hong Kong SAR
- Pasi Rastas
- Institute of Biotechnology, University of Helsinki, Helsinki FI-00014, Finland
- Xueling Yi
- Area of Ecology and Biodiversity, School of Biological Sciences, The University of Hong Kong, 999077, Hong Kong SAR
- Ari Löytynoja
- Institute of Biotechnology, University of Helsinki, Helsinki FI-00014, Finland
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki FI-00014, Finland
- Mikko Kivikoski
- Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki FI-00014, Finland
- Department of Computer Science, University of Helsinki, Helsinki FI-00014, Finland
- Xueyun Feng
- Institute of Biotechnology, University of Helsinki, Helsinki FI-00014, Finland
- Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki FI-00014, Finland
- Kerry Reid
- Area of Ecology and Biodiversity, School of Biological Sciences, The University of Hong Kong, 999077, Hong Kong SAR
- Juha Merilä
- Area of Ecology and Biodiversity, School of Biological Sciences, The University of Hong Kong, 999077, Hong Kong SAR
- Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki FI-00014, Finland
10
Orteu A, Kucka M, Gordon IJ, Ng’iru I, van der Heijden ESM, Talavera G, Warren IA, Collins S, ffrench-Constant RH, Martins DJ, Chan YF, Jiggins CD, Martin SH. Transposable Element Insertions Are Associated with Batesian Mimicry in the Pantropical Butterfly Hypolimnas misippus. Mol Biol Evol 2024; 41:msae041. [PMID: 38401262 PMCID: PMC10924252 DOI: 10.1093/molbev/msae041] [Received: 08/04/2023] [Revised: 02/14/2024] [Accepted: 02/16/2024] [Indexed: 02/26/2024]
Abstract
Hypolimnas misippus is a Batesian mimic of the toxic African Queen butterfly (Danaus chrysippus). Female H. misippus butterflies use two major wing patterning loci (M and A) to imitate three color morphs of D. chrysippus found in different regions of Africa. In this study, we examine the evolution of the M locus and identify it as an example of adaptive atavism. This phenomenon involves a morphological reversion to an ancestral character that results in an adaptive phenotype. We show that H. misippus has re-evolved an ancestral wing pattern present in other Hypolimnas species, repurposing it for Batesian mimicry of a D. chrysippus morph. Using haplotagging, a linked-read sequencing technology, and our new analytical tool, Wrath, we discover two large transposable element insertions located at the M locus and establish that these insertions are present in the dominant allele responsible for producing the mimetic phenotype. By conducting a comparative analysis involving additional Hypolimnas species, we demonstrate that the dominant allele is derived. This suggests that, in the derived allele, the transposable elements disrupt a cis-regulatory element, leading to the reversion to an ancestral phenotype that is then utilized for Batesian mimicry of a distinct model, a different morph of D. chrysippus. Our findings present a compelling instance of convergent evolution and adaptive atavism, in which the same pattern element has independently evolved multiple times in Hypolimnas butterflies, repeatedly playing a role in Batesian mimicry of diverse model species.
Affiliation(s)
- Anna Orteu
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Tree of Life Programme, Wellcome Sanger Institute, Hinxton, UK
- Marek Kucka
- Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Ian J Gordon
- Centre of Excellence in Biodiversity, University of Rwanda, Huye, Rwanda
- Ivy Ng’iru
- Mpala Research Centre, Nanyuki 10400, Laikipia, Kenya
- School of Biosciences, Cardiff University, Cardiff CF10 3AX, UK
- UK Centre for Ecology and Hydrology, Wallingford OX10 8BB, UK
- Eva S M van der Heijden
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Tree of Life Programme, Wellcome Sanger Institute, Hinxton, UK
- Gerard Talavera
- Institut Botànic de Barcelona (IBB), CSIC-CMCNB, Barcelona, Catalonia, Spain
- Ian A Warren
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Steve Collins
- African Butterfly Research Institute, Nairobi, Kenya
- Dino J Martins
- Turkana Basin Institute, Stony Brook University, Stony Brook, NY 11794, USA
- Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Simon H Martin
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
11
Höjer P, Frick T, Siga H, Pourbozorgi P, Aghelpasand H, Martin M, Ahmadian A. BLR: a flexible pipeline for haplotype analysis of multiple linked-read technologies. Nucleic Acids Res 2023; 51:e114. [PMID: 37941142 PMCID: PMC10711428 DOI: 10.1093/nar/gkad1010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 10/04/2023] [Accepted: 10/18/2023] [Indexed: 11/10/2023] Open
Abstract
Linked-read sequencing promises a one-method approach for genome-wide insights including single nucleotide variants (SNVs), structural variants, and haplotyping. We introduce Barcode Linked Reads (BLR), an open-source haplotyping pipeline capable of handling millions of barcodes and data from multiple linked-read technologies including DBS, 10× Genomics, TELL-seq and stLFR. Running BLR on DBS linked-reads yielded megabase-scale phasing with low (<0.2%) switch error rates. Of 13616 protein-coding genes phased in the GIAB benchmark set (v4.2.1), 98.6% matched the BLR phasing. In addition, large structural variants showed concordance with HPRC-HG002 reference assembly calls. Compared to diploid assembly with PacBio HiFi reads, BLR phasing was more continuous when considering switch errors. We further show that integrating long reads at low coverage (∼10×) can improve phasing contiguity and reduce switch errors in tandem repeats. When compared to Long Ranger on 10× Genomics data, BLR showed an increase in phase block N50 with low switch-error rates. For TELL-Seq and stLFR linked reads, BLR generated longer or similar phase block lengths and low switch error rates compared to results presented in the original publications. In conclusion, BLR provides a flexible workflow for comprehensive haplotype analysis of linked reads from multiple platforms.
Affiliation(s)
- Pontus Höjer
  - Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Tobias Frick
  - Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Humam Siga
  - Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Parham Pourbozorgi
  - Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Hooman Aghelpasand
  - Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Marcel Martin
  - Stockholm University, Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Afshin Ahmadian
  - Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
12
Choo ZN, Behr JM, Deshpande A, Hadi K, Yao X, Tian H, Takai K, Zakusilo G, Rosiene J, Da Cruz Paula A, Weigelt B, Setton J, Riaz N, Powell SN, Busam K, Shoushtari AN, Ariyan C, Reis-Filho J, de Lange T, Imieliński M. Most large structural variants in cancer genomes can be detected without long reads. Nat Genet 2023; 55:2139-2148. [PMID: 37945902 PMCID: PMC10703688 DOI: 10.1038/s41588-023-01540-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 06/01/2021] [Accepted: 09/19/2023] [Indexed: 11/12/2023]
Abstract
Short-read sequencing is the workhorse of cancer genomics yet is thought to miss many structural variants (SVs), particularly large chromosomal alterations. To characterize missing SVs in short-read whole genomes, we analyzed 'loose ends' (local violations of mass balance between adjacent DNA segments). In the landscape of loose ends across 1,330 high-purity cancer whole genomes, most large (>10-kb) clonal SVs were fully resolved by short reads in the 87% of the human genome where copy number could be reliably measured. Some loose ends represent neotelomeres, which we propose as a hallmark of the alternative lengthening of telomeres phenotype. These pan-cancer findings were confirmed by long-molecule profiles of 38 breast cancer and melanoma cases. Our results indicate that aberrant homologous recombination is unlikely to drive the majority of large cancer SVs. Furthermore, analysis of mass balance in short-read whole genome data provides a surprisingly complete picture of cancer chromosomal structure.
Affiliation(s)
- Zi-Ning Choo
  - New York Genome Center, New York, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Tri-institutional MD PhD Program, Weill Cornell Medicine, New York, NY, USA
  - Physiology and Biophysics PhD Program, Weill Cornell Medicine, New York, NY, USA
- Julie M Behr
  - New York Genome Center, New York, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
- Aditya Deshpande
  - New York Genome Center, New York, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
- Kevin Hadi
  - New York Genome Center, New York, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Physiology and Biophysics PhD Program, Weill Cornell Medicine, New York, NY, USA
- Xiaotong Yao
  - New York Genome Center, New York, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
- Huasong Tian
  - New York Genome Center, New York, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Grossman School of Medicine, New York, NY, USA
- Kaori Takai
  - Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
- George Zakusilo
  - Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
- Joel Rosiene
  - New York Genome Center, New York, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Britta Weigelt
  - Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Jeremy Setton
  - Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Nadeem Riaz
  - Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Simon N Powell
  - Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Klaus Busam
  - Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Titia de Lange
  - Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
- Marcin Imieliński
  - New York Genome Center, New York, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Grossman School of Medicine, New York, NY, USA
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
13
Zivanovic A, Miller J, Munro S, Knutson T, Li Y, Passow C, Simonaitis P, Lynch M, Oseth L, Zhao S, Feng F, Wikström P, Corey E, Morrissey C, Henzler C, Raphael B, Dehm S. Co-evolution of AR gene copy number and structural complexity in endocrine therapy resistant prostate cancer. NAR Cancer 2023; 5:zcad045. [PMID: 37636316 PMCID: PMC10448862 DOI: 10.1093/narcan/zcad045] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/02/2023] [Revised: 07/17/2023] [Accepted: 08/09/2023] [Indexed: 08/29/2023] Open
Abstract
Androgen receptor (AR) inhibition is standard of care for advanced prostate cancer (PC). However, efficacy is limited by progression to castration-resistant PC (CRPC), usually due to AR re-activation via mechanisms that include AR amplification and structural rearrangement. These two classes of AR alterations often co-occur in CRPC tumors, but it is unclear whether this reflects intercellular or intracellular heterogeneity of AR. Resolving this is important for developing new therapies and predictive biomarkers. Here, we analyzed 41 CRPC tumors and 6 patient-derived xenografts (PDXs) using linked-read DNA-sequencing, and identified 7 tumors that developed complex, multiply-rearranged AR gene structures in conjunction with very high AR copy number. Analysis of PDX models by optical genome mapping and fluorescence in situ hybridization showed that AR residing on extrachromosomal DNA (ecDNA) was an underlying mechanism, and was associated with elevated levels and diversity of AR expression. This study identifies co-evolution of AR gene copy number and structural complexity via ecDNA as a mechanism associated with endocrine therapy resistance.
Affiliation(s)
- Andrej Zivanovic
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
- Jeffrey T Miller
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
- Sarah A Munro
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
- Todd P Knutson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
- Yingming Li
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
- Courtney N Passow
  - University of Minnesota Genomics Center, University of Minnesota, Minneapolis, MN, USA
- Pijus Simonaitis
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
- Molly Lynch
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
- LeAnn Oseth
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
- Shuang G Zhao
  - Department of Human Oncology, University of Wisconsin-Madison, Madison, WI, USA
  - Carbone Cancer Center, University of Wisconsin-Madison, Madison, WI, USA
  - William S. Middleton Memorial Veterans Hospital, Madison, WI, USA
- Felix Y Feng
  - Departments of Radiation Oncology, Urology, and Medicine, University of California San Francisco, San Francisco, CA, USA
  - Helen Diller Family Comprehensive Cancer Center, University of California at San Francisco, San Francisco, CA, USA
- Pernilla Wikström
  - Department of Medical Biosciences, Pathology, Umeå University, Umeå, Sweden
- Eva Corey
  - Department of Urology, University of Washington, Seattle, WA, USA
- Colm Morrissey
  - Department of Urology, University of Washington, Seattle, WA, USA
- Christine Henzler
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
- Scott M Dehm
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
  - Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
  - Department of Urology, University of Minnesota, Minneapolis, MN, USA
14
Alkailani MI, Gibbings D. The Regulation and Immune Signature of Retrotransposons in Cancer. Cancers (Basel) 2023; 15:4340. [PMID: 37686616 PMCID: PMC10486412 DOI: 10.3390/cancers15174340] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/25/2023] [Revised: 08/14/2023] [Accepted: 08/18/2023] [Indexed: 09/10/2023] Open
Abstract
Advances in sequencing technologies and the bioinformatic analysis of big data facilitate the study of jumping genes' activity in the human genome in cancer from a broad perspective. Retrotransposons, which move from one genomic site to another by a copy-and-paste mechanism, are regulated by various molecular pathways that may be disrupted during tumorigenesis. Active retrotransposons can stimulate type I IFN responses. Although accumulated evidence suggests that retrotransposons can induce inflammation, the research investigating the exact mechanism of triggering these responses is ongoing. Understanding these mechanisms could improve the therapeutic management of cancer through the use of retrotransposon-induced inflammation as a tool to instigate immune responses to tumors.
Affiliation(s)
- Maisa I. Alkailani
  - College of Health and Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, Doha P.O. Box 34110, Qatar
- Derrick Gibbings
  - Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
15
Weisweiler M, Stich B. Benchmarking of structural variant detection in the tetraploid potato genome using linked-read sequencing. Genomics 2023; 115:110568. [PMID: 36702293 DOI: 10.1016/j.ygeno.2023.110568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/12/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023]
Abstract
It has recently been shown that structural variants (SV) can have a higher impact on gene expression variation compared to single nucleotide variants (SNV) in different plant species. Additionally, SV were associated with phenotypic variation in several crops. However, compared to the established SV detection based on short-read sequencing, fewer approaches have been described for linked-read based SV calling. We therefore evaluated the performance of six linked-read SV callers compared to an established short-read SV caller based on simulated linked-reads in tetraploid potato. The objectives of our study were to i) compare the performance of SV callers based on linked-read sequencing to short-read sequencing, ii) examine the influence of SV type, SV length, haplotype incidence (HI), as well as sequencing coverage on the SV calling performance in the tetraploid potato genome, and iii) evaluate the accuracy of detecting insertions by linked-read compared to short-read sequencing. We observed high break point resolutions (BPR) when detecting short SV and slightly lower BPR for large SV. Our observations highlighted the importance of short-read signals provided by Manta and LinkedSV to detect short SV. Manta and NAIBR performed well for detecting larger deletions, inversions, and duplications. Detected large SV were weakly influenced by the HI. Furthermore, we illustrated that large insertions can be assembled by Novel-X. Our results suggest the usage of the short-read and linked-read SV callers Manta, NAIBR, LinkedSV, and Novel-X based on at least 90x linked-read sequencing coverage to ensure the detection of a broad range of SV in the tetraploid potato genome.
Affiliation(s)
- Marius Weisweiler
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Benjamin Stich
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
  - Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225 Düsseldorf, Germany
  - Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, 50829 Köln, Germany
16
Zwaig M, Baguette A, Hu B, Johnston M, Lakkis H, Nakada EM, Faury D, Juretic N, Ellezam B, Weil AG, Karamchandani J, Majewski J, Blanchette M, Taylor MD, Gallo M, Kleinman CL, Jabado N, Ragoussis J. Detection and genomic analysis of BRAF fusions in Juvenile Pilocytic Astrocytoma through the combination and integration of multi-omic data. BMC Cancer 2022; 22:1297. [PMID: 36503484 PMCID: PMC9743522 DOI: 10.1186/s12885-022-10359-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/26/2022] [Accepted: 11/22/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Juvenile Pilocytic Astrocytomas (JPAs) are one of the most common pediatric brain tumors, and they are driven by aberrant activation of the mitogen-activated protein kinase (MAPK) signaling pathway. RAF-fusions are the most common genetic alterations identified in JPAs, with the prototypical KIAA1549-BRAF fusion leading to loss of BRAF's auto-inhibitory domain and subsequent constitutive kinase activation. JPAs are highly vascular and show pervasive immune infiltration, which can lead to low tumor cell purity in clinical samples. This can result in gene fusions that are difficult to detect with conventional omics approaches including RNA-Seq. METHODS: To this end, we applied RNA-Seq as well as linked-read whole-genome sequencing and in situ Hi-C as new approaches to detect and characterize low-frequency gene fusions at the genomic, transcriptomic and spatial level. RESULTS: Integration of these datasets allowed the identification and detailed characterization of two novel BRAF fusion partners, PTPRZ1 and TOP2B, in addition to the canonical fusion with partner KIAA1549. Additionally, our Hi-C datasets enabled investigations of 3D genome architecture in JPAs which showed a high level of correlation in 3D compartment annotations between JPAs compared to other pediatric tumors, and high similarity to normal adult astrocytes. We detected interactions between BRAF and its fusion partners exclusively in tumor samples containing BRAF fusions. CONCLUSIONS: We demonstrate the power of integrating multi-omic datasets to identify low-frequency fusions and characterize the JPA genome at high resolution. We suggest that linked-reads and Hi-C could be used in the clinic for the detection and characterization of JPAs.
Affiliation(s)
- Melissa Zwaig
  - McGill Genome Centre and Department of Human Genetics, McGill University, Montreal, Canada
- Audrey Baguette
  - Quantitative Life Sciences and Lady Davis Institute for Medical Research, Montreal, Quebec, Canada
- Bo Hu
  - McGill Genome Centre and Department of Human Genetics, McGill University, Montreal, Canada
- Michael Johnston
  - Alberta Children's Hospital Research Institute, Charbonneau Cancer Institute, and Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hussein Lakkis
  - Department of Human Genetics and Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Emily M. Nakada
  - The Research Institute of the McGill University Health Centre, Montreal, Canada
- Damien Faury
  - The Research Institute of the McGill University Health Centre, Montreal, Canada
- Nikoleta Juretic
  - The Research Institute of the McGill University Health Centre, Montreal, Canada
- Benjamin Ellezam
  - Department of Pathology, Centre Hospitalier Universitaire Sainte-Justine, Université de Montréal, Montréal, QC H3T 1C5, Canada
- Alexandre G. Weil
  - Department of Pediatric Neurosurgery, Centre Hospitalier Universitaire Sainte-Justine, Université de Montréal, Montréal, QC H3T 1C5, Canada
- Jason Karamchandani
  - Department of Pathology, Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
- Jacek Majewski
  - McGill Genome Centre and Department of Human Genetics, McGill University, Montreal, Canada
- Mathieu Blanchette
  - School of Computer Science and McGill Center for Bioinformatics, McGill University, Montréal, Québec, Canada
- Michael D. Taylor
  - Arthur and Sonia Labatt Brain Tumour Research Centre, Hospital for Sick Children Research Institute, Toronto, Canada
- Marco Gallo
  - Alberta Children's Hospital Research Institute, Charbonneau Cancer Institute, and Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Claudia L. Kleinman
  - Department of Human Genetics and Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Nada Jabado
  - Department of Human Genetics, Department of Pediatrics, and The Research Institute of the McGill University Health Centre, Montreal, Canada
- Jiannis Ragoussis
  - McGill Genome Centre and Department of Human Genetics, McGill University, Montreal, Canada
17
Weisweiler M, Arlt C, Wu PY, Van Inghelandt D, Hartwig T, Stich B. Structural variants in the barley gene pool: precision and sensitivity to detect them using short-read sequencing and their association with gene expression and phenotypic variation. Theor Appl Genet 2022; 135:3511-3529. [PMID: 36029318 PMCID: PMC9519679 DOI: 10.1007/s00122-022-04197-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/11/2022] [Accepted: 08/03/2022] [Indexed: 06/15/2023]
Abstract
Structural variants (SV) of 23 barley inbreds, detected by the best combination of SV callers based on short-read sequencing, were associated with genome-wide and gene-specific gene expression and, thus, were evaluated to predict agronomic traits. In human genetics, several studies have shown that phenotypic variation is more likely to be caused by structural variants (SV) than by single nucleotide variants. However, accurate yet cost-efficient discovery of SV in complex genomes remains challenging. The objectives of our study were to (i) facilitate SV discovery studies by benchmarking SV callers and their combinations with respect to their sensitivity and precision to detect SV in the barley genome, (ii) characterize the occurrence and distribution of SV clusters in the genomes of 23 barley inbreds that are the parents of a unique resource for mapping quantitative traits, the double round robin population, (iii) quantify the association of SV clusters with transcript abundance, and (iv) evaluate the use of SV clusters for the prediction of phenotypic traits. In our computer simulations based on a sequencing coverage of 25x, a sensitivity > 70% and precision > 95% were observed for all combinations of SV types and SV length categories if the best combination of SV callers was used. We observed a significant (P < 0.05) association of gene-associated SV clusters with global gene-specific gene expression. Furthermore, about 9% of all SV clusters that were within 5 kb of a gene were significantly (P < 0.05) associated with the gene expression of the corresponding gene. The prediction ability of SV clusters was higher compared to that of single-nucleotide polymorphisms from an array across the seven studied phenotypic traits. These findings suggest the usefulness of exploiting SV information when fine mapping and cloning the causal genes underlying quantitative traits as well as the high potential of using SV clusters for the prediction of phenotypes in diverse germplasm sets.
Affiliation(s)
- Marius Weisweiler
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Christopher Arlt
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Po-Ya Wu
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Delphine Van Inghelandt
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Thomas Hartwig
  - Institute for Molecular Physiology, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Benjamin Stich
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
  - Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225, Düsseldorf, Germany
18
Eisfeldt J, Schuy J, Stattin EL, Kvarnung M, Falk A, Feuk L, Lindstrand A. Multi-Omic Investigations of a 17-19 Translocation Links MINK1 Disruption to Autism, Epilepsy and Osteoporosis. Int J Mol Sci 2022; 23:9392. [PMID: 36012658 PMCID: PMC9408972 DOI: 10.3390/ijms23169392] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/13/2022] [Revised: 08/09/2022] [Accepted: 08/17/2022] [Indexed: 11/23/2022] Open
Abstract
Balanced structural variants, such as reciprocal translocations, are sometimes hard to detect with sequencing, especially when the breakpoints are located in repetitive or insufficiently mapped regions of the genome. In such cases, long-range information is required to resolve the rearrangement, identify disrupted genes and, in symptomatic carriers, pinpoint the disease-causing mechanisms. Here, we report an individual with autism, epilepsy and osteoporosis and a de novo balanced reciprocal translocation: t(17;19) (p13;p11). The genomic DNA was analyzed by short-, linked- and long-read genome sequencing, as well as optical mapping. Transcriptional consequences were assessed by transcriptome sequencing of patient-specific neuroepithelial stem cells derived from induced pluripotent stem cells (iPSC). The translocation breakpoints were only detected by long-read sequencing, the first on 17p13, located between exon 1 and exon 2 of MINK1 (Misshapen-like kinase 1), and the second in the chromosome 19 centromere. Functional validation in induced neural cells showed that MINK1 expression was reduced by >50% in the patient’s cells compared to healthy control cells. Furthermore, pathway analysis revealed an enrichment of changed neural pathways in the patient’s cells. Altogether, our multi-omics experiments highlight MINK1 as a candidate monogenic disease gene and show the advantages of long-read genome sequencing in capturing centromeric translocations.
Affiliation(s)
- Jesper Eisfeldt
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
  - Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
  - Science for Life Laboratory, Karolinska Institutet Science Park, 171 65 Solna, Sweden
- Jakob Schuy
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Eva-Lena Stattin
  - Department of Immunology, Genetics and Pathology, Uppsala University, 751 08 Uppsala, Sweden
- Malin Kvarnung
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
  - Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
- Anna Falk
  - Department of Neuroscience, Biomedicum, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Lund Stem Cell Center, Department of Experimental Medical Science, Lund University, 221 84 Lund, Sweden
- Lars Feuk
  - Department of Immunology, Genetics and Pathology, Uppsala University, 751 08 Uppsala, Sweden
  - Science for Life Laboratory, Uppsala University, 752 37 Uppsala, Sweden
- Anna Lindstrand
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
  - Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
  - Correspondence: ; Tel.: +46-70-543-6593
19
Meleshko D, Yang R, Marks P, Williams S, Hajirasouliha I. Efficient detection and assembly of non-reference DNA sequences with synthetic long reads. Nucleic Acids Res 2022; 50:e108. [PMID: 35924489 PMCID: PMC9561269 DOI: 10.1093/nar/gkac653] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/27/2022] [Revised: 06/10/2022] [Accepted: 08/01/2022] [Indexed: 11/14/2022] Open
Abstract
Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion's share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-read (aka linked-read) technologies have great potential for the characterization of NRSs. While synthetic long reads require less input DNA than long-read datasets, they are algorithmically more challenging to use. Except for computationally expensive whole-genome assembly methods, there is no synthetic long-read method for NRS detection. We propose a novel integrated alignment-based and local assembly-based algorithm, Novel-X, that uses the barcode information encoded in synthetic long reads to improve the detection of such events without a whole-genome de novo assembly. Our evaluations demonstrate that Novel-X finds many non-reference sequences that cannot be found by state-of-the-art short-read methods. We applied Novel-X to a diverse set of 68 samples from the Polaris HiSeq 4000 PGx cohort. Novel-X discovered 16 691 NRS insertions of size > 300 bp (total length 18.2 Mb). Many of them are population specific or may have a functional impact.
Affiliation(s)
- Dmitry Meleshko
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, NY 10021, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA
- Rui Yang
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, NY 10021, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA
- Patrick Marks
  - 10x Genomics Inc., Stoneridge Mall Road, Pleasanton, CA 94566, USA
- Stephen Williams
  - 10x Genomics Inc., Stoneridge Mall Road, Pleasanton, CA 94566, USA
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA
  - Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, NY 10021, USA
|
20
|
Lian Q, Chen Y, Chang F, Fu Y, Qi J. inGAP-family: Accurate Detection of Meiotic Recombination Loci and Causal Mutations by Filtering Out Artificial Variants due to Genome Complexities. Genomics Proteomics Bioinformatics 2022; 20:524-535. [PMID: 33711466 PMCID: PMC9801030 DOI: 10.1016/j.gpb.2019.11.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/12/2019] [Revised: 09/04/2019] [Accepted: 11/08/2019] [Indexed: 01/26/2023]
Abstract
Accurately identifying DNA polymorphisms can bridge the gap between phenotypes and genotypes and is essential for molecular marker assisted genetic studies. Genome complexities, including large-scale structural variations, bring great challenges to bioinformatic analysis for obtaining high-confidence genomic variants, as sequence differences between non-allelic loci of two or more genomes can be misinterpreted as polymorphisms. It is important to correctly filter out artificial variants to avoid false genotyping or estimation of allele frequencies. Here, we present an efficient and effective framework, inGAP-family, to discover, filter, and visualize DNA polymorphisms and structural variants (SVs) from alignment of short reads. Applying this method to polymorphism detection on real datasets shows that elimination of artificial variants greatly facilitates the precise identification of meiotic recombination points as well as causal mutations in mutant genomes or quantitative trait loci. In addition, inGAP-family provides a user-friendly graphical interface for detecting polymorphisms and SVs, further evaluating predicted variants and identifying mutations related to genotypes. It is accessible at https://sourceforge.net/projects/ingap-family/.
|
21
|
Gordeeva V, Sharova E, Arapidi G. Progress in Methods for Copy Number Variation Profiling. Int J Mol Sci 2022; 23:2143. [PMID: 35216262 PMCID: PMC8879278 DOI: 10.3390/ijms23042143] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/17/2022] [Revised: 02/09/2022] [Accepted: 02/11/2022] [Indexed: 02/04/2023]
Abstract
Copy number variations (CNVs) are the predominant class of structural genomic variations involved in the processes of evolutionary adaptation, genomic disorders, and disease progression. Compared with single-nucleotide variants, there have been challenges associated with the detection of CNVs owing to their diverse sizes. However, the field has seen significant progress in the past 20–30 years. This has been made possible due to the rapid development of molecular diagnostic methods which ensure a more detailed view of the genome structure, further complemented by recent advances in computational methods. Here, we review the major approaches that have been used to routinely detect CNVs, ranging from cytogenetics to the latest sequencing technologies, and then cover their specific features.
Affiliation(s)
- Veronika Gordeeva (correspondence)
  - Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
  - Moscow Institute of Physics and Technology, National Research University, Moscow Oblast, 141701 Moscow, Russia
- Elena Sharova
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
- Georgij Arapidi
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
  - Moscow Institute of Physics and Technology, National Research University, Moscow Oblast, 141701 Moscow, Russia
  - Shemyakin–Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia
|
22
|
Methods to Improve Molecular Diagnosis in Genomic Cold Cases in Pediatric Neurology. Genes (Basel) 2022; 13:333. [PMID: 35205378 PMCID: PMC8871714 DOI: 10.3390/genes13020333] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/03/2022] [Revised: 02/06/2022] [Accepted: 02/07/2022] [Indexed: 02/04/2023]
Abstract
During the last decade, genetic testing has emerged as an important etiological diagnostic tool for Mendelian diseases, including pediatric neurological conditions. A genetic diagnosis has a considerable impact on disease management and treatment; however, many cases remain undiagnosed after applying standard diagnostic sequencing techniques. This review discusses various methods to improve the molecular diagnostic rates in these genomic cold cases. We discuss extended analysis methods to consider, non-Mendelian inheritance models, mosaicism, dual/multiple diagnoses, periodic re-analysis, artificial intelligence tools, and deep phenotyping, in addition to integrating various omics methods to improve variant prioritization. Last, novel genomic technologies, including long-read sequencing, artificial long-read sequencing, and optical genome mapping are discussed. In conclusion, a more comprehensive molecular analysis and a timely re-analysis of unsolved cases are imperative to improve diagnostic rates. In addition, our current understanding of the human genome is still limited due to restrictions in technologies. Novel technologies are now available that improve upon some of these limitations and can capture all human genomic variation more accurately. Last, we recommend a more routine implementation of high molecular weight DNA extraction methods that is coherent with the ability to use and/or optimally benefit from these novel genomic methods.
|
23
|
Trost B, Loureiro LO, Scherer SW. Discovery of genomic variation across a generation. Hum Mol Genet 2021; 30:R174-R186. [PMID: 34296264 PMCID: PMC8490016 DOI: 10.1093/hmg/ddab209] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/28/2021] [Revised: 07/09/2021] [Accepted: 07/19/2021] [Indexed: 11/12/2022]
Abstract
Over the past 30 years (the timespan of a generation), advances in genomics technologies have revealed tremendous and unexpected variation in the human genome and have provided increasingly accurate answers to long-standing questions of how much genetic variation exists in human populations and to what degree the DNA complement changes between parents and offspring. Tracking the characteristics of these inherited and spontaneous (or de novo) variations has been the basis of the study of human genetic disease. From genome-wide microarray and next-generation sequencing scans, we now know that each human genome contains over 3 million single nucleotide variants when compared with the ~3 billion base pairs in the human reference genome, along with roughly an order of magnitude more DNA, approximately 30 megabase pairs (Mb), being ‘structurally variable’, mostly in the form of indels and copy number changes. Additional large-scale variations include balanced inversions (average of 18 Mb) and complex, difficult-to-resolve alterations. Collectively, ~1% of an individual’s genome will differ from the human reference sequence. When comparing across a generation, fewer than 100 new genetic variants are typically detected in the euchromatic portion of a child’s genome. Driven by increasingly higher-resolution and higher-throughput sequencing technologies, newer and more accurate databases of genetic variation (for instance, more comprehensive structural variation data and phasing of combinations of variants along chromosomes) of worldwide populations will emerge to underpin the next era of discovery in human molecular genetics.
Affiliation(s)
- Brett Trost
  - The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Livia O Loureiro
  - The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Stephen W Scherer
  - The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
  - McLaughlin Centre and Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
|
24
|
Extreme Y chromosome polymorphism corresponds to five male reproductive morphs of a freshwater fish. Nat Ecol Evol 2021; 5:939-948. [PMID: 33958755 DOI: 10.1038/s41559-021-01452-w] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 09/25/2020] [Accepted: 03/23/2021] [Indexed: 02/02/2023]
Abstract
Loss of recombination between sex chromosomes often depletes Y chromosomes of functional content and genetic variation, which might limit their potential to generate adaptive diversity. Males of the freshwater fish Poecilia parae occur as one of five discrete morphs, all of which shoal together in natural populations where morph frequency has been stable for over 50 years. Each morph uses a different complex reproductive strategy and morphs differ dramatically in colour, body size and mating behaviour. Morph phenotype is passed perfectly from father to son, indicating there are five Y haplotypes segregating in the species, which encode the complex male morph characteristics. Here, we examine Y diversity in natural populations of P. parae. Using linked-read sequencing on multiple P. parae females and males of all five morphs, we find that the genetic architecture of the male morphs evolved on the Y chromosome after recombination suppression had occurred with the X. Comparing Y chromosomes between each of the morphs, we show that, although the Ys of the three minor morphs that differ in colour are highly similar, there are substantial amounts of unique genetic material and divergence between the Ys of the three major morphs that differ in reproductive strategy, body size and mating behaviour. Altogether, our results suggest that the Y chromosome is able to overcome the constraints of recombination loss to generate extreme diversity, resulting in five discrete Y chromosomes that control complex reproductive strategies.
|
25
|
Seaby EG, Ennis S. Challenges in the diagnosis and discovery of rare genetic disorders using contemporary sequencing technologies. Brief Funct Genomics 2021; 19:243-258. [PMID: 32393978 DOI: 10.1093/bfgp/elaa009] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/13/2022]
Abstract
Next generation sequencing (NGS) has revolutionised rare disease diagnostics. Concomitant with advancing technologies has been a rise in the number of new gene disorders discovered and diagnoses made for patients and their families. However, despite the trend towards whole exome and whole genome sequencing, diagnostic rates remain suboptimal. On average, only ~30% of patients receive a molecular diagnosis. National sequencing projects launched in the last 5 years are integrating clinical diagnostic testing with research avenues to widen the spectrum of known genetic disorders. Consequently, efforts to diagnose genetic disorders in a clinical setting are now often shared with efforts to prioritise candidate variants for the detection of new disease genes. Herein we discuss some of the biggest obstacles precluding molecular diagnosis and discovery of new gene disorders. We consider bioinformatic and analytical challenges faced when interpreting next generation sequencing data and showcase some of the newest tools available to mitigate these issues. We consider how incomplete penetrance, non-coding variation and structural variants are likely to impact diagnostic rates, and we further discuss methods for uplifting novel gene discovery by adopting a gene-to-patient-based approach.
|
26
|
Liu YH, Grubbs GL, Zhang L, Fang X, Dill DL, Sidow A, Zhou X. Aquila_stLFR: diploid genome assembly based structural variant calling package for stLFR linked-reads. Bioinform Adv 2021; 1:vbab007. [PMID: 36700103 PMCID: PMC9710574 DOI: 10.1093/bioadv/vbab007] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/08/2021] [Revised: 06/07/2021] [Accepted: 06/14/2021] [Indexed: 01/28/2023]
Abstract
Motivation: Identifying structural variants (SVs) is critical in health and disease; however, detecting them remains a challenge. Several linked-read sequencing technologies, including 10X Genomics, TELL-Seq and single tube long fragment read (stLFR), have been recently developed as cost-effective approaches to reconstruct multi-megabase haplotypes (phase blocks) from sequence data of a single sample. These technologies provide an optimal sequencing platform to characterize SVs, though few computational algorithms can utilize them. Thus, we developed Aquila_stLFR, an approach that resolves SVs through haplotype-based assembly of stLFR linked-reads. Results: Aquila_stLFR first partitions long fragment reads into two haplotype-specific blocks with the assistance of the high-quality reference genome, taking advantage of the phasing ability of the linked reads themselves. Each haplotype is then assembled independently to achieve a complete diploid assembly, from which genome-wide SVs are reconstructed. We benchmarked Aquila_stLFR on a well-studied sample, NA24385, and showed that Aquila_stLFR can detect medium to large deletions (50 bp-10 kb) with high sensitivity and medium-size insertions (50 bp-1 kb) with high specificity. Availability and implementation: Source code and documentation are available at https://github.com/maiziex/Aquila_stLFR. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Yichen Henry Liu
  - Department of Computer Science, Vanderbilt University, Nashville, TN 37235, USA
- Griffin L Grubbs
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
- Lu Zhang
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- David L Dill
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Arend Sidow
  - Department of Pathology, Stanford University, Stanford, CA 94305, USA
- Xin Zhou
  - Department of Computer Science, Vanderbilt University, Nashville, TN 37235, USA
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
  - To whom correspondence should be addressed.
|
27
|
Schwarz JM, Lüpken R, Seelow D, Kehr B. Novel sequencing technologies and bioinformatic tools for deciphering the non-coding genome. Med Genet 2021; 33:133-145. [PMID: 38836034 PMCID: PMC11006320 DOI: 10.1515/medgen-2021-2072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2021] [Accepted: 06/24/2021] [Indexed: 06/06/2024]
Abstract
High-throughput sequencing techniques have significantly increased the molecular diagnosis rate for patients with monogenic disorders. This is primarily due to a substantially increased identification rate of disease mutations in the coding sequence, mainly SNVs and indels. Further progress is hampered by difficulties in the detection of structural variants and the interpretation of variants outside the coding sequence. In this review, we provide an overview of how novel sequencing techniques and state-of-the-art algorithms can be used to discover small and structural variants across the whole genome and introduce bioinformatic tools for predicting the effects that variants may have in the non-coding part of the genome.
Affiliation(s)
- Jana Marie Schwarz
  - Department of Neuropediatrics, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - NeuroCure Cluster of Excellence, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Richard Lüpken
  - BIH-Junior Research Group Genome Informatics, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Dominik Seelow
  - BIH-Bioinformatics and Translational Genetics, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Birte Kehr
  - BIH-Junior Research Group Genome Informatics, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Algorithmic Bioinformatics, Regensburg Center for Interventional Immunology (RCI), Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany
  - University Regensburg, Regensburg, Germany
|
28
|
Thomas C, Soschinski P, Zwaig M, Oikonomopoulos S, Okonechnikov K, Pajtler KW, Sill M, Schweizer L, Koch A, Neumann J, Schüller U, Sahm F, Rauschenbach L, Keyvani K, Proescholdt M, Riemenschneider MJ, Segewiß J, Ruckert C, Grauer O, Monoranu CM, Lamszus K, Patrizi A, Kordes U, Siebert R, Kool M, Ragoussis J, Foulkes WD, Paulus W, Rivera B, Hasselblatt M. The genetic landscape of choroid plexus tumors in children and adults. Neuro Oncol 2021; 23:650-660. [PMID: 33249490 DOI: 10.1093/neuonc/noaa267] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 12/24/2022]
Abstract
BACKGROUND: Choroid plexus tumors (CPTs) are intraventricular brain tumors predominantly arising in children but also affecting adults. In most cases, driver mutations have not been identified, although there are reports of frequent chromosome-wide copy-number alterations and TP53 mutations, especially in choroid plexus carcinomas (CPCs). METHODS: DNA methylation profiling and RNA sequencing were performed in a series of 47 CPTs. Samples comprised 35 choroid plexus papillomas (CPPs), 6 atypical choroid plexus papillomas (aCPPs) and 6 CPCs plus three recurrences thereof. Targeted TP53 and TERT promoter sequencing was performed in all samples. Whole exome sequencing (WES) and linked-read whole genome sequencing (WGS) were performed in 25 and 4 samples, respectively. RESULTS: Tumors comprised the molecular subgroups "pediatric A" (N=11), "pediatric B" (N=12) and "adult" (N=27). Copy-number alterations mainly represented whole-chromosomal alterations with subgroup-specific enrichments (gains of Chr1, 2 and 21q in "pediatric B" and gains of Chr5 and 9 and loss of Chr21q in "adult"). RNA sequencing yielded a novel CCDC47-PRKCA fusion transcript in one adult choroid plexus papilloma patient with aggressive clinical course; an underlying Chr17 inversion was demonstrated by linked-read WGS. WES and targeted sequencing showed TP53 mutations in 7/47 CPTs (15%), five of which occurred in children. In contrast, TERT promoter mutations were encountered in 7/28 adult patients (25%) and were associated with shorter progression-free survival (log-rank test, p=0.015). CONCLUSION: Pediatric CPTs lack recurrent driver alterations except for TP53, whereas CPTs in adults show TERT promoter mutations or a novel CCDC47-PRKCA gene fusion, which are associated with a more unfavorable clinical course.
Affiliation(s)
- Christian Thomas
  - Institute of Neuropathology, University Hospital Münster, Münster, Germany
- Patrick Soschinski
  - Institute of Neuropathology, University Hospital Münster, Münster, Germany
- Melissa Zwaig
  - McGill University Genome Centre, Department of Human Genetics, McGill University, Montreal, Canada
- Spyridon Oikonomopoulos
  - McGill University Genome Centre, Department of Human Genetics, McGill University, Montreal, Canada
- Konstantin Okonechnikov
  - Hopp Children's Cancer Center (KiTZ), Heidelberg, Germany
  - Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), and German Cancer Consortium (DKTK), Heidelberg, Germany
- Kristian W Pajtler
  - Hopp Children's Cancer Center (KiTZ), Heidelberg, Germany
  - Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), and German Cancer Consortium (DKTK), Heidelberg, Germany
  - Department of Pediatric Oncology, Hematology and Immunology, University Hospital, Heidelberg, Germany
- Martin Sill
  - Hopp Children's Cancer Center (KiTZ), Heidelberg, Germany
- Leonille Schweizer
  - Department of Neuropathology, Charité - Universitätsmedizin Berlin, Germany
  - German Cancer Consortium (DKTK), Heidelberg, Germany, Partner Site Charité Berlin, Berlin, Germany
  - Berlin Institute of Health (BIH), Berlin, Germany
- Arend Koch
  - Department of Neuropathology, Charité - Universitätsmedizin Berlin, Germany
  - German Cancer Consortium (DKTK), Heidelberg, Germany, Partner Site Charité Berlin, Berlin, Germany
  - Berlin Institute of Health (BIH), Berlin, Germany
- Julia Neumann
  - Department of Neuropathology, University Hospital Hamburg-Eppendorf, Hamburg, Germany
- Ulrich Schüller
  - Department of Neuropathology, University Hospital Hamburg-Eppendorf, Hamburg, Germany
  - Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Felix Sahm
  - Department of Neuropathology, Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
  - Clinical Cooperation Unit Neuropathology, German Consortium for Translational Cancer Research (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Laurèl Rauschenbach
  - Department of Neurosurgery and Spine Surgery, University Hospital Essen, University Duisburg-Essen, Essen, Germany
  - DKFZ Division Translational Neurooncology, DKTK partner site, University Hospital Essen, University Duisburg-Essen, Essen, Germany
- Kathy Keyvani
  - Institute of Neuropathology, University of Duisburg-Essen, Essen, Germany
- Martin Proescholdt
  - Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), and German Cancer Consortium (DKTK), Heidelberg, Germany
  - Department of Neurosurgery, Regensburg University Hospital, Regensburg, Germany
- Jochen Segewiß
  - Institute of Human Genetics, University Hospital Münster, Münster, Germany
- Christian Ruckert
  - Institute of Human Genetics, University Hospital Münster, Münster, Germany
- Oliver Grauer
  - Department of Neurology with Institute of Translational Neurology, University Hospital Münster, Münster, Germany
- Katrin Lamszus
  - Department of Neurosurgery, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Annarita Patrizi
  - Schaller Research Group Leader at the German Cancer Research Center (DKFZ), Heidelberg, Germany
- Uwe Kordes
  - Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Reiner Siebert
  - Institute of Human Genetics, Ulm University and Ulm University Medical Center, Ulm, Germany
- Marcel Kool
  - Hopp Children's Cancer Center (KiTZ), Heidelberg, Germany
  - Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), and German Cancer Consortium (DKTK), Heidelberg, Germany
  - Princess Máxima Center for Pediatric Oncology, Utrecht, the Netherlands
- Jiannis Ragoussis
  - McGill University Genome Centre, Department of Human Genetics, McGill University, Montreal, Canada
- William D Foulkes
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
- Werner Paulus
  - Institute of Neuropathology, University Hospital Münster, Münster, Germany
- Barbara Rivera
  - Program in Molecular Mechanisms and Experimental Therapy in Oncology (Oncobell), IDIBELL, Hospitalet de Llobregat, Barcelona, Spain
  - Gerald Bronfman Department of Oncology, McGill University, Montreal, QC, Canada
- Martin Hasselblatt
  - Institute of Neuropathology, University Hospital Münster, Münster, Germany
|
29
|
Generalovic TN, McCarthy SA, Warren IA, Wood JMD, Torrance J, Sims Y, Quail M, Howe K, Pipan M, Durbin R, Jiggins CD. A high-quality, chromosome-level genome assembly of the Black Soldier Fly (Hermetia illucens L.). G3 (Bethesda) 2021; 11:jkab085. [PMID: 33734373 PMCID: PMC8104945 DOI: 10.1093/g3journal/jkab085] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 12/10/2020] [Accepted: 03/09/2021] [Indexed: 01/15/2023]
Abstract
Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF), is an increasingly important species for bioconversion of organic material into animal feed. We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Biosciences, 10X Genomics linked-read and high-throughput chromosome conformation capture (Hi-C) sequencing technology. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudochromosomes representing seven chromosomes with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 16,478 protein-coding genes using the BRAKER2 pipeline. We analyzed an established lab population to investigate the genomic variation and architecture of the BSF, revealing six autosomes and an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of the lab population by assessing runs of homozygosity. This provided evidence for inbreeding events, including long runs of homozygosity on chromosome 5. The release of this novel chromosome-scale BSF genome assembly will provide an improved resource for further genomic studies, functional characterization of genes of interest and genetic modification of this economically important species.
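The contig and scaffold N50 values quoted in this abstract follow the usual definition: the length L such that sequences of length L or longer together cover at least half of the assembled bases. A minimal Python sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the BSF assembly or its pipeline.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one sequence of positive length")
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches the total")


# Hypothetical contig lengths (arbitrary units), not real assembly data.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # prints 8: 10 + 9 + 8 = 27, half of the total 54
```

The same function applies to contigs or scaffolds; only the input list of lengths differs.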
Affiliation(s)
- Shane A McCarthy
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Ian A Warren
  - Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Jonathan M D Wood
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- James Torrance
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Ying Sims
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Michael Quail
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Kerstin Howe
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Miha Pipan
  - Better Origin, Entomics Biosystems Limited, Cambridge CB3 0ES, UK
- Richard Durbin
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Chris D Jiggins
  - Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
|
30
|
Guo L, Xu M, Wang W, Gu S, Zhao X, Chen F, Wang O, Xu X, Seim I, Fan G, Deng L, Liu X. SLR-superscaffolder: a de novo scaffolding tool for synthetic long reads using a top-to-bottom scheme. BMC Bioinformatics 2021; 22:158. [PMID: 33765921 PMCID: PMC7993450 DOI: 10.1186/s12859-021-04081-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/28/2020] [Accepted: 03/16/2021] [Indexed: 12/30/2022]
Abstract
Background: Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Although several tools have been developed for each specific SLR technique, a robust standalone scaffolder with high efficiency is warranted for hybrid genome assembly. Results: In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, to link together contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard similarity to determine the order and orientation of contigs, and then locally improves the scaffolds with the aid of paired-end information. We also exploited a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single tube long fragment read sequencing dataset and increased the scaffold NG50 of its corresponding draft assembly 1349-fold. Moreover, benchmarking on different input contigs showed that this approach overall outperformed existing SLR scaffolders, providing longer contiguity and fewer misassemblies, especially for short contigs assembled from next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder. Conclusions: SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04081-z.
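The abstract states only that the global scaffold graph is built on Jaccard similarity of co-barcoding information. As a rough illustration of that idea, and not the tool's actual implementation, the Python sketch below scores contig pairs by the Jaccard similarity of the barcode sets observed on them and keeps pairs above a threshold as candidate graph edges. The names `jaccard` and `candidate_edges`, the threshold `min_similarity`, and the toy barcodes are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def candidate_edges(
    barcodes_by_contig: dict[str, set[str]],
    min_similarity: float = 0.1,  # hypothetical cutoff, chosen for illustration only
) -> list[tuple[str, str, float]]:
    """Return contig pairs whose barcode sets overlap, strongest first."""
    edges = []
    for x, y in combinations(sorted(barcodes_by_contig), 2):
        score = jaccard(barcodes_by_contig[x], barcodes_by_contig[y])
        if score >= min_similarity:
            edges.append((x, y, score))
    # Contigs that share many barcodes are likely to be close on the genome.
    edges.sort(key=lambda edge: edge[2], reverse=True)
    return edges


# Toy input: barcodes seen on reads mapped to each contig (made-up values).
toy = {
    "contig1": {"A", "B", "C"},
    "contig2": {"B", "C", "D"},
    "contig3": {"X"},
}
print(candidate_edges(toy))  # [('contig1', 'contig2', 0.5)]
```

Ordering and orienting contigs would additionally require comparing barcode sets at contig ends rather than across whole contigs, which this sketch does not attempt.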
Affiliation(s)
- Lidong Guo
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China.,BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China.,State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
| | - Mengyang Xu
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China.,State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China.,BGI-Shenzhen, Shenzhen, 518083, China.,China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Wenchao Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China
| | - Shengqiang Gu
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China
| | - Xia Zhao
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
| | - Fang Chen
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
| | - Ou Wang
- BGI-Shenzhen, Shenzhen, 518083, China.,China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Xun Xu
- BGI-Shenzhen, Shenzhen, 518083, China.,China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Inge Seim
- Integrative Biology Laboratory, College of Life Sciences, Nanjing Normal University, Nanjing, 210046, China.,School of Biology and Environmental Science, Queensland University of Technology, Brisbane, 4000, Australia
| | - Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China.,State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China.,BGI-Shenzhen, Shenzhen, 518083, China.,China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Li Deng
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China.,State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China.,BGI-Shenzhen, Shenzhen, 518083, China.,China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
| | - Xin Liu
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China.,State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China.,BGI-Shenzhen, Shenzhen, 518083, China.,China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
|
31
|
Guo J, Shi C, Chen X, Wang O, Liu P, Yang H, Xu X, Zhang W, Zhu H. stLFRsv: A Germline Structural Variant Analysis Pipeline Using Co-barcoded Reads. Front Genet 2021; 12:636239. [PMID: 33815469 PMCID: PMC8012683 DOI: 10.3389/fgene.2021.636239] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/01/2020] [Accepted: 02/04/2021] [Indexed: 11/13/2022] Open
Abstract
Co-barcoded reads originating from long DNA fragments (mean length >30 kbp) maintain both single base level accuracy and long-range genomic information. We propose a pipeline, stLFRsv, to detect structural variation using co-barcoded reads. stLFRsv identifies abnormal large gaps between co-barcoded reads to detect potential breakpoints and reconstruct complex structural variants (SVs). Haplotype phasing by co-barcoded reads increases the signal to noise ratio, and barcode sharing profiles are used to filter out false positives. We integrate the short read SV caller smoove for smaller variants with stLFRsv. The integrated pipeline was evaluated on the well-characterized genome HG002/NA24385, and 74.5% precision and a 22.4% recall rate were obtained for deletions. stLFRsv revealed some large variants not included in the benchmark set that were verified by long reads or assembly. For the HG001/NA12878 genome, stLFRsv also achieved the best performance for both resource usage and the detection of large variants. Our work indicates that co-barcoded read technology has the potential to improve genome completeness.
Affiliation(s)
- Junfu Guo
- BGI-Tianjin, BGI-Shenzhen, Tianjin, China
| | - Chang Shi
- BGI-Tianjin, BGI-Shenzhen, Tianjin, China
| | - Xi Chen
- BGI-Tianjin, BGI-Shenzhen, Tianjin, China
| | - Ou Wang
- BGI-Shenzhen, Shenzhen, China
| | - Ping Liu
- MGI, BGI-Shenzhen, Shenzhen, China
| | - Huanming Yang
- Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen, China
| | - Xun Xu
- Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China
|
32
|
Kumar A, Adhikari S, Kankainen M, Heckman CA. Comparison of Structural and Short Variants Detected by Linked-Read and Whole-Exome Sequencing in Multiple Myeloma. Cancers (Basel) 2021; 13:1212. [PMID: 33802025 PMCID: PMC7999337 DOI: 10.3390/cancers13061212] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/31/2020] [Revised: 03/07/2021] [Accepted: 03/08/2021] [Indexed: 02/07/2023] Open
Abstract
Linked-read sequencing was developed to aid the detection of large structural variants (SVs) from short-read sequencing efforts. We performed a systematic evaluation to determine if linked-read exome sequencing provides more comprehensive and clinically relevant information than whole-exome sequencing (WES) when applied to the same set of multiple myeloma patient samples. We report that linked-read sequencing detected a higher number of SVs (n = 18,455) than WES (n = 4065). However, linked-read predictions were dominated by inversions (92.4%), leading to poor detection of other types of SVs. In contrast, WES detected 56.3% deletions, 32.6% insertions, 6.7% translocations, 3.3% duplications and 1.2% inversions. Surprisingly, the quantitative performance assessment suggested a higher performance for WES (AUC = 0.791) compared to linked-read sequencing (AUC = 0.766) for detecting clinically validated cytogenetic alterations. We also found that linked-read sequencing detected more short variants (n = 704) compared to WES (n = 109). WES detected somatic mutations in all MM-related genes while linked-read sequencing failed to detect certain mutations. The comparison of somatic mutations detected using linked-read, WES and RNA-seq revealed that WES and RNA-seq detected more mutations than linked-read sequencing. These data indicate that WES outperforms and is more efficient than linked-read sequencing for detecting clinically relevant SVs and MM-specific short variants.
Affiliation(s)
- Ashwini Kumar
- Institute for Molecular Medicine Finland-FIMM, HiLIFE-Helsinki Institute of Life Science, iCAN Digital Cancer Medicine Flagship, University of Helsinki, Tukholmankatu 8, 00290 Helsinki, Finland
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
| | - Sadiksha Adhikari
- Institute for Molecular Medicine Finland-FIMM, HiLIFE-Helsinki Institute of Life Science, iCAN Digital Cancer Medicine Flagship, University of Helsinki, Tukholmankatu 8, 00290 Helsinki, Finland
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
| | - Matti Kankainen
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
- Medical and Clinical Genetics, University of Helsinki, Helsinki University Hospital, 00029 Helsinki, Finland
- Translational Immunology Research Program and Department of Clinical Chemistry, University of Helsinki, 00290 Helsinki, Finland
- Hematology Research Unit Helsinki, Department of Hematology, Helsinki University Hospital Comprehensive Cancer Center, 00290 Helsinki, Finland
| | - Caroline A. Heckman
- Institute for Molecular Medicine Finland-FIMM, HiLIFE-Helsinki Institute of Life Science, iCAN Digital Cancer Medicine Flagship, University of Helsinki, Tukholmankatu 8, 00290 Helsinki, Finland
- iCAN Digital Precision Cancer Medicine, University of Helsinki, 00014 Helsinki, Finland
|
33
|
Zhou X, Zhang L, Weng Z, Dill DL, Sidow A. Aquila enables reference-assisted diploid personal genome assembly and comprehensive variant detection based on linked reads. Nat Commun 2021; 12:1077. [PMID: 33597536 PMCID: PMC7889865 DOI: 10.1038/s41467-021-21395-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/09/2020] [Accepted: 01/20/2021] [Indexed: 01/19/2023] Open
Abstract
We introduce Aquila, a new approach to variant discovery in personal genomes, which is critical for uncovering the genetic contributions to health and disease. Aquila uses a reference sequence and linked-read data to generate a high quality diploid genome assembly, from which it then comprehensively detects and phases personal genetic variation. The contigs of the assemblies from our libraries cover >95% of the human reference genome, with over 98% of that in a diploid state. Thus, the assemblies support detection and accurate genotyping of the most prevalent types of human genetic variation, including single nucleotide polymorphisms (SNPs), small insertions and deletions (small indels), and structural variants (SVs), in all but the most difficult regions. All heterozygous variants are phased in blocks that can approach arm-level length. The final output of Aquila is a diploid and phased personal genome sequence, and a phased Variant Call Format (VCF) file that also contains homozygous and a few unphased heterozygous variants. Aquila represents a cost-effective approach that can be applied to cohorts for variation discovery or association studies, or to single individuals with rare phenotypes that could be caused by SVs or compound heterozygosity.
Affiliation(s)
- Xin Zhou
- Department of Computer Science, Stanford University, Stanford, CA, USA.
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA.
| | - Lu Zhang
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
| | - Ziming Weng
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - David L Dill
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Arend Sidow
- Department of Pathology, Stanford University, Stanford, CA, USA.
- Department of Genetics, Stanford University, Stanford, CA, USA.
|
34
|
Cao C, He J, Mak L, Perera D, Kwok D, Wang J, Li M, Mourier T, Gavriliuc S, Greenberg M, Morrissy AS, Sycuro LK, Yang G, Jeffares DC, Long Q. Reconstruction of Microbial Haplotypes by Integration of Statistical and Physical Linkage in Scaffolding. Mol Biol Evol 2021; 38:2660-2672. [PMID: 33547786 PMCID: PMC8136496 DOI: 10.1093/molbev/msab037] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/30/2022] Open
Abstract
DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or "haplotypes." However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.
Affiliation(s)
- Chen Cao
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Jingni He
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Cardiology, Xiangya Hospital, Central South University, Changsha, China
| | - Lauren Mak
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Present address: Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, NY, USA
| | - Deshan Perera
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Devin Kwok
- Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada
| | - Jia Wang
- Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA
| | - Minghao Li
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Tobias Mourier
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Stefan Gavriliuc
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Matthew Greenberg
- Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada
| | - A Sorana Morrissy
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Laura K Sycuro
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Microbiology, Immunology, and Infectious Diseases, Snyder Institute for Chronic Diseases, University of Calgary, Calgary, AB, Canada
| | - Guang Yang
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Medical Genetics, University of Calgary, Calgary, AB, Canada
| | - Daniel C Jeffares
- Department of Biology, York Biomedical Research Institute, University of York, York, United Kingdom
| | - Quan Long
- Department of Biochemistry & Molecular Biology, Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada,Department of Mathematics & Statistics, University of Calgary, Calgary, AB, Canada,Department of Medical Genetics, University of Calgary, Calgary, AB, Canada,Hotchkiss Brain Institute, O’Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
|
35
|
Linked-Read Whole Genome Sequencing Solves a Double DMD Gene Rearrangement. Genes (Basel) 2021; 12:133. [PMID: 33494189 PMCID: PMC7909759 DOI: 10.3390/genes12020133] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/31/2020] [Revised: 01/11/2021] [Accepted: 01/18/2021] [Indexed: 01/25/2023] Open
Abstract
Next generation sequencing (NGS) has changed our approach to diagnosis of genetic disorders. Nowadays, the most comprehensive application of NGS is whole genome sequencing (WGS) that is able to detect virtually all DNA variations. However, even after accurate WGS, many genetic conditions remain unsolved. This may be due to the current NGS protocols, based on DNA fragmentation and short reads. To overcome these limitations, we applied a linked-read sequencing technology that combines single-molecule barcoding with short-read WGS. We were able to assemble haplotypes and distinguish between alleles along the genome. As an exemplary case, we studied the case of a female carrier of X-linked muscular dystrophy with an unsolved genetic status. A deletion of exons 16–29 in DMD gene was responsible for the disease in her family, but she showed a normal dosage of these exons by Multiplex Ligation-dependent Probe Amplification (MLPA) and array CGH. This situation is usually considered compatible with a “non-carrier” status. Unexpectedly, the girl also showed an increased dosage of flanking exons 1–15 and 30–34. Using linked-read WGS, we were able to distinguish between the two X chromosomes. In the first allele, we found the 16–29 deletion, while the second allele showed a 1–34 duplication: in both cases, linked-read WGS correctly mapped the borders at single-nucleotide resolution. This duplication in trans apparently restored the normal dosage of exons 16–29 seen by quantitative assays. This had a dramatic impact in genetic counselling, by converting a non-carrier into a double carrier status prediction. We conclude that linked-read WGS should be considered as a valuable option to improve our understanding of unsolved genetic conditions.
|
36
|
Della Coletta R, Qiu Y, Ou S, Hufford MB, Hirsch CN. How the pan-genome is changing crop genomics and improvement. Genome Biol 2021; 22:3. [PMID: 33397434 PMCID: PMC7780660 DOI: 10.1186/s13059-020-02224-8] [Citation(s) in RCA: 123] [Impact Index Per Article: 30.8] [Received: 08/27/2020] [Accepted: 12/07/2020] [Indexed: 01/13/2023] Open
Abstract
Crop genomics has seen dramatic advances in recent years due to improvements in sequencing technology, assembly methods, and computational resources. These advances have led to the development of new tools to facilitate crop improvement. The study of structural variation within species and the characterization of the pan-genome has revealed extensive genome content variation among individuals within a species that is paradigm shifting to crop genomics and improvement. Here, we review advances in crop genomics and how utilization of these tools is shifting in light of pan-genomes that are becoming available for many crop species.
Affiliation(s)
- Rafael Della Coletta
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108 USA
| | - Yinjie Qiu
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108 USA
| | - Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011 USA
| | - Matthew B. Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011 USA
| | - Candice N. Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108 USA
|
38
|
Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions. PLoS Comput Biol 2020; 16:e1008397. [PMID: 33226985 PMCID: PMC7721175 DOI: 10.1371/journal.pcbi.1008397] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/10/2020] [Revised: 12/07/2020] [Accepted: 09/24/2020] [Indexed: 11/19/2022] Open
Abstract
Genetic diseases are driven by aberrations of the human genome. Identification of such aberrations including structural variations (SVs) is key to our understanding. Conventional short-reads whole genome sequencing (cWGS) can identify SVs to base-pair resolution, but utilizes only short-range information and suffers from high false discovery rate (FDR). Linked-reads sequencing (10XWGS) utilizes long-range information by linkage of short-reads originating from the same large DNA molecule. This can mitigate alignment-based artefacts especially in repetitive regions and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, we performed a comprehensive analysis of different types and sizes of SVs predicted by both the technologies and validated with an independent PCR based approach. The SVs commonly identified by both the technologies were highly specific, while validation rate dropped for uncommon events. A particularly high FDR was observed for SVs only found by 10XWGS. To improve FDR and sensitivity, statistical models for both the technologies were trained. Using our approach, we characterized SVs from the MCF7 cell line and a primary breast cancer tumor with high precision. This approach improves SV prediction and can therefore help in understanding the underlying genetics in various diseases.
Cancer and many other diseases are often driven by structural rearrangements in the patients. Their precise identification is necessary to understand evolution and cure for the disease. In this study, we have compared two sequencing technologies for the identification of structural variations i.e. Illumina’s short-reads and 10X Genomics linked-reads sequencing. Short-reads sequencing is already known to have high false discovery rate for structural variations, while, an unbiased performance evaluation of linked-reads sequencing is missing. Hence, we evaluate the performance of these two technologies using computational and PCR based methodologies. Moreover, we also present a statistical approach to increase their performance, supporting better detection of structural variations and thus further research into disease biology.
|
39
|
A customized scaffolds approach for the detection and phasing of complex variants by next-generation sequencing. Sci Rep 2020; 10:15060. [PMID: 32929119 PMCID: PMC7490669 DOI: 10.1038/s41598-020-71471-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/04/2020] [Accepted: 08/13/2020] [Indexed: 02/06/2023] Open
Abstract
Next-generation sequencing (NGS) is widely used in genetic testing for the highly sensitive detection of single nucleotide changes and small insertions or deletions. However, detection and phasing of structural variants, especially in repetitive or homologous regions, can be problematic due to uneven read coverage or genome reference bias, resulting in false calls. To circumvent this challenge, a computational approach utilizing customized scaffolds as supplementary reference sequences for read alignment was developed, and its effectiveness demonstrated with two CBS gene variants: NM_000071.2:c.833T>C and NM_000071.2:c.[833T>C; 844_845ins68]. Variant c.833T>C is a known causative mutation for homocystinuria, but is not pathogenic when in cis with the insertion, c.844_845ins68, because of alternative splicing. Using simulated reads, the custom scaffolds method resolved all possible combinations with 100% accuracy and, based on > 60,000 clinical specimens, exceeded the performance of current approaches that only align reads to GRCh37/hg19 for the detection of c.833T>C alone or in cis with c.844_845ins68. Furthermore, analysis of two 1000 Genomes Project trios revealed that the c.[833T>C; 844_845ins68] complex variant had previously been undetected in these datasets, likely due to the alignment method used. This approach can be configured for existing workflows to detect other challenging and potentially underrepresented variants, thereby augmenting accurate variant calling in clinical NGS testing.
|
40
|
Aganezov S, Goodwin S, Sherman RM, Sedlazeck FJ, Arun G, Bhatia S, Lee I, Kirsche M, Wappel R, Kramer M, Kostroff K, Spector DL, Timp W, McCombie WR, Schatz MC. Comprehensive analysis of structural variants in breast cancer genomes using single-molecule sequencing. Genome Res 2020; 30:1258-1273. [PMID: 32887686 PMCID: PMC7545150 DOI: 10.1101/gr.260497.119] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 12/22/2019] [Accepted: 08/07/2020] [Indexed: 12/14/2022]
Abstract
Improved identification of structural variants (SVs) in cancer can lead to more targeted and effective treatment options as well as advance our basic understanding of the disease and its progression. We performed whole-genome sequencing of the SKBR3 breast cancer cell line and patient-derived tumor and normal organoids from two breast cancer patients using Illumina/10x Genomics, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT) sequencing. We then inferred SVs and large-scale allele-specific copy number variants (CNVs) using an ensemble of methods. Our findings show that long-read sequencing allows for substantially more accurate and sensitive SV detection, with between 90% and 95% of variants supported by each long-read technology also supported by the other. We also report high accuracy for long reads even at relatively low coverage (25×–30×). Furthermore, we integrated SV and CNV data into a unifying karyotype-graph structure to present a more accurate representation of the mutated cancer genomes. We find hundreds of variants within known cancer-related genes detectable only through long-read sequencing. These findings highlight the need for long-read sequencing of cancer genomes for the precise analysis of their genetic instability.
Affiliation(s)
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Rachel M Sherman
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Gayatri Arun
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Sonam Bhatia
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Isac Lee
- Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Robert Wappel
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Melissa Kramer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - David L Spector
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Winston Timp
- Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA.,Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.,Department of Biology, Johns Hopkins University, Baltimore, Maryland 21211, USA
|
41
|
Aganezov S, Raphael BJ. Reconstruction of clone- and haplotype-specific cancer genome karyotypes from bulk tumor samples. Genome Res 2020; 30:1274-1290. [PMID: 32887685 PMCID: PMC7545144 DOI: 10.1101/gr.256701.119] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/03/2019] [Accepted: 08/07/2020] [Indexed: 12/25/2022]
Abstract
Many cancer genomes are extensively rearranged with aberrant chromosomal karyotypes. Deriving these karyotypes from high-throughput DNA sequencing of bulk tumor samples is complicated because most tumors are a heterogeneous mixture of normal cells and subpopulations of cancer cells, or clones, that harbor distinct somatic mutations. We introduce a new algorithm, Reconstructing Cancer Karyotypes (RCK), to reconstruct haplotype-specific karyotypes of one or more rearranged cancer genomes from DNA sequencing data from a bulk tumor sample. RCK leverages evolutionary constraints on the somatic mutational process in cancer to reduce ambiguity in the deconvolution of admixed sequencing data into multiple haplotype-specific cancer karyotypes. RCK models mixtures containing an arbitrary number of derived genomes and allows the incorporation of information both from short-read and long-read DNA sequencing technologies. We compare RCK to existing approaches on 17 primary and metastatic prostate cancer samples. We find that RCK infers cancer karyotypes that better explain the DNA sequencing data and conform to a reasonable evolutionary model. RCK's reconstructions of clone- and haplotype-specific karyotypes will aid further studies of the role of intra-tumor heterogeneity in cancer development and response to treatment. RCK is freely available as open source software.
Affiliation(s)
- Sergey Aganezov
- Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
|
42
|
Abstract
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Affiliation(s)
- Steve S Ho
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Ryan E Mills
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
|
43
|
Uguen K, Jubin C, Duffourd Y, Bardel C, Malan V, Dupont JM, El Khattabi L, Chatron N, Vitobello A, Rollat-Farnier PA, Baulard C, Lelorch M, Leduc A, Tisserant E, Tran Mau-Them F, Danjean V, Delepine M, Till M, Meyer V, Lyonnet S, Mosca-Boidron AL, Thevenon J, Faivre L, Thauvin-Robinet C, Schluth-Bolard C, Boland A, Olaso R, Callier P, Romana S, Deleuze JF, Sanlaville D. Genome sequencing in cytogenetics: Comparison of short-read and linked-read approaches for germline structural variant detection and characterization. Mol Genet Genomic Med 2020; 8:e1114. [PMID: 31985172 PMCID: PMC7057128 DOI: 10.1002/mgg3.1114] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/15/2019] [Accepted: 12/20/2019] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Structural variants (SVs) include copy number variants (CNVs) and apparently balanced chromosomal rearrangements (ABCRs). Genome sequencing (GS) enables SV detection at base-pair resolution, but the use of short-read sequencing is limited by repetitive sequences, and long-read approaches are not yet validated for diagnosis. Recently, 10X Genomics proposed Chromium, a technology that provides linked-reads to reconstruct long DNA fragments and could represent a good alternative. No study has yet compared short-read and linked-read technologies for detecting SVs in a constitutional diagnostic setting. The aim of this work was to determine whether the 10X Genomics technology enables better detection and comprehension of SVs than short-read WGS. METHODS We included 13 patients carrying various SVs. Whole genome analyses were performed using paired-end HiSeq X sequencing with (linked-read strategy) or without (short-read strategy) Chromium library preparation. Two different bioinformatic pipelines were used: variants were called using BreakDancer for the short-read strategy and LongRanger for the linked-read strategy. Variant interpretations were first blinded. RESULTS The short-read strategy allowed diagnosis of the known SV in 10/13 patients. After unblinding, the linked-read strategy identified 10/13 SVs, including one (patient 7) missed by the short-read strategy. CONCLUSION In this study, the 10X Genomics solution did not improve the detection and characterization of SVs.
Affiliation(s)
- Kévin Uguen
- Service de Génétique Médicale, CHRU de Brest, Brest, France.,HCL, Service de Génétique, BRON Cedex, France
| | - Claire Jubin
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Yannis Duffourd
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France
| | - Claire Bardel
- HCL, Cellule bioinformatique de la plateforme NGS du CHU Lyon, BRON Cedex, France.,Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Villeurbanne, France
| | - Valérie Malan
- Service de Cytogénétique, Hôpital Necker-Enfants Malades, APHP, Paris, France
| | - Jean-Michel Dupont
- Institut Cochin, INSERM U1016, Université Paris Descartes, Faculté de Médecine, APHP, HUPC, site Cochin, Laboratoire de Cytogénétique, Paris, France
| | - Laila El Khattabi
- Institut Cochin, INSERM U1016, Université Paris Descartes, Faculté de Médecine, APHP, HUPC, site Cochin, Laboratoire de Cytogénétique, Paris, France
| | | | - Antonio Vitobello
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Unité Fonctionnelle d'Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | | | - Céline Baulard
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Marc Lelorch
- Service de Cytogénétique, Hôpital Necker-Enfants Malades, APHP, Paris, France
| | - Aurélie Leduc
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Emilie Tisserant
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France
| | - Frédéric Tran Mau-Them
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Unité Fonctionnelle d'Innovation en Diagnostic Génomique des Maladies Rares, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | - Vincent Danjean
- Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
| | - Marc Delepine
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | | | - Vincent Meyer
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Stanislas Lyonnet
- Fédération de Génétique et Institut Imagine, UMR-1163, Université de Paris, Hôpital Necker-Enfants Malades, APHP Paris, France
| | - Anne-Laure Mosca-Boidron
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Laboratoire de génétique chromosomique et moléculaire, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | - Julien Thevenon
- Centre de génétique, Hôpital Couple-Enfant, CHU Grenoble Alpes, La Tronche, Grenoble, France
| | - Laurence Faivre
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Centre de génétique, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | - Christel Thauvin-Robinet
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Centre de génétique, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | | | - Anne Boland
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Robert Olaso
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
| | - Patrick Callier
- UMR1231 GAD, Inserm - Université Bourgogne-Franche Comté, Dijon, France.,Laboratoire de génétique chromosomique et moléculaire, FHU-TRANSLAD, CHU Dijon Bourgogne, Dijon, France
| | - Serge Romana
- Service de Cytogénétique, Hôpital Necker-Enfants Malades, APHP, Paris, France
| | - Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France.,Labex GenMed, Evry, France
|
44
|
Aganezov S, Zban I, Aksenov V, Alexeev N, Schatz MC. Recovering rearranged cancer chromosomes from karyotype graphs. BMC Bioinformatics 2019; 20:641. [PMID: 31842730 PMCID: PMC6915857 DOI: 10.1186/s12859-019-3208-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Many cancer genomes are extensively rearranged with highly aberrant chromosomal karyotypes. Structural and copy number variations in cancer genomes can be determined via abnormal mapping of sequenced reads to the reference genome. Recently it became possible to reconcile both of these types of large-scale variations into a karyotype graph representation of the rearranged cancer genomes. Such a representation, however, does not directly describe the linear and/or circular structure of the underlying rearranged cancer chromosomes, thus limiting analysis of the somatic evolutionary process of cancer genomes as well as of the functional genomic changes brought about by large-scale genome rearrangements. RESULTS Here we address this limitation by introducing a novel methodological framework for recovering rearranged cancer chromosomes from karyotype graphs. For a cancer karyotype graph we formulate an Eulerian Decomposition Problem (EDP) of finding a collection of linear and/or circular rearranged cancer chromosomes that are determined by the graph. We derive and prove computational complexities for several variations of the EDP. We then demonstrate that Eulerian decomposition of cancer karyotype graphs is not always unique, present the Consistent Contig Covering Problem (CCCP) of recovering unambiguous cancer contigs from the cancer karyotype graph, and describe a novel algorithm, CCR, capable of solving CCCP in polynomial time. We apply CCR to a prostate cancer dataset and demonstrate that it consistently recovers large cancer contigs even when the underlying cancer genomes are highly rearranged. CONCLUSIONS CCR can recover rearranged cancer contigs from karyotype graphs, thereby addressing an existing limitation in inferring chromosomal structures of rearranged cancer genomes and advancing our understanding of both patient/cancer-specific and overall genetic instability in cancer.
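To make the idea of reading chromosomes off a graph concrete, the following is a minimal sketch, not the authors' EDP formulation or the CCR algorithm. It assumes a heavily simplified setting: a connected undirected multigraph with no copy numbers and no distinction between segment and adjacency edges, from which a single Eulerian trail is extracted with Hierholzer's algorithm. A closed trail plays the role of a circular chromosome and an open trail the role of a linear one. The function name `eulerian_trail` and the toy edge lists are hypothetical.

```python
from collections import defaultdict


def eulerian_trail(edges):
    """Return one trail that uses every edge exactly once (Hierholzer).

    edges: list of (u, v) undirected edges; parallel edges are allowed.
    Simplifying assumption: the graph is connected and has 0 or 2
    odd-degree vertices. This is NOT the karyotype-graph EDP itself.
    """
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    odd = [node for node in adj if len(adj[node]) % 2 == 1]
    if len(odd) not in (0, 2):
        raise ValueError("no single Eulerian trail exists")

    start = odd[0] if odd else edges[0][0]
    used = [False] * len(edges)
    stack, trail = [start], []
    while stack:
        node = stack[-1]
        # Discard edges already traversed from the other endpoint.
        while adj[node] and used[adj[node][-1][1]]:
            adj[node].pop()
        if adj[node]:
            nxt, idx = adj[node].pop()
            used[idx] = True
            stack.append(nxt)
        else:
            trail.append(stack.pop())

    if len(trail) != len(edges) + 1:
        raise ValueError("graph is not connected")
    return trail[::-1]


# A triangle yields a closed ("circular") trail; a path yields an open one.
circular = eulerian_trail([("a", "b"), ("b", "c"), ("c", "a")])
linear = eulerian_trail([("a", "b"), ("b", "c")])
```

In a graph where several vertices have degree four or more, different traversal orders give different valid trails, which is one way to see why a decomposition of this kind need not be unique.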
Affiliation(s)
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, 3400 N. Charles st., Baltimore, 21210 MD USA
| | - Ilya Zban
- Computer Technologies Laboratory “Computer technology”, ITMO University, Kronverkskiy Prospekt, 49, Saint Petersburg, 197101 Russia
| | - Vitaly Aksenov
- Computer Technologies Laboratory “Computer technology”, ITMO University, Kronverkskiy Prospekt, 49, Saint Petersburg, 197101 Russia
- IST Austria, Am Campus 1, Klosterneuburg, 3400 Austria
| | - Nikita Alexeev
- Computer Technologies Laboratory “Computer technology”, ITMO University, Kronverkskiy Prospekt, 49, Saint Petersburg, 197101 Russia
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, 3400 N. Charles st., Baltimore, 21210 MD USA
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, 11724 NY USA
|
45
|
Zhang L, Zhou X, Weng Z, Sidow A. De novo diploid genome assembly for genome-wide structural variant detection. NAR Genom Bioinform 2019; 2:lqz018. [PMID: 33575568 PMCID: PMC7671403 DOI: 10.1093/nargab/lqz018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/08/2019] [Revised: 10/09/2019] [Accepted: 12/02/2019] [Indexed: 12/30/2022] Open
Abstract
Detection of structural variants (SVs) on the basis of read alignment to a reference genome remains a difficult problem. De novo assembly, traditionally used to generate reference genomes, offers an alternative for SV detection. However, it has not been applied broadly to human genomes because of fundamental limitations of short-fragment approaches and high cost of long-read technologies. We here show that 10× linked-read sequencing supports accurate SV detection. We examined variants in six de novo 10× assemblies with diverse experimental parameters from two commonly used human cell lines: NA12878 and NA24385. The assemblies are effective for detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies’ contigs to the reference (hg38). Our study also shows that the base-pair level SV breakpoint accuracy is high, with a majority of SVs having precisely correct sizes and breakpoints. Setting the ancestral state of SV loci by comparing to ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation. In about half of cases, the mechanism is the opposite of the reference-based call. We uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimp. Overall, we show that de novo assembly of 10× linked-read data can achieve cost-effective SV detection for personal genomes.
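The ancestral-state reasoning in this abstract can be illustrated with a short sketch. It assumes a simplified encoding that is not taken from the paper: a reference-based call is either "insertion" (the assembly carries sequence absent from the reference) or "deletion" (the assembly lacks sequence present in the reference), and a boolean records whether the ape ortholog carries that sequence. The function name `infer_mechanism` and its arguments are hypothetical.

```python
def infer_mechanism(reference_call, ancestor_has_sequence):
    """Polarize a reference-based SV call using the ancestral state.

    reference_call: "insertion" or "deletion", relative to the reference.
    ancestor_has_sequence: True if the ape ortholog carries the sequence.
    Returns (mechanism, flipped), where flipped is True when the inferred
    mechanism is the opposite of the reference-based call.
    """
    if reference_call not in ("insertion", "deletion"):
        raise ValueError("reference_call must be 'insertion' or 'deletion'")
    # If the sequence is ancestral, the allele lacking it arose by deletion;
    # if the sequence is derived, the allele carrying it arose by insertion.
    mechanism = "deletion" if ancestor_has_sequence else "insertion"
    return mechanism, mechanism != reference_call


# Called as an insertion against hg38, but the sequence is ancestral:
# the reference allele is the one that lost it.
result = infer_mechanism("insertion", ancestor_has_sequence=True)
```

Under this encoding the mechanism depends only on the ancestral state, so the reference-based label is reversed whenever the reference itself carries the derived allele.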
Affiliation(s)
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong.,Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA.,Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Xin Zhou
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Ziming Weng
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA
| | - Arend Sidow
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA.,Department of Genetics, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA
|
46
|
Fang L, Kao C, Gonzalez MV, Mafra FA, Pellegrino da Silva R, Li M, Wenzel SS, Wimmer K, Hakonarson H, Wang K. LinkedSV for detection of mosaic structural variants from linked-read exome and genome sequencing data. Nat Commun 2019; 10:5585. [PMID: 31811119 PMCID: PMC6898185 DOI: 10.1038/s41467-019-13397-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/12/2019] [Accepted: 11/07/2019] [Indexed: 02/01/2023] Open
Abstract
Linked-read sequencing provides long-range information on short-read sequencing data by barcoding reads originating from the same DNA molecule, and can improve detection and breakpoint identification for structural variants (SVs). Here we present LinkedSV for SV detection on linked-read sequencing data. LinkedSV considers barcode overlapping and enriched fragment endpoints as signals to detect large SVs, while it leverages read depth, paired-end signals and local assembly to detect small SVs. Benchmarking studies demonstrate that LinkedSV outperforms existing tools, especially on exome data and on somatic SVs with low variant allele frequencies. We demonstrate clinical cases where LinkedSV identifies disease-causal SVs from linked-read exome sequencing data missed by conventional exome sequencing, and show examples where LinkedSV identifies SVs missed by high-coverage long-read sequencing. In summary, LinkedSV can detect SVs missed by conventional short-read and long-read sequencing approaches, and may resolve negative cases from clinical genome/exome sequencing studies.
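The barcode-overlap signal mentioned above can be sketched as follows. This is an illustration of the general idea only, not LinkedSV's actual statistics, which the abstract does not describe. It assumes each genomic window is summarized by the set of barcodes on the reads mapped to it; two distant windows sharing many barcodes suggests they lie on the same DNA molecules in the sample. The function name `barcode_overlap` and the toy barcodes are hypothetical.

```python
def barcode_overlap(barcodes_a, barcodes_b):
    """Count barcodes shared by two genomic windows.

    Returns (number of shared barcodes, Jaccard index of the two sets).
    Illustrative only: a real caller would compare the overlap against
    the level expected by chance for windows at that distance.
    """
    a, b = set(barcodes_a), set(barcodes_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


# Toy barcodes for two windows that are far apart on the reference.
n_shared, jaccard = barcode_overlap(["AAC", "GGT", "TTA"], ["GGT", "TTA", "CCG"])
```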
Affiliation(s)
- Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Charlly Kao
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Michael V Gonzalez
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Fernanda A Mafra
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | | | - Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Sören-Sebastian Wenzel
- Institute of Human Genetics, Department for Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
| | - Katharina Wimmer
- Institute of Human Genetics, Department for Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
| | - Hakon Hakonarson
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA. .,Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
|
47
|
Mahmoud M, Gobet N, Cruz-Dávalos DI, Mounier N, Dessimoz C, Sedlazeck FJ. Structural variant calling: the long and the short of it. Genome Biol 2019; 20:246. [PMID: 31747936 PMCID: PMC6868818 DOI: 10.1186/s13059-019-1828-7] [Citation(s) in RCA: 378] [Impact Index Per Article: 63.0] [Received: 04/11/2019] [Accepted: 09/19/2019] [Indexed: 02/08/2023] Open
Abstract
Recent research into structural variants (SVs) has established their importance to medicine and molecular biology, elucidating their role in various diseases, regulation of gene expression, ethnic diversity, and large-scale chromosome evolution, giving rise to the differences within populations and among species. Nevertheless, characterizing SVs and determining the optimal approach for a given experimental design remains a computational and scientific challenge. Multiple approaches have emerged to target various SV classes, zygosities, and size ranges. Here, we review these approaches with respect to their ability to infer SVs across the full spectrum of large, complex variations and present computational methods for each approach.
Affiliation(s)
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, USA
| | - Nastassia Gobet
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Diana Ivette Cruz-Dávalos
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Ninon Mounier
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Center for Primary Care and Public Health, Lausanne, Switzerland
| | - Christophe Dessimoz
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution & Environment, University College London, London, UK.
- Department of Computer Science, University College London, London, UK.
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, USA.
|
48
|
Zhang L, Zhou X, Weng Z, Sidow A. Assessment of human diploid genome assembly with 10x Linked-Reads data. Gigascience 2019; 8:giz141. [PMID: 31769805 PMCID: PMC6879002 DOI: 10.1093/gigascience/giz141] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/08/2019] [Revised: 08/07/2019] [Accepted: 11/07/2019] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND Producing cost-effective haplotype-resolved personal genomes remains challenging. 10x Linked-Read sequencing, with its high base quality and long-range information, has been demonstrated to facilitate de novo assembly of human genomes and variant detection. In this study, we investigate in depth how the parameter space of 10x library preparation and sequencing affects assembly quality, on the basis of both simulated and real libraries. RESULTS We prepared and sequenced eight 10x libraries with a diverse set of parameters from standard cell lines NA12878 and NA24385 and performed whole-genome assembly on the data. We also developed the simulator LRTK-SIM to follow the workflow of 10x data generation and produce realistic simulated Linked-Read data sets. We found that assembly quality could be improved by increasing the total sequencing coverage (C) and keeping physical coverage of DNA fragments (CF) or read coverage per fragment (CR) within broad ranges. The optimal physical coverage was between 332× and 823× and assembly quality worsened if it increased to >1,000× for a given C. Long DNA fragments could significantly extend phase blocks but decreased contig contiguity. The optimal length-weighted fragment length ($W\mu_{FL}$) was ∼50-150 kb. When broadly optimal parameters were used for library preparation and sequencing, ∼80% of the genome was assembled in a diploid state. CONCLUSIONS The Linked-Read libraries we generated and the parameter space we identified provide theoretical considerations and practical guidelines for personal genome assemblies based on 10x Linked-Read sequencing.
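The trade-off between the three coverage parameters can be made concrete with a little arithmetic. The sketch below assumes the relation C = CF × CR, which the abstract implies but does not state explicitly, and uses the 332× to 823× physical-coverage range reported above. The function names and the example values (C = 60×, CF = 600×) are hypothetical and chosen only for illustration.

```python
# Optimal physical coverage range reported in the abstract.
OPTIMAL_CF_MIN = 332.0
OPTIMAL_CF_MAX = 823.0


def read_coverage_per_fragment(total_coverage, physical_coverage):
    """Return CR, assuming total coverage C = CF * CR (an assumption here)."""
    if physical_coverage <= 0:
        raise ValueError("physical coverage must be positive")
    return total_coverage / physical_coverage


def in_optimal_physical_range(physical_coverage):
    """Check CF against the 332x to 823x range quoted in the abstract."""
    return OPTIMAL_CF_MIN <= physical_coverage <= OPTIMAL_CF_MAX


# Illustrative values only: at a fixed C of 60x, a CF of 600x leaves
# each fragment covered by reads at roughly 0.1x.
cr = read_coverage_per_fragment(60.0, 600.0)
ok = in_optimal_physical_range(600.0)
```

For a fixed C, raising CF necessarily lowers CR, which is consistent with the reported drop in assembly quality once physical coverage exceeds 1,000×.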
Affiliation(s)
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305 USA
- Department of Computer Science, Stanford University, Stanford, CA 94305 USA
| | - Xin Zhou
- Department of Computer Science, Stanford University, Stanford, CA 94305 USA
| | - Ziming Weng
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305 USA
| | - Arend Sidow
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305 USA
- Department of Genetics, 300 Pasteur Dr, Stanford University, Stanford, CA 94305 USA
|
49
|
De Coster W, Van Broeckhoven C. Newest Methods for Detecting Structural Variations. Trends Biotechnol 2019; 37:973-982. [DOI: 10.1016/j.tibtech.2019.02.003] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 12/02/2018] [Revised: 02/08/2019] [Accepted: 02/11/2019] [Indexed: 01/28/2023]
|
50
|
Darby CA, Fitch JR, Brennan PJ, Kelly BJ, Bir N, Magrini V, Leonard J, Cottrell CE, Gastier-Foster JM, Wilson RK, Mardis ER, White P, Langmead B, Schatz MC. Samovar: Single-Sample Mosaic Single-Nucleotide Variant Calling with Linked Reads. iScience 2019; 18:1-10. [PMID: 31271967 PMCID: PMC6609817 DOI: 10.1016/j.isci.2019.05.037] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/09/2019] [Revised: 05/06/2019] [Accepted: 05/24/2019] [Indexed: 12/25/2022] Open
Abstract
Linked-read sequencing greatly improves haplotype assembly over standard paired-end analysis. The detection of mosaic single-nucleotide variants benefits from haplotype assembly when the model is informed by the mapping between constituent reads and linked reads. Samovar evaluates haplotype-discordant reads identified through linked-read sequencing, thus enabling phasing and mosaic variant detection across the entire genome. Samovar trains a random forest model to score candidate sites using a dataset that considers read quality, phasing, and linked-read characteristics. Samovar calls mosaic single-nucleotide variants (SNVs) within a single sample with accuracy comparable with what previously required trios or matched tumor/normal pairs and outperforms single-sample mosaic variant callers at minor allele frequency 5%-50% with at least 30X coverage. Samovar finds somatic variants in both tumor and normal whole-genome sequencing from 13 pediatric cancer cases that can be corroborated with high recall with whole exome sequencing. Samovar is available open-source at https://github.com/cdarby/samovar under the MIT license.
Affiliation(s)
- Charlotte A Darby
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - James R Fitch
- The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
| | - Patrick J Brennan
- The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
| | - Benjamin J Kelly
- The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
| | - Natalie Bir
- The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
| | - Vincent Magrini
- The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Jeffrey Leonard
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA; Department of Neurosurgery, Nationwide Children's Hospital, Columbus, OH, USA
| | - Catherine E Cottrell
- The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Julie M Gastier-Foster
- The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Richard K Wilson
- The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Elaine R Mardis
- The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Peter White
- The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Biology, Johns Hopkins University, Baltimore, MD, USA; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
|