1
|
Figueroa KP, Gross C, Buena-Atienza E, Paul S, Gandelman M, Kakar N, Sturm M, Casadei N, Admard J, Park J, Zühlke C, Hellenbroich Y, Pozojevic J, Balachandran S, Händler K, Zittel S, Timmann D, Erdlenbruch F, Herrmann L, Feindt T, Zenker M, Klopstock T, Dufke C, Scoles DR, Koeppen A, Spielmann M, Riess O, Ossowski S, Haack TB, Pulst SM. A GGC-repeat expansion in ZFHX3 encoding polyglycine causes spinocerebellar ataxia type 4 and impairs autophagy. Nat Genet 2024:10.1038/s41588-024-01719-5. [PMID: 38684900 DOI: 10.1038/s41588-024-01719-5] [Received: 11/03/2023] [Accepted: 03/18/2024] [Indexed: 05/02/2024]
Abstract
Despite linkage to chromosome 16q in 1996, the mutation causing spinocerebellar ataxia type 4 (SCA4), a late-onset sensory and cerebellar ataxia, remained unknown. Here, using long-read single-strand whole-genome sequencing (LR-GS), we identified a heterozygous GGC-repeat expansion in a large Utah pedigree encoding polyglycine (polyG) in zinc finger homeobox protein 3 (ZFHX3), also known as AT-binding transcription factor 1 (ATBF1). We queried 6,495 genome sequencing datasets and identified the repeat expansion in seven additional pedigrees. Ultrarare DNA variants near the repeat expansion indicate a common distant founder event in Sweden. Intranuclear ZFHX3-p62-ubiquitin aggregates were abundant in SCA4 basis pontis neurons. In fibroblasts and induced pluripotent stem cells, the GGC expansion led to increased ZFHX3 protein levels and abnormal autophagy, which were normalized with small interfering RNA-mediated ZFHX3 knockdown in both cell types. Improving autophagy points to a therapeutic avenue for this novel polyG disease. The coding GGC-repeat expansion in an extremely G+C-rich region was not detectable by short-read whole-exome sequencing, which demonstrates the power of LR-GS for variant discovery.
Affiliation(s)
- Karla P Figueroa
- Department of Neurology, University of Utah, Salt Lake City, UT, USA
| | - Caspar Gross
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- NGS Competence Center Tübingen, Tübingen, Germany
| | - Elena Buena-Atienza
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- NGS Competence Center Tübingen, Tübingen, Germany
| | - Sharan Paul
- Department of Neurology, University of Utah, Salt Lake City, UT, USA
| | - Mandi Gandelman
- Department of Neurology, University of Utah, Salt Lake City, UT, USA
| | - Naseebullah Kakar
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Department of Biotechnology, FLS&I, BUITEMS, Quetta, Pakistan
| | - Marc Sturm
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
| | - Nicolas Casadei
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- NGS Competence Center Tübingen, Tübingen, Germany
| | - Jakob Admard
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- NGS Competence Center Tübingen, Tübingen, Germany
| | - Joohyun Park
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
| | - Christine Zühlke
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
| | - Yorck Hellenbroich
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
| | - Jelena Pozojevic
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
| | - Saranya Balachandran
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
| | - Kristian Händler
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
| | - Simone Zittel
- Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Dagmar Timmann
- Department of Neurology and Center for Translational Neuro- and Behavioral Sciences (C-TNBS), Essen University Hospital, University of Duisburg-Essen, Essen, Germany
| | - Friedrich Erdlenbruch
- Department of Neurology and Center for Translational Neuro- and Behavioral Sciences (C-TNBS), Essen University Hospital, University of Duisburg-Essen, Essen, Germany
| | - Laura Herrmann
- Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | | | - Martin Zenker
- Institute of Human Genetics, University Hospital Magdeburg and Medical Faculty, Otto-von-Guericke University, Magdeburg, Germany
| | - Thomas Klopstock
- Department of Neurology with Friedrich-Baur-Institute, University Hospital of Ludwig-Maximilians-Universität München, Munich, Germany
- German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
- Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
| | - Claudia Dufke
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
| | - Daniel R Scoles
- Department of Neurology, University of Utah, Salt Lake City, UT, USA
| | | | - Malte Spielmann
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg, Lübeck, Kiel, Lübeck, Germany
| | - Olaf Riess
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany.
- NGS Competence Center Tübingen, Tübingen, Germany.
| | - Stephan Ossowski
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- NGS Competence Center Tübingen, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
| | - Tobias B Haack
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- NGS Competence Center Tübingen, Tübingen, Germany
| | - Stefan M Pulst
- Department of Neurology, University of Utah, Salt Lake City, UT, USA.
- Clinical Neurosciences Center, University of Utah Hospitals and Clinics, Salt Lake City, UT, USA.
|
2
|
Bell CG. Epigenomic insights into common human disease pathology. Cell Mol Life Sci 2024; 81:178. [PMID: 38602535 PMCID: PMC11008083 DOI: 10.1007/s00018-024-05206-2] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 03/13/2024] [Indexed: 04/12/2024]
Abstract
The epigenome, the chemical modifications and chromatin-related packaging of the genome, enables the same genetic template to be activated or repressed in different cellular settings. This multi-layered mechanism facilitates cell-type specific function by setting the local sequence and 3D interactive activity level. Gene transcription is further modulated through the interplay with transcription factors and co-regulators. The human body requires this epigenomic apparatus to be precisely installed throughout development and then adequately maintained during the lifespan. The causal role of the epigenome in human pathology, beyond imprinting disorders and specific tumour suppressor genes, was further brought into the spotlight by large-scale sequencing projects identifying that mutations in epigenomic machinery genes could be critical drivers in both cancer and developmental disorders. Abrogation of this cellular mechanism is providing new molecular insights into pathogenesis. However, deciphering the full breadth and implications of these epigenomic changes remains challenging. Knowledge is accruing regarding disease mechanisms and clinical biomarkers, through pathogenically relevant and surrogate tissue analyses, respectively. Advances include consortia-generated cell-type specific reference epigenomes, high-throughput DNA methylome association studies, as well as insights into ageing-related diseases from biological 'clocks' constructed by machine learning algorithms. Also, 3rd-generation sequencing is beginning to disentangle the complexity of genetic and DNA modification haplotypes. Cell-free DNA methylation as a cancer biomarker has clear clinical utility and further potential to assess organ damage across many disorders. Finally, molecular understanding of disease aetiology brings with it the opportunity for exact therapeutic alteration of the epigenome through CRISPR-activation or inhibition.
Affiliation(s)
- Christopher G Bell
- William Harvey Research Institute, Barts & The London Faculty of Medicine, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK.
|
3
|
Nie F, Ni P, Huang N, Zhang J, Wang Z, Xiao C, Luo F, Wang J. De novo diploid genome assembly using long noisy reads. Nat Commun 2024; 15:2964. [PMID: 38580638 PMCID: PMC10997618 DOI: 10.1038/s41467-024-47349-7] [Received: 01/04/2024] [Accepted: 03/25/2024] [Indexed: 04/07/2024]
Abstract
The high sequencing error rate has impeded the application of long noisy reads for diploid genome assembly. Most existing assemblers fail to generate high-quality phased assemblies using long noisy reads. Here, we present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that can retain heterozygous alleles while correcting sequencing errors. We combine a corrected read SNP caller and a raw read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using Nanopore R9, PacBio CLR or Nanopore R10 reads only. PECAT generates more contiguous haplotype-specific contigs compared to other assemblers. In particular, PECAT achieves a nearly haplotype-resolved assembly of B. taurus (Bison×Simmental) using Nanopore R9 reads and a phase block NG50 of 59.4/58.0 Mb for HG002 using Nanopore R10 reads.
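Note on the NG50 metric quoted in this abstract: NG50 is conventionally the length L such that blocks of length L or longer together cover at least half of the expected genome size. The sketch below illustrates that convention only; the function name and the toy numbers are assumptions for illustration and are not taken from PECAT or its evaluation scripts.

```python
def ng50(block_lengths, genome_size):
    """Return the NG50 of a set of block (or contig) lengths.

    NG50 is the length of the block at which the running total of
    lengths, taken from longest to shortest, first reaches half of
    the expected genome size. Returns 0 if the blocks never cover
    half of the genome.
    """
    target = genome_size / 2
    covered = 0
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0


# Toy example (illustrative values, not real assembly data).
toy_blocks = [50, 40, 30, 20, 10]
toy_result = ng50(toy_blocks, genome_size=200)
print(toy_result)
```

Unlike N50, which uses the total assembled length as the denominator, NG50 uses a fixed genome size, so values are comparable across assemblies of the same genome.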
Affiliation(s)
- Fan Nie
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- National Center for Applied Mathematics in Hunan and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, 411105, China
| | - Peng Ni
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Neng Huang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Jun Zhang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Zhenyu Wang
- Institute of Nanfan & Seed Industry, Guangdong Academy of Sciences, Guangdong, 510316, China
| | - Chuanle Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University #7 Jinsui Road, Tianhe District, Guangzhou, China.
| | - Feng Luo
- School of Computing, Clemson University, Clemson, SC, 29634-0974, USA.
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
- Xiangjiang Laboratory, Changsha, 410205, China.
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China.
|
4
|
Beaulaurier J, Ly L, Duty JA, Tyer C, Stevens C, Hung CT, Sookdeo A, Drong AW, Kowdle S, Turner DJ, Juul S, Hickey S, Lee B. De novo antibody discovery in human blood from full-length single B cell transcriptomics and matching haplotyped-resolved germline assemblies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.26.586834. [PMID: 38585716 PMCID: PMC10996687 DOI: 10.1101/2024.03.26.586834] [Indexed: 04/09/2024]
Abstract
Immunoglobulin (IGH, IGK, IGL) loci in the human genome are highly polymorphic regions that encode the building blocks of the light and heavy chain IG proteins that dimerize to form antibodies. The processes of V(D)J recombination and somatic hypermutation in B cells are responsible for creating an enormous reservoir of highly specific antibodies capable of binding a vast array of possible antigens. However, the antibody repertoire is fundamentally limited by the set of variable (V), diversity (D), and joining (J) alleles present in the germline IG loci. To better understand how the germline IG haplotypes contribute to the expressed antibody repertoire, we combined genome sequencing of the germline IG loci with single-cell transcriptome sequencing of B cells from the same donor. Sequencing and assembly of the germline IG loci captured the IGH locus in a single fully-phased contig where the maternal and paternal contributions to the germline V, D, and J repertoire can be fully resolved. The B cells were collected following a measles, mumps, and rubella (MMR) vaccination, resulting in a population of cells that were activated in response to this specific immune challenge. Single-cell, full-length transcriptome sequencing of these B cells resulted in whole transcriptome characterization of each cell, as well as highly-accurate consensus sequences for the somatically rearranged and hypermutated light and heavy chain IG transcripts. A subset of antibodies synthesized based on their consensus heavy and light chain transcript sequences demonstrated binding to measles antigens and neutralization of measles live virus.
|
5
|
Keskus A, Bryant A, Ahmad T, Yoo B, Aganezov S, Goretsky A, Donmez A, Lansdon LA, Rodriguez I, Park J, Liu Y, Cui X, Gardner J, McNulty B, Sacco S, Shetty J, Zhao Y, Tran B, Narzisi G, Helland A, Cook DE, Chang PC, Kolesnikov A, Carroll A, Molloy EK, Pushel I, Guest E, Pastinen T, Shafin K, Miga KH, Malikic S, Day CP, Robine N, Sahinalp C, Dean M, Farooqi MS, Paten B, Kolmogorov M. Severus: accurate detection and characterization of somatic structural variation in tumor genomes using long reads. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.22.24304756. [PMID: 38585974 PMCID: PMC10996739 DOI: 10.1101/2024.03.22.24304756] [Indexed: 04/09/2024]
Abstract
Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers the advantage of better mappability and long-range phasing, which results in substantial improvements in germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy. Here, we present Severus: a method for the accurate detection of different types of somatic SVs using a phased breakpoint graph approach. To benchmark various short- and long-read SV detection methods, we sequenced five tumor/normal cell line pairs with Illumina, Nanopore, and PacBio sequencing platforms; on this benchmark Severus showed the highest F1 scores (harmonic mean of the precision and recall) as compared to long-read and short-read methods. We then applied Severus to three clinical cases of pediatric cancer, demonstrating concordance with known genetic findings as well as revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
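Note on the F1 score mentioned in this abstract: it is the harmonic mean of precision and recall, which for SV benchmarking are usually derived from counts of true-positive, false-positive, and false-negative calls. The sketch below shows that arithmetic only; the function name and the example counts are illustrative assumptions and are not results from the Severus benchmark.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return the F1 score (harmonic mean of precision and recall).

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    Returns 0.0 when either quantity is undefined or both are zero.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 correct calls, 10 spurious calls, 30 missed SVs.
example_f1 = f1_score(true_positives=90, false_positives=10, false_negatives=30)
print(round(example_f1, 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a caller cannot reach a high F1 by maximizing recall at the expense of precision, or the reverse.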
Affiliation(s)
- Ayse Keskus
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Asher Bryant
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Tanveer Ahmad
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Byunggil Yoo
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | | | - Anton Goretsky
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Ataberk Donmez
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Lisa A. Lansdon
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | - Isabel Rodriguez
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
| | - Jimin Park
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Yuelin Liu
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Xiwen Cui
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | | | | | - Samuel Sacco
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Jyoti Shetty
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Yongmei Zhao
- Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Bao Tran
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | | | | | | | | | | | | | - Erin K. Molloy
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Irina Pushel
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | - Erin Guest
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | - Tomi Pastinen
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | - Kishwar Shafin
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Salem Malikic
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Chi-Ping Day
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | | | - Cenk Sahinalp
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Michael Dean
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
| | - Midhat S. Farooqi
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | | | - Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
|
6
|
Liu YH, Luo C, Golding SG, Ioffe JB, Zhou XM. Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nat Commun 2024; 15:2447. [PMID: 38503752 PMCID: PMC10951360 DOI: 10.1038/s41467-024-46614-z] [Received: 10/17/2022] [Accepted: 03/04/2024] [Indexed: 03/21/2024]
Abstract
Long-read sequencing offers long contiguous DNA fragments, facilitating diploid genome assembly and structural variant (SV) detection. Efficient and robust algorithms for SV identification are crucial with increasing data availability. Alignment-based methods, favored for their computational efficiency and lower coverage requirements, are prominent. Alternative approaches, relying solely on available reads for de novo genome assembly and employing assembly-based tools for SV detection via comparison to a reference genome, demand significantly more computational resources. However, the lack of comprehensive benchmarking constrains our comprehension and hampers further algorithm development. Here we systematically compare 14 read alignment-based SV calling methods (including 4 deep learning-based methods and 1 hybrid method), and 4 assembly-based SV calling methods, alongside 4 upstream aligners and 7 assemblers. Assembly-based tools excel in detecting large SVs, especially insertions, and exhibit robustness to evaluation parameter changes and coverage fluctuations. Conversely, alignment-based tools demonstrate superior genotyping accuracy at low sequencing coverage (5-10×) and excel in detecting complex SVs, like translocations, inversions, and duplications. Our evaluation provides performance insights, highlighting the absence of a universally superior tool. We furnish guidelines across 31 criteria combinations, aiding users in selecting the most suitable tools for diverse scenarios and offering directions for further method development.
Affiliation(s)
- Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
| | - Can Luo
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
| | - Staunton G Golding
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
| | - Jacob B Ioffe
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
| | - Xin Maizie Zhou
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA.
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA.
- Data Science Institute, Vanderbilt University, 37235, Nashville, TN, USA.
|
7
|
Paulin LF, Fan J, O'Neill K, Pleasance E, Porter VL, Jones SJM, Sedlazeck FJ. The benefit of a complete reference genome for cancer structural variant analysis. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.15.24304369. [PMID: 38562786 PMCID: PMC10984048 DOI: 10.1101/2024.03.15.24304369] [Indexed: 04/04/2024]
Abstract
The complexities of cancer genomes are becoming more easily interpreted due to advancements in sequencing technologies and improved bioinformatic analysis. Structural variants (SVs) represent an important subset of somatic events in tumors. While detection of SVs has been markedly improved by the development of long-read sequencing, somatic variant identification and annotation remain challenging. We hypothesized that use of a completed human reference genome (CHM13-T2T) would improve somatic SV calling. Our findings in a tumour/normal matched benchmark sample and two patient samples show that the CHM13-T2T improves SV detection and prioritization accuracy compared to GRCh38, with a notable reduction in false positive calls. We also overcame the lack of annotation resources for CHM13-T2T by lifting over CHM13-T2T-aligned reads to the GRCh38 genome, therefore combining both improved alignment and advanced annotations. In this process, we assessed the current SV benchmark set for COLO829/COLO829BL across four replicates sequenced at different centers with different long-read technologies. We discovered instability of this cell line across these replicates; 346 SVs (1.13%) were only discoverable in a single replicate. We identify 49 somatic SVs, which appear to be stable as they are consistently present across the four replicates. As such, we propose this consensus set as an updated benchmark for somatic SV calling and include both GRCh38 and CHM13-T2T coordinates in our benchmark. The benchmark is available at: 10.5281/zenodo.10819636. Our work demonstrates new approaches to optimize somatic SV prioritization in cancer with potential improvements in other genetic diseases.
Affiliation(s)
- Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Jeremy Fan
- Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
| | - Kieran O'Neill
- Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
| | - Erin Pleasance
- Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
| | - Vanessa L Porter
- Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Steven J M Jones
- Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
|
8
|
Genner R, Akeson S, Meredith M, Jerez PA, Malik L, Baker B, Miano-Burkhardt A, Paten B, Billingsley KJ, Blauwendraat C, Jain M. Assessing methylation detection for primary human tissue using Nanopore sequencing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.29.581569. [PMID: 38464144 PMCID: PMC10925257 DOI: 10.1101/2024.02.29.581569] [Indexed: 03/12/2024]
Abstract
DNA methylation most commonly occurs as 5-methylcytosine (5mC) in the human genome and has been associated with human diseases. Recent developments in single-molecule sequencing technologies (Oxford Nanopore Technologies (ONT) and Pacific Biosciences) have enabled readouts of long, native DNA molecules, including cytosine methylation. ONT recently upgraded their Nanopore sequencing chemistry and kits from R9 to the R10 version, which yielded increased accuracy and sequencing throughput. However, the effects on methylation detection have not yet been documented. Here we performed a series of computational analyses to characterize differences in Nanopore-based 5mC detection between the ONT R9 and R10 chemistries. We compared 5mC calls in R9 and R10 for three human genome datasets: a cell line, a frontal cortex brain sample, and a blood sample. We performed an in-depth analysis on CpG islands and homopolymer regions, and documented high concordance for methylation detection among sequencing technologies. The strongest correlation was observed between Nanopore R10 and Illumina bisulfite technologies for cell line-derived datasets. Subtle differences in methylation datasets between technologies can impact analysis tools such as differential methylation calling software. Our findings show that comparisons can be drawn between methylation data from different Nanopore chemistries using guided hypotheses. This work will facilitate comparison among Nanopore data cohorts derived using different chemistries from large scale sequencing efforts, such as the NIH CARD Long Read Initiative.
Affiliation(s)
- Rylee Genner
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Stuart Akeson
- Department of Bioengineering, Northeastern University, Boston, MA, USA
| | - Melissa Meredith
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Pilar Alvarez Jerez
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Laksh Malik
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Breeana Baker
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Benedict Paten
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kimberley J Billingsley
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
| | - Miten Jain
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
|
9
|
Schreiber M, Jayakodi M, Stein N, Mascher M. Plant pangenomes for crop improvement, biodiversity and evolution. Nat Rev Genet 2024:10.1038/s41576-024-00691-4. [PMID: 38378816 DOI: 10.1038/s41576-024-00691-4] [Accepted: 12/14/2023] [Indexed: 02/22/2024]
Abstract
Plant genome sequences catalogue genes and the genetic elements that regulate their expression. Such inventories further research aims as diverse as mapping the molecular basis of trait diversity in domesticated plants or inquiries into the origin of evolutionary innovations in flowering plants millions of years ago. The transformative technological progress of DNA sequencing in the past two decades has enabled researchers to sequence ever more genomes with greater ease. Pangenomes - complete sequences of multiple individuals of a species or higher taxonomic unit - have now entered the geneticists' toolkit. The genomes of crop plants and their wild relatives are being studied with translational applications in breeding in mind. But pangenomes are applicable also in ecological and evolutionary studies, as they help classify and monitor biodiversity across the tree of life, deepen our understanding of how plant species diverged and show how plants adapt to changing environments or new selection pressures exerted by human beings.
Affiliation(s)
- Mona Schreiber
- Department of Biology, University of Marburg, Marburg, Germany
| | - Murukarthick Jayakodi
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
| | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
| |
|
10
|
Busquets O, Li H, Mohieddin Syed K, Jerez PA, Dunnack J, Bu RL, Verma Y, Pangilinan GR, Martin A, Straub J, Du Y, Simon VM, Poser S, Bush Z, Diaz J, Sahagun A, Gao J, Hernandez DG, Levine KS, Booth EO, Bateup HS, Rio DC, Hockemeyer D, Blauwendraat C, Soldner F. iSCORE-PD: an isogenic stem cell collection to research Parkinson's Disease. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.12.579917. [PMID: 38405931 PMCID: PMC10888955 DOI: 10.1101/2024.02.12.579917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Parkinson's disease (PD) is a neurodegenerative disorder caused by complex genetic and environmental factors. Genome-edited human pluripotent stem cells (hPSCs) offer the unique potential to advance our understanding of PD etiology by providing disease-relevant cell-types carrying patient mutations along with isogenic control cells. To facilitate this experimental approach, we generated a collection of 55 cell lines genetically engineered to harbor mutations in genes associated with monogenic PD (SNCA A53T, SNCA A30P, PRKN Ex3del, PINK1 Q129X, DJ1/PARK7 Ex1-5del, LRRK2 G2019S, ATP13A2 FS, FBXO7 R498X/FS, DNAJC6 c.801 A>G+FS, SYNJ1 R258Q/FS, VPS13C A444P, VPS13C W395C, GBA1 IVS2+1). All mutations were generated in a fully characterized and sequenced female human embryonic stem cell (hESC) line (WIBR3; NIH approval number NIHhESC-10-0079) using CRISPR/Cas9 or prime editing-based approaches. We implemented rigorous quality controls, including high-density genotyping to detect structural variants and confirm the genomic integrity of each cell line. This systematic approach ensures the high quality of our stem cell collection, highlights differences between conventional CRISPR/Cas9 and prime editing and provides a roadmap for how to generate gene-edited hPSC collections at scale in an academic setting. We expect that our isogenic stem cell collection will become an accessible platform for the study of PD, which can be used by investigators to understand the molecular pathophysiology of PD in a human cellular setting.
Affiliation(s)
- Oriol Busquets
- Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Rose F. Kennedy Center, Albert Einstein College of Medicine, 1410 Pelham Parkway South, Bronx, NY 10461, USA
- Ruth L. and David S. Gottesman Institute for Stem Cell and Regenerative Medicine Research, Albert Einstein College of Medicine, 1301 Morris Park Ave., Bronx, NY 10461, USA
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- These authors contributed equally
| | - Hanqin Li
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- These authors contributed equally
| | - Khaja Mohieddin Syed
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- These authors contributed equally
| | - Pilar Alvarez Jerez
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- These authors contributed equally
| | - Jesse Dunnack
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- These authors contributed equally
| | - Riana Lo Bu
- Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Rose F. Kennedy Center, Albert Einstein College of Medicine, 1410 Pelham Parkway South, Bronx, NY 10461, USA
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - Yogendra Verma
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Gabriella R. Pangilinan
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Annika Martin
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Jannes Straub
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - YuXin Du
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Vivien M. Simon
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Steven Poser
- Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Rose F. Kennedy Center, Albert Einstein College of Medicine, 1410 Pelham Parkway South, Bronx, NY 10461, USA
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - Zipporiah Bush
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Genetics, Albert Einstein College of Medicine, 1301 Morris Park Ave., Bronx, NY 10461, USA
| | - Jessica Diaz
- Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Rose F. Kennedy Center, Albert Einstein College of Medicine, 1410 Pelham Parkway South, Bronx, NY 10461, USA
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - Atehsa Sahagun
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Jianpu Gao
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Dena G. Hernandez
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Kristin S. Levine
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Ezgi O. Booth
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Helen S. Bateup
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Chan Zuckerberg Biohub, San Francisco, CA, 94158, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Donald C. Rio
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Dirk Hockemeyer
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
- Chan Zuckerberg Biohub, San Francisco, CA, 94158, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Frank Soldner
- Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Rose F. Kennedy Center, Albert Einstein College of Medicine, 1410 Pelham Parkway South, Bronx, NY 10461, USA
- Ruth L. and David S. Gottesman Institute for Stem Cell and Regenerative Medicine Research, Albert Einstein College of Medicine, 1301 Morris Park Ave., Bronx, NY 10461, USA
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Genetics, Albert Einstein College of Medicine, 1301 Morris Park Ave., Bronx, NY 10461, USA
- Lead contact
| |
|
11
|
Ramirez P, Sun W, Kazempour Dehkordi S, Zare H, Fongang B, Bieniek KF, Frost B. Nanopore-based DNA long-read sequencing analysis of the aged human brain. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.01.578450. [PMID: 38370753 PMCID: PMC10871260 DOI: 10.1101/2024.02.01.578450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Aging disrupts cellular processes such as DNA repair and epigenetic control, leading to a gradual buildup of genomic alterations that can have detrimental effects in post-mitotic cells. Genomic alterations in regions of the genome that are rich in repetitive sequences, often termed "dark loci," are difficult to resolve using traditional sequencing approaches. New long-read technologies offer promising avenues for exploration of previously inaccessible regions of the genome. Using nanopore-based long-read whole-genome sequencing of DNA extracted from 18 aged human brains, we identify previously unreported structural variants and methylation patterns within repetitive DNA, focusing on transposable elements ("jumping genes") as crucial sources of variation, particularly in dark loci. Our analyses reveal potential somatic insertion variants and provide DNA methylation frequencies for many retrotransposon families. We further demonstrate the utility of this technology for the study of these challenging genomic regions in brains affected by Alzheimer's disease and identify significant differences in DNA methylation in pathologically normal brains versus those affected by Alzheimer's disease. Highlighting the power of this approach, we discover specific polymorphic retrotransposons with altered DNA methylation patterns. These retrotransposon loci have the potential to contribute to pathology, warranting further investigation in Alzheimer's disease research. Taken together, our study provides the first long-read DNA sequencing-based analysis of retrotransposon sequences, structural variants, and DNA methylation in the aging brain affected with Alzheimer's disease neuropathology.
Affiliation(s)
- Paulino Ramirez
- Barshop Institute for Longevity and Aging Studies, University of Texas Health San Antonio, San Antonio, Texas
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, University of Texas Health San Antonio, San Antonio, Texas
- Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, Texas
| | - Wenyan Sun
- Barshop Institute for Longevity and Aging Studies, University of Texas Health San Antonio, San Antonio, Texas
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, University of Texas Health San Antonio, San Antonio, Texas
- Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, Texas
- School of Pharmacy, University of Missouri-Kansas City, Kansas City, Missouri
| | - Shiva Kazempour Dehkordi
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, University of Texas Health San Antonio, San Antonio, Texas
- Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, Texas
| | - Habil Zare
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, University of Texas Health San Antonio, San Antonio, Texas
- Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, Texas
| | - Bernard Fongang
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, University of Texas Health San Antonio, San Antonio, Texas
- Department of Biochemistry & Structural Biology, University of Texas Health San Antonio, San Antonio, Texas
| | - Kevin F. Bieniek
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, University of Texas Health San Antonio, San Antonio, Texas
- Department of Pathology, University of Texas Health San Antonio, San Antonio, Texas
| | - Bess Frost
- Barshop Institute for Longevity and Aging Studies, University of Texas Health San Antonio, San Antonio, Texas
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, University of Texas Health San Antonio, San Antonio, Texas
- Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, Texas
| |
|
12
|
Damaraju N, Miller AL, Miller DE. Long-Read DNA and RNA Sequencing to Streamline Clinical Genetic Testing and Reduce Barriers to Comprehensive Genetic Testing. J Appl Lab Med 2024; 9:138-150. [PMID: 38167773 DOI: 10.1093/jalm/jfad107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 10/24/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND Obtaining a precise molecular diagnosis through clinical genetic testing provides information about disease prognosis or progression, allows accurate counseling about recurrence risk, and empowers individuals to benefit from precision therapies or take part in N-of-1 trials. Unfortunately, more than half of individuals with a suspected Mendelian condition remain undiagnosed after a comprehensive clinical evaluation, and the results of any individual clinical genetic test ordered during a typical evaluation may take weeks or months to return. Furthermore, commonly used technologies, such as short-read sequencing, are limited in the types of disease-causing variation they can identify. New technologies, such as long-read sequencing (LRS), are poised to solve these problems. CONTENT Recent technical advances have improved accuracy, increased throughput, and decreased the costs of commercially available LRS technologies. This has resolved many historical concerns about the use of LRS in the clinical environment and opened the door to widespread clinical adoption of LRS. Here, we review LRS technology, how it has been used in the research setting to clarify complex variants or identify disease-causing variation missed by prior clinical testing, and how it may be used clinically in the near future. SUMMARY LRS is unique in that, as a single data source, it has the potential to replace nearly every other clinical genetic test offered today. When analyzed in a stepwise fashion, LRS will simplify laboratory processes, reduce barriers to comprehensive genetic testing, increase the rate of genetic diagnoses, and shorten the amount of time required to make a molecular diagnosis.
Affiliation(s)
- Nikhita Damaraju
- Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, United States
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, United States
| | - Angela L Miller
- Department of Pediatrics, University of Washington, Seattle, WA 98195, United States
| | - Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, United States
- Department of Pediatrics, University of Washington, Seattle, WA 98195, United States
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, United States
| |
|
13
|
Wong B, Ferguson JM, Do JY, Gamaarachchi H, Deveson IW. Streamlining remote nanopore data access with slow5curl. Gigascience 2024; 13:giae016. [PMID: 38608279 PMCID: PMC11010652 DOI: 10.1093/gigascience/giae016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 03/03/2024] [Accepted: 03/18/2024] [Indexed: 04/14/2024]
Abstract
BACKGROUND As adoption of nanopore sequencing technology continues to advance, the need to maintain large volumes of raw current signal data for reanalysis with updated algorithms is a growing challenge. Here we introduce slow5curl, a software package designed to streamline nanopore data sharing, accessibility, and reanalysis. RESULTS Slow5curl allows a user to fetch a specified read or group of reads from a raw nanopore dataset stored on a remote server, such as a public data repository, without downloading the entire file. Slow5curl uses an index to quickly fetch specific reads from a large dataset in SLOW5/BLOW5 format and highly parallelized data access requests to maximize download speeds. Using all public nanopore data from the Human Pangenome Reference Consortium (>22 TB), we demonstrate how slow5curl can be used to quickly fetch and reanalyze raw signal reads corresponding to a set of target genes from each individual in a large cohort dataset (n = 91), minimizing the time, egress costs, and local storage requirements for their reanalysis. CONCLUSIONS We provide slow5curl as a free, open-source package that will reduce frictions in data sharing for the nanopore community: https://github.com/BonsonW/slow5curl.
Affiliation(s)
- Bonson Wong
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children’s Research Institute, Sydney, NSW 2010, Australia
- School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
| | - James M Ferguson
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children’s Research Institute, Sydney, NSW 2010, Australia
| | - Jessica Y Do
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children’s Research Institute, Sydney, NSW 2010, Australia
- School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
| | - Hasindu Gamaarachchi
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children’s Research Institute, Sydney, NSW 2010, Australia
- School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
| | - Ira W Deveson
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children’s Research Institute, Sydney, NSW 2010, Australia
- St Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Sydney, NSW 2052, Australia
| |
|
14
|
Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024:10.1038/s41587-023-02024-y. [PMID: 38168980 DOI: 10.1038/s41587-023-02024-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing repeat-aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SVs showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Affiliation(s)
- Moritz Smolka
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | | | - Dominic W Horner
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Sairam Behera
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Ester Kalef-Ezra
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Mira Gandhi
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Karl Hong
- Bionano Genomics, San Diego, CA, USA
| | - Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Christos Proukakis
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| |
|
15
|
Harvey WT, Ebert P, Ebler J, Audano PA, Munson KM, Hoekzema K, Porubsky D, Beck CR, Marschall T, Garimella K, Eichler EE. Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall. Genome Res 2023; 33:2029-2040. [PMID: 38190646 PMCID: PMC10760522 DOI: 10.1101/gr.278070.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 11/03/2023] [Indexed: 01/10/2024]
Abstract
Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets, with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
Affiliation(s)
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut 06030-6403, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Kiran Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| |
|
16
|
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
- The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
| | - Arvis Sulovari
- Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
| | - Paul N Valdmanis
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
| | - Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
- Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
| |
|