1
Li Q, Keskus AG, Wagner J, Izydorczyk MB, Timp W, Sedlazeck FJ, Klein AP, Zook JM, Kolmogorov M, Schatz MC. Unraveling the hidden complexity of cancer through long-read sequencing. Genome Res 2025; 35:599-620. [PMID: 40113261 PMCID: PMC12047254 DOI: 10.1101/gr.280041.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/22/2025]
Abstract
Cancer is fundamentally a disease of the genome, characterized by extensive genomic, transcriptomic, and epigenomic alterations. Most current studies predominantly use short-read sequencing, gene panels, or microarrays to explore these alterations; however, these technologies can systematically miss or misrepresent certain types of alterations, especially structural variants, complex rearrangements, and alterations within repetitive regions. Long-read sequencing is rapidly emerging as a transformative technology for cancer research by providing a comprehensive view across the genome, transcriptome, and epigenome, including the ability to detect alterations that previous technologies have overlooked. In this Perspective, we explore the current applications of long-read sequencing for both germline and somatic cancer analysis. We provide an overview of the computational methodologies tailored to long-read data and highlight key discoveries and resources within cancer genomics that were previously inaccessible with prior technologies. We also address future opportunities and persistent challenges, including the experimental and computational requirements needed to scale to larger sample sizes, the hurdles in sequencing and analyzing complex cancer genomes, and opportunities for leveraging machine learning and artificial intelligence technologies for cancer informatics. We further discuss how the telomere-to-telomere genome and the emerging human pangenome could enhance the resolution of cancer genome analysis, potentially revolutionizing early detection and disease monitoring in patients. Finally, we outline strategies for transitioning long-read sequencing from research applications to routine clinical practice.
Affiliation(s)
- Qiuhui Li
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Ayse G Keskus
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Michal B Izydorczyk
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Winston Timp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Texas 77030, USA
  - Department of Computer Science, Rice University, Houston, Texas 77251, USA
- Alison P Klein
  - Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins Medicine, Baltimore, Maryland 21031, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Mikhail Kolmogorov
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
  - Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins Medicine, Baltimore, Maryland 21031, USA
2
Montano C, Timp W. Evolution of genome-wide methylation profiling technologies. Genome Res 2025; 35:572-582. [PMID: 40228903 PMCID: PMC12047278 DOI: 10.1101/gr.278407.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2025]
Abstract
In this mini-review, we explore the advancements in genome-wide DNA methylation profiling, tracing the evolution from traditional methods such as methylation arrays and whole-genome bisulfite sequencing to the cutting-edge single-molecule profiling enabled by long-read sequencing (LRS) technologies. We highlight how LRS is transforming clinical and translational research, particularly by its ability to simultaneously measure genetic and epigenetic information, providing a more comprehensive understanding of complex disease mechanisms. We discuss current challenges and future directions in the field, emphasizing the need for innovative computational tools and robust, reproducible approaches to fully harness the capabilities of LRS in molecular diagnostics.
Affiliation(s)
- Carolina Montano
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
  - Division of Human Genetics, Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Winston Timp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
3
Rausch T, Marschall T, Korbel JO. The impact of long-read sequencing on human population-scale genomics. Genome Res 2025; 35:593-598. [PMID: 40228902 PMCID: PMC12047236 DOI: 10.1101/gr.280120.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2025]
Abstract
Long-read sequencing technologies, particularly those from Pacific Biosciences and Oxford Nanopore Technologies, are revolutionizing genome research by providing high-resolution insights into complex and repetitive regions of the human genome that were previously inaccessible. These advances have been particularly enabling for the comprehensive detection of genomic structural variants (SVs), which is critical for linking genotype to phenotype in population-scale and rare disease studies, as well as in cancer. Recent developments in sequencing throughput and computational methods, such as pangenome graphs and haplotype-resolved assemblies, are paving the way for the future inclusion of long-read sequencing in clinical cohort studies and disease diagnostics. DNA methylation signals directly obtained from long reads enhance the utility of single-molecule long-read sequencing technologies by enabling molecular phenotypes to be interpreted, and by allowing the identification of the parent of origin of de novo mutations. Despite this recent progress, challenges remain in scaling long-read technologies to large populations due to cost, computational complexity, and the lack of tools to facilitate the efficient interpretation of SVs in graphs. This perspective provides a succinct review on the current state of long-read sequencing in genomics by highlighting its transformative potential and key hurdles, and emphasizing future opportunities for advancing the understanding of human genetic diversity and diseases through population-scale long-read analysis.
Affiliation(s)
- Tobias Rausch
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, 40225 Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Jan O Korbel
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
4
Keskus AG, Bryant A, Ahmad T, Yoo B, Aganezov S, Goretsky A, Donmez A, Lansdon LA, Rodriguez I, Park J, Liu Y, Cui X, Gardner J, McNulty B, Sacco S, Shetty J, Zhao Y, Tran B, Narzisi G, Helland A, Cook DE, Chang PC, Kolesnikov A, Carroll A, Molloy EK, Bi C, Walter A, Gibson M, Pushel I, Guest E, Pastinen T, Shafin K, Miga KH, Malikic S, Day CP, Robine N, Sahinalp C, Dean M, Farooqi MS, Paten B, Kolmogorov M. Severus detects somatic structural variation and complex rearrangements in cancer genomes using long-read sequencing. Nat Biotechnol 2025:10.1038/s41587-025-02618-8. [PMID: 40185952 DOI: 10.1038/s41587-025-02618-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Accepted: 02/26/2025] [Indexed: 04/07/2025]
Abstract
For the detection of somatic structural variation (SV) in cancer genomes, long-read sequencing is advantageous over short-read sequencing with respect to mappability and variant phasing. However, most current long-read SV detection methods are not developed for the analysis of tumor genomes characterized by complex rearrangements and heterogeneity. Here, we present Severus, a breakpoint graph-based algorithm for somatic SV calling from long-read cancer sequencing. Severus works with matching normal samples, supports unbalanced cancer karyotypes, can characterize complex multibreak SV patterns and produces haplotype-specific calls. On a comprehensive multitechnology cell line panel, Severus consistently outperforms other long-read and short-read methods in terms of SV detection F1 score (harmonic mean of the precision and recall). We also illustrate that compared to long-read methods, short-read sequencing systematically misses certain classes of somatic SVs, such as insertions or clustered rearrangements. We apply Severus to several clinical cases of pediatric leukemia/lymphoma, revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
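The F1 score mentioned in this abstract is the harmonic mean of precision and recall. A minimal sketch of that calculation follows; the function name `f1_score` and the counts in the example call are hypothetical, chosen for illustration only, and are not taken from the Severus benchmark.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the harmonic mean of precision and recall for a set of calls."""
    # With no true positives, precision and recall are both zero (or undefined),
    # so the score is reported as 0.0 to avoid dividing by zero.
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct calls, 20 spurious calls, 20 missed variants.
print(f1_score(80, 20, 20))
```

Because the harmonic mean is pulled toward the smaller of the two values, a caller cannot reach a high F1 by maximizing only precision or only recall.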
Affiliation(s)
- Ayse G Keskus
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Asher Bryant
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Tanveer Ahmad
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Byunggil Yoo
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Anton Goretsky
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Ataberk Donmez
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Lisa A Lansdon
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Isabel Rodriguez
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Jimin Park
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Yuelin Liu
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Xiwen Cui
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Joshua Gardner
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Brandy McNulty
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Samuel Sacco
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Jyoti Shetty
  - Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yongmei Zhao
  - Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bao Tran
  - Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Erin K Molloy
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Chengpeng Bi
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Adam Walter
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Margaret Gibson
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Irina Pushel
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Erin Guest
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Tomi Pastinen
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Kishwar Shafin
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Karen H Miga
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Salem Malikic
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Chi-Ping Day
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Cenk Sahinalp
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Michael Dean
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Midhat S Farooqi
  - Children's Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Benedict Paten
  - University of California, Santa Cruz, Genomics Institute, Santa Cruz, CA, USA
- Mikhail Kolmogorov
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
5
Frias-De-Diego A, Jara M, Lanzas C. Influence of Sequencing Technology on Pangenome-Level Analysis and Detection of Antimicrobial Resistance Genes in ESKAPE Pathogens. Open Forum Infect Dis 2025; 12:ofaf183. [PMID: 40212029 PMCID: PMC11983279 DOI: 10.1093/ofid/ofaf183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2025] [Accepted: 03/24/2025] [Indexed: 04/13/2025] [Open Access]
Abstract
As sequencing costs decrease, short-read and long-read technologies are indispensable tools for uncovering the genetic drivers behind bacterial pathogen resistance. This study explores the differences between the use of short-read (Illumina) and long-read (Oxford Nanopore Technologies [ONT]) sequencing in detecting antimicrobial resistance (AMR) genes in ESKAPE pathogens (ie, Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter cloacae). Utilizing a dataset of 1385 whole genome sequences and applying commonly used bioinformatic methods in bacterial genomics, we assessed the differences in genomic completeness, pangenome structure, and AMR gene and point mutation identification. Illumina presented higher genome completeness, while ONT identified a broader pangenome. Hybrid assembly outperformed both Illumina and ONT at identifying key AMR genetic determinants, presented results closer to Illumina's completeness, and revealed ONT-like pangenomic content. Notably, Illumina consistently detected more AMR-related point mutations than its counterparts. This highlights the importance of method selection based on research goals, particularly when using publicly available data spanning a wide timespan. Differences were also observed for specific gene classes and bacterial species, underscoring the need for a nuanced understanding of technology limitations. Overall, this study reveals the strengths and limitations of each approach, advocating for the use of Illumina for common AMR analysis, ONT for studying complex genomes and novel species, and hybrid assembly for a more comprehensive characterization, leveraging the benefits of both technologies.
Affiliation(s)
- Alba Frias-De-Diego
  - Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, USA
- Manuel Jara
  - Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, USA
- Cristina Lanzas
  - Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, USA
6
Lu B, Winnall S, Cross W, Barnes CP. Cell-cycle dependent DNA repair and replication unifies patterns of chromosome instability. Nat Commun 2025; 16:3033. [PMID: 40155604 PMCID: PMC11953314 DOI: 10.1038/s41467-025-58245-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 03/17/2025] [Indexed: 04/01/2025] [Open Access]
Abstract
Chromosomal instability (CIN) is pervasive in human tumours and often leads to structural or numerical chromosomal aberrations. Somatic structural variants (SVs) are intimately related to copy number alterations but the two types of variant are often studied independently. Additionally, despite numerous studies on detecting various SV patterns, there are still no general quantitative models of SV generation. To address this issue, we develop a computational cell-cycle model for the generation of SVs from end-joining repair and replication after double-strand break formation. Our model provides quantitative information on the relationship between breakage fusion bridge cycle, chromothripsis, seismic amplification, and extra-chromosomal circular DNA. Given whole-genome sequencing data, the model also allows us to infer important parameters in SV generation with Bayesian inference. Our quantitative framework unifies disparate genomic patterns resulting from CIN, provides a null mutational model for SV, and reveals deeper insights into the impact of genome rearrangement on tumour evolution.
Affiliation(s)
- Bingxin Lu
  - Department of Cell and Developmental Biology, University College London, Gower Street, London, UK
  - UCL Genetics Institute, University College London, Gower Street, London, UK
  - School of Biosciences, University of Surrey, Stag Hill, Guildford, UK
  - Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Stag Hill, Guildford, UK
- Samuel Winnall
  - Department of Cell and Developmental Biology, University College London, Gower Street, London, UK
- William Cross
  - Department of Cell and Developmental Biology, University College London, Gower Street, London, UK
  - Rare Malignancies and Cancer Evolution Laboratory, School of Biological Sciences, University of Reading, Whiteknights, Reading, UK
- Chris P Barnes
  - Department of Cell and Developmental Biology, University College London, Gower Street, London, UK
  - UCL Genetics Institute, University College London, Gower Street, London, UK
7
Simovic-Lorenz M, Ernst A. Chromothripsis in cancer. Nat Rev Cancer 2025; 25:79-92. [PMID: 39548283 DOI: 10.1038/s41568-024-00769-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/16/2024] [Indexed: 11/17/2024]
Abstract
Chromothripsis is a mutational phenomenon in which a single catastrophic event generates extensive rearrangements of one or a few chromosomes. This extreme form of genome instability has been detected in 30-50% of cancers. Studies conducted in the past few years have uncovered insights into how chromothripsis arises and deciphered some of the cellular and molecular consequences of chromosome shattering. This Review discusses the defining features of chromothripsis and describes its prevalence across different cancer types as indicated by the manifestations of chromothripsis detected in human cancer samples. The different mechanistic models of chromothripsis, derived from in vitro systems that enable causal inference through experimental manipulation, are discussed in detail. The contribution of chromothripsis to cancer development, the selective advantages that cancer cells might gain from chromothripsis, the evolutionary trajectories of chromothriptic tumours, and the potential vulnerabilities and therapeutic opportunities presented by chromothriptic cells are also highlighted.
Affiliation(s)
- Milena Simovic-Lorenz
  - Group Genome Instability in Tumors, German Cancer Research Center, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Heidelberg, Germany
- Aurélie Ernst
  - Group Genome Instability in Tumors, German Cancer Research Center, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Heidelberg, Germany
8
Frias-De-Diego A, Jara M, Lanzas C. Influence of Sequencing Technology on Pangenome-level Analysis and Detection of Antimicrobial Resistance Genes in ESKAPE Pathogens. bioRxiv 2025:2025.01.08.631980. [PMID: 39829834 PMCID: PMC11741274 DOI: 10.1101/2025.01.08.631980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
As sequencing costs decrease, short-read and long-read technologies are indispensable tools for uncovering the genetic drivers behind bacterial pathogen resistance. This study explores the differences between the use of short-read (Illumina) and long-read (Oxford Nanopore Technologies, ONT) sequencing in detecting antimicrobial resistance (AMR) genes in ESKAPE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter cloacae). Utilizing a dataset of 1,385 whole genome sequences and applying commonly used bioinformatic methods in bacterial genomics, we assessed the differences in genomic completeness, pangenome structure, and AMR gene and point mutation identification. Illumina presented higher genome completeness, while ONT identified a broader pangenome. Hybrid assembly outperformed both Illumina and ONT at identifying key AMR genetic determinants, presented results closer to Illumina's completeness, and revealed ONT-like pangenomic content. Notably, Illumina consistently detected more AMR-related point mutations than its counterparts. This highlights the importance of method selection based on research goals. Differences were also observed for specific gene classes and bacterial species, underscoring the need for a nuanced understanding of technology limitations. Overall, this study reveals the strengths and limitations of each approach, advocating for the use of Illumina for common AMR analysis, ONT for studying complex genomes and novel species, and hybrid assembly for a more comprehensive characterization, leveraging the benefits of both technologies.
Impact Statement
This study provides a comprehensive comparison of short-read (Illumina) and long-read (Oxford Nanopore Technologies, ONT) sequencing technologies in the context of antimicrobial resistance (AMR) detection in ESKAPE pathogens. By analyzing a large dataset of 1,385 whole genome sequences, the research offers valuable insights into the strengths and limitations of each approach, as well as the benefits of the novel approach of hybrid assembly. These findings have broad utility across microbiology, genomics, and infectious disease research. In particular, they apply to the work of researchers and clinicians dealing with AMR surveillance, investigation into outbreaks, and bacterial genome analysis. Given the nuance with which technological differences in genomic completeness, pangenome structure, and AMR determinant detection have been explored in this study, it is a good basis for informed method selection for future research. While the output represents an incremental advance, its significance lies in its practical implications. It thus enables researchers to take more reasonable decisions in designing genomic studies of bacterial pathogens by showing the complementarity of various sequencing approaches and their specific strengths. This could lead to more accurate and comprehensive detection of AMR, which would contribute ultimately to improved antibiotic stewardship and public health strategies.
Data Summary
The authors confirm all supporting data, code and protocols have been provided within the article or through supplementary data files.
Repositories
All the sequences used for this study are publicly accessible from GenBank, and their individual IDs are disclosed in Supplementary Table 1.
9
Calvo-Roitberg E, Daniels RF, Pai AA. Challenges in identifying mRNA transcript starts and ends from long-read sequencing data. Genome Res 2024; 34:1719-1734. [PMID: 39567236 DOI: 10.1101/gr.279559.124] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/07/2024] [Accepted: 08/16/2024] [Indexed: 11/22/2024]
Abstract
Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology through the comprehensive identification and quantification of full-length mRNA isoforms. Despite great promise, challenges remain in the widespread implementation of LRS technologies for RNA-based applications, including concerns about low coverage, high sequencing error, and the lack of robust computational pipelines. Although much focus has been placed on defining mRNA exon composition and structure with LRS data, less careful characterization has been done of the ability to assess the terminal ends of isoforms, specifically, transcription start and end sites. Such characterization is crucial for completely delineating full mRNA molecules and regulatory consequences. However, there are substantial inconsistencies in both start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. Here, we describe the specific challenges of identifying and quantifying mRNA terminal ends with LRS technologies and how these issues influence biological interpretations of LRS data. We then review recent experimental and computational advances designed to alleviate these problems, with ideal use cases for each approach. Finally, we outline anticipated developments and necessary improvements for the characterization of terminal ends from LRS data.
Affiliation(s)
- Ezequiel Calvo-Roitberg
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, Massachusetts 01605, USA
- Rachel F Daniels
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, Massachusetts 01605, USA
- Athma A Pai
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, Massachusetts 01605, USA
10
O'Neill K, Pleasance E, Fan J, Akbari V, Chang G, Dixon K, Csizmok V, MacLennan S, Porter V, Galbraith A, Grisdale CJ, Culibrk L, Dupuis JH, Corbett R, Hopkins J, Bowlby R, Pandoh P, Smailus DE, Cheng D, Wong T, Frey C, Shen Y, Lewis E, Paulin LF, Sedlazeck FJ, Nelson JMT, Chuah E, Mungall KL, Moore RA, Coope R, Mungall AJ, McConechy MK, Williamson LM, Schrader KA, Yip S, Marra MA, Laskin J, Jones SJM. Long-read sequencing of an advanced cancer cohort resolves rearrangements, unravels haplotypes, and reveals methylation landscapes. Cell Genomics 2024; 4:100674. [PMID: 39406235 PMCID: PMC11605692 DOI: 10.1016/j.xgen.2024.100674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 06/26/2024] [Accepted: 09/18/2024] [Indexed: 11/16/2024]
Abstract
The Long-Read Personalized OncoGenomics (POG) dataset comprises a cohort of 189 patient tumors and 41 matched normal samples sequenced using the Oxford Nanopore Technologies PromethION platform. This dataset from the POG program and the Marathon of Hope Cancer Centres Network includes DNA and RNA short-read sequence data, analytics, and clinical information. We show the potential of long-read sequencing for resolving complex cancer-related structural variants, viral integrations, and extrachromosomal circular DNA. Long-range phasing facilitates the discovery of allelically differentially methylated regions (aDMRs) and allele-specific expression, including recurrent aDMRs in the cancer genes RET and CDKN2A. Germline promoter methylation in MLH1 can be directly observed in Lynch syndrome. Promoter methylation in BRCA1 and RAD51C is a likely driver behind homologous recombination deficiency where no coding driver mutation was found. This dataset demonstrates applications for long-read sequencing in precision medicine and is available as a resource for developing analytical approaches using this technology.
Affiliation(s)
- Kieran O'Neill: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Erin Pleasance: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Jeremy Fan: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Vahid Akbari: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Glenn Chang: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Katherine Dixon: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Veronika Csizmok: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Signe MacLennan: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Vanessa Porter: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Andrew Galbraith: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Cameron J Grisdale: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Luka Culibrk: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- John H Dupuis: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Richard Corbett: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- James Hopkins: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Reanne Bowlby: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Pawan Pandoh: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Duane E Smailus: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Dean Cheng: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Tina Wong: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Connor Frey: Department of Medicine, University of British Columbia, Vancouver, BC, Canada
- Yaoqing Shen: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Eleanor Lewis: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Luis F Paulin: Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Fritz J Sedlazeck: Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Jessica M T Nelson: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Eric Chuah: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Karen L Mungall: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Richard A Moore: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Robin Coope: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Andrew J Mungall: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Melissa K McConechy: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Laura M Williamson: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Kasmintan A Schrader: Hereditary Cancer Program, BC Cancer, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Stephen Yip: Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
- Marco A Marra: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Janessa Laskin: Department of Medical Oncology, BC Cancer, Vancouver, BC, Canada
- Steven J M Jones: Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
11
Reeve MP, Loomis S, Nissilä E, Rausch T, Zheng Z, Briotta Parolo PD, Ben-Isvy D, Aho E, Cesetti E, Okunuki Y, McLaughlin H, Mäkelä J, FinnGen, Kurki M, Talkowski ME, Korbel JO, Connor K, Meri S, Daly MJ, Runz H. Loss of CFHR5 function reduces the risk for age-related macular degeneration. medRxiv 2024:2024.11.11.24317117. [PMID: 39606340 PMCID: PMC11601675 DOI: 10.1101/2024.11.11.24317117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Age-related macular degeneration (AMD) is a prevalent cause of vision loss in the elderly with limited therapeutic options. A single chromosomal region around the complement factor H gene (CFH) is reported to explain nearly 25% of genetic AMD risk. Here, we used association testing, statistical finemapping and conditional analyses in 12,495 AMD cases and 461,686 controls to deconvolute four major CFH haplotypes that convey protection from AMD. We show that beyond CFH, two of these are explained by Finn-enriched frameshift and missense variants in the CFH modulator CFHR5. We demonstrate through a FinnGen sample recall study that CFHR5 variant carriers exhibit dose-dependent reductions in serum levels of the CFHR5 gene product FHR-5 and two functionally related proteins at the locus. Genetic reduction in FHR-5 correlates with higher preserved activities of the classical and alternative complement pathways. Our results propose therapeutic downregulation of FHR-5 as promising to prevent or treat AMD.
Affiliation(s)
- Mary Pat Reeve: Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Eija Nissilä: Department of Bacteriology and Immunology, Translational Immunology Research Program, University of Helsinki, Helsinki, Finland
- Tobias Rausch: European Molecular Biological Laboratories (EMBL), Heidelberg, Germany
- Zhili Zheng: Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Pietro Della Briotta Parolo: Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Daniel Ben-Isvy: Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
- Elias Aho: Department of Bacteriology and Immunology, Translational Immunology Research Program, University of Helsinki, Helsinki, Finland
- Emilia Cesetti: Department of Bacteriology and Immunology, Translational Immunology Research Program, University of Helsinki, Helsinki, Finland; Department of Biomedical Sciences, Humanitas University, Milan, Italy
- Yoko Okunuki: Research and Development, Biogen Inc., Cambridge, MA, USA
- Mitja Kurki: Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Michael E. Talkowski: Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Jan O. Korbel: European Molecular Biological Laboratories (EMBL), Heidelberg, Germany
- Kip Connor: Research and Development, Biogen Inc., Cambridge, MA, USA
- Seppo Meri: Department of Bacteriology and Immunology, Translational Immunology Research Program, University of Helsinki, Helsinki, Finland
- Mark J. Daly: Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Heiko Runz: Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Research and Development, Biogen Inc., Cambridge, MA, USA; European Molecular Biological Laboratories (EMBL), Heidelberg, Germany
12
Lysenkova Wiklander M, Arvidsson G, Bunikis I, Lundmark A, Raine A, Marincevic-Zuniga Y, Gezelius H, Bremer A, Feuk L, Ameur A, Nordlund J. A multiomic characterization of the leukemia cell line REH using short- and long-read sequencing. Life Sci Alliance 2024; 7:e202302481. [PMID: 38777370 PMCID: PMC11111970 DOI: 10.26508/lsa.202302481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 05/02/2024] [Accepted: 05/02/2024] [Indexed: 05/25/2024] Open
Abstract
The B-cell acute lymphoblastic leukemia (ALL) cell line REH, with the t(12;21) ETV6::RUNX1 translocation, is known to have a complex karyotype defined by a series of large-scale chromosomal rearrangements. Taken from a 15-yr-old at relapse, the cell line offers a practical model for the study of pediatric B-ALL. In recent years, short- and long-read DNA and RNA sequencing have emerged as a complement to karyotyping techniques in the resolution of structural variants in an oncological context. Here, we explore the integration of long-read PacBio and Oxford Nanopore whole-genome sequencing, IsoSeq RNA sequencing, and short-read Illumina sequencing to create a detailed genomic and transcriptomic characterization of the REH cell line. Whole-genome sequencing clarified the molecular traits of disrupted ALL-associated genes including CDKN2A, PAX5, BTG1, VPREB1, and TBL1XR1, as well as the glucocorticoid receptor NR3C1. Meanwhile, transcriptome sequencing identified seven fusion genes within the genomic breakpoints. Together, our extensive whole-genome investigation makes high-quality open-source data available to the leukemia genomics community.
Affiliation(s)
- Mariya Lysenkova Wiklander: Department of Medical Sciences, Uppsala University, Uppsala, Sweden; SciLifeLab, Uppsala University, Uppsala, Sweden
- Gustav Arvidsson: Department of Medical Sciences, Uppsala University, Uppsala, Sweden; SciLifeLab, Uppsala University, Uppsala, Sweden
- Ignas Bunikis: SciLifeLab, Uppsala University, Uppsala, Sweden; Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden; National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Anders Lundmark: Department of Medical Sciences, Uppsala University, Uppsala, Sweden; SciLifeLab, Uppsala University, Uppsala, Sweden
- Amanda Raine: Department of Medical Sciences, Uppsala University, Uppsala, Sweden; SciLifeLab, Uppsala University, Uppsala, Sweden; National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Yanara Marincevic-Zuniga: Department of Medical Sciences, Uppsala University, Uppsala, Sweden; SciLifeLab, Uppsala University, Uppsala, Sweden; National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Henrik Gezelius: Department of Medical Sciences, Uppsala University, Uppsala, Sweden; SciLifeLab, Uppsala University, Uppsala, Sweden; National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Anna Bremer: SciLifeLab, Uppsala University, Uppsala, Sweden; Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden; Department of Clinical Genetics, Uppsala University Hospital, Uppsala, Sweden
- Lars Feuk: SciLifeLab, Uppsala University, Uppsala, Sweden; Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden; National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Adam Ameur: SciLifeLab, Uppsala University, Uppsala, Sweden; Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden; National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
- Jessica Nordlund: Department of Medical Sciences, Uppsala University, Uppsala, Sweden; SciLifeLab, Uppsala University, Uppsala, Sweden; National Genomics Infrastructure, Uppsala University, Uppsala, Sweden
13
Shelton WJ, Zandpazandi S, Nix JS, Gokden M, Bauer M, Ryan KR, Wardell CP, Vaske OM, Rodriguez A. Long-read sequencing for brain tumors. Front Oncol 2024; 14:1395985. [PMID: 38915364 PMCID: PMC11194609 DOI: 10.3389/fonc.2024.1395985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 05/27/2024] [Indexed: 06/26/2024] Open
Abstract
Brain tumors and genomics have a long-standing history, given that glioblastoma was the first cancer studied by The Cancer Genome Atlas. Continuous advances in sequencing technologies over the decades have aided the molecular characterization of brain tumors for diagnosis, prognosis, and treatment. Since the WHO classification of CNS tumors introduced molecular biomarkers in 2016, the genomics of brain tumors has been integrated into diagnostic criteria. Long-read sequencing, also known as third-generation sequencing, is an emerging technique that allows for the sequencing of longer DNA segments, leading to improved detection of structural variants and epigenetic modifications. These capabilities are opening the way to better characterization of brain tumors. Here, we present a comprehensive summary of the state of the art of third-generation sequencing as applied to brain tumor diagnosis, prognosis, and treatment. We discuss the advantages and potential new implementations of long-read sequencing in clinical paradigms for neuro-oncology patients.
Affiliation(s)
- William J Shelton: Department of Neurosurgery, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Sara Zandpazandi: Department of Neurosurgery, Medical University of South Carolina, Charleston, SC, United States
- J Stephen Nix: Department of Pathology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Murat Gokden: Department of Pathology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Michael Bauer: Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Katie Rose Ryan: Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Christopher P Wardell: Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Olena Morozova Vaske: Department of Molecular, Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, United States
- Analiz Rodriguez: Department of Neurosurgery, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, United States
14
Schloissnig S, Pani S, Rodriguez-Martin B, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M, Hunt S, Rausch T, Marschall T, Korbel JO. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. bioRxiv 2024:2024.04.18.590093. [PMID: 38659906 PMCID: PMC11042266 DOI: 10.1101/2024.04.18.590093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest that a continuum of homology-mediated rearrangement processes is integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.
15
Keskus A, Bryant A, Ahmad T, Yoo B, Aganezov S, Goretsky A, Donmez A, Lansdon LA, Rodriguez I, Park J, Liu Y, Cui X, Gardner J, McNulty B, Sacco S, Shetty J, Zhao Y, Tran B, Narzisi G, Helland A, Cook DE, Chang PC, Kolesnikov A, Carroll A, Molloy EK, Pushel I, Guest E, Pastinen T, Shafin K, Miga KH, Malikic S, Day CP, Robine N, Sahinalp C, Dean M, Farooqi MS, Paten B, Kolmogorov M. Severus: accurate detection and characterization of somatic structural variation in tumor genomes using long reads. medRxiv 2024:2024.03.22.24304756. [PMID: 38585974 PMCID: PMC10996739 DOI: 10.1101/2024.03.22.24304756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers the advantage of better mappability and long-range phasing, which results in substantial improvements in germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy. Here, we present Severus: a method for the accurate detection of different types of somatic SVs using a phased breakpoint graph approach. To benchmark various short- and long-read SV detection methods, we sequenced five tumor/normal cell line pairs with Illumina, Nanopore, and PacBio sequencing platforms; on this benchmark Severus showed the highest F1 scores (harmonic mean of the precision and recall) as compared to long-read and short-read methods. We then applied Severus to three clinical cases of pediatric cancer, demonstrating concordance with known genetic findings as well as revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
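The F1 score mentioned in the abstract above is the harmonic mean of precision and recall. A minimal sketch of that calculation from true-positive, false-positive, and false-negative call counts follows; the function name and the counts are hypothetical illustrations and are not taken from the Severus benchmark.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall, from raw call counts."""
    called = true_positives + false_positives   # all calls made by the method
    truth = true_positives + false_negatives    # all variants in the truth set
    if true_positives == 0 or called == 0 or truth == 0:
        return 0.0  # convention used here: no correct calls gives F1 = 0
    precision = true_positives / called
    recall = true_positives / truth
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only: precision 0.90, recall 0.75.
print(round(f1_score(90, 10, 30), 3))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a caller cannot reach a high F1 by maximizing recall at the expense of precision, or the reverse.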
Affiliation(s)
- Ayse Keskus: Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Asher Bryant: Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Tanveer Ahmad: Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Byunggil Yoo: Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Anton Goretsky: Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA; Department of Computer Science, University of Maryland, College Park, MD, USA
- Ataberk Donmez: Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA; Department of Computer Science, University of Maryland, College Park, MD, USA
- Lisa A. Lansdon: Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Isabel Rodriguez: Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Jimin Park: UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Yuelin Liu: Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA; Department of Computer Science, University of Maryland, College Park, MD, USA
- Xiwen Cui: Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Samuel Sacco: UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Jyoti Shetty: Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yongmei Zhao: Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bao Tran: Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Erin K. Molloy: Department of Computer Science, University of Maryland, College Park, MD, USA
- Irina Pushel: Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Erin Guest: Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Tomi Pastinen: Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Kishwar Shafin: Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Karen H. Miga: UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Salem Malikic: Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Chi-Ping Day: Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Cenk Sahinalp: Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Michael Dean: Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Midhat S. Farooqi: Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Mikhail Kolmogorov: Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
16
Krupina K, Goginashvili A, Cleveland DW. Scrambling the genome in cancer: causes and consequences of complex chromosome rearrangements. Nat Rev Genet 2024; 25:196-210. [PMID: 37938738 PMCID: PMC10922386 DOI: 10.1038/s41576-023-00663-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Accepted: 09/20/2023] [Indexed: 11/09/2023]
Abstract
Complex chromosome rearrangements, known as chromoanagenesis, are widespread in cancer. Based on large-scale DNA sequencing of human tumours, the most frequent type of complex chromosome rearrangement is chromothripsis, a massive, localized and clustered rearrangement of one (or a few) chromosomes seemingly acquired in a single event. Chromothripsis can be initiated by mitotic errors that produce a micronucleus encapsulating a single chromosome or chromosomal fragment. Rupture of the unstable micronuclear envelope exposes its chromatin to cytosolic nucleases and induces chromothriptic shattering. Found in up to half of tumours included in pan-cancer genomic analyses, chromothriptic rearrangements can contribute to tumorigenesis through inactivation of tumour suppressor genes, activation of proto-oncogenes, or gene amplification through the production of self-propagating extrachromosomal circular DNAs encoding oncogenes or genes conferring anticancer drug resistance. Here, we discuss what has been learned about the mechanisms that enable these complex genomic rearrangements and their consequences in cancer.
Affiliation(s)
- Ksenia Krupina: Department of Cellular and Molecular Medicine, University of California at San Diego, La Jolla, CA, USA
- Alexander Goginashvili: Department of Cellular and Molecular Medicine, University of California at San Diego, La Jolla, CA, USA
- Don W Cleveland: Department of Cellular and Molecular Medicine, University of California at San Diego, La Jolla, CA, USA
17
Audano PA, Beck CR. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. Genome Res 2024; 34:7-19. [PMID: 38176712 PMCID: PMC10904011 DOI: 10.1101/gr.278203.123] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/20/2023] [Accepted: 01/02/2024] [Indexed: 01/06/2024]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.
Affiliation(s)
- Peter A Audano: The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Christine R Beck: The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA; Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
18
Lysenkova Wiklander M, Övernäs E, Lagensjö J, Raine A, Petri A, Wiman AC, Ramsell J, Marincevic-Zuniga Y, Gezelius H, Martin T, Bunikis I, Ekberg S, Erlandsson R, Larsson P, Mosbech MB, Häggqvist S, Hellstedt Kerje S, Feuk L, Ameur A, Liljedahl U, Nordlund J. Genomic, transcriptomic and epigenomic sequencing data of the B-cell leukemia cell line REH. BMC Res Notes 2023; 16:265. [PMID: 37817248 PMCID: PMC10566058 DOI: 10.1186/s13104-023-06537-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/20/2023] [Accepted: 09/25/2023] [Indexed: 10/12/2023] Open
Abstract
OBJECTIVES: The aim of this data paper is to describe a collection of 33 genomic, transcriptomic and epigenomic sequencing datasets of the B-cell acute lymphoblastic leukemia (ALL) cell line REH. REH is one of the most frequently used cell lines for functional studies of pediatric ALL, and these data provide a multi-faceted characterization of its molecular features. The datasets described herein, generated with short- and long-read sequencing technologies, can both provide insights into the complex aberrant karyotype of REH, and be used as reference datasets for sequencing data quality assessment or for methods development.
DATA DESCRIPTION: This paper describes 33 datasets corresponding to 867 gigabases of raw sequencing data generated from the REH cell line. These datasets include five different approaches for whole genome sequencing (WGS) on four sequencing platforms, two RNA sequencing (RNA-seq) techniques on two different sequencing platforms, DNA methylation sequencing, and single-cell ATAC-sequencing.
Affiliation(s)
- Mariya Lysenkova Wiklander: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Elin Övernäs: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Johanna Lagensjö: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Amanda Raine: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Anna Petri: Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ann-Christin Wiman: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Jon Ramsell: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Yanara Marincevic-Zuniga: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Henrik Gezelius: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Tom Martin: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Ignas Bunikis: Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Sara Ekberg: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Rikard Erlandsson: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Pontus Larsson: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Mai-Britt Mosbech: Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Susana Häggqvist: Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Susanne Hellstedt Kerje: Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Lars Feuk: Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Adam Ameur: Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ulrika Liljedahl: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Jessica Nordlund: Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
19
Yang W, Ma W, Huang J, Cai Y, Peng X, Zhao F, Zhang D, Zou Z, Sun H, Qi X, Ge M. Beijing Children's Hospital guidelines on the design and conduction of the first standardized database for medulloblastoma. Metab Brain Dis 2023; 38:2393-2400. [PMID: 37261631 DOI: 10.1007/s11011-023-01233-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 05/09/2023] [Indexed: 06/02/2023]
Abstract
Medulloblastoma (MB) is one of the most common malignant childhood brain tumors (WHO grade IV). Its high degree of malignancy leads to an unsatisfactory prognosis, requiring more precise and personalized treatment in the near future. Multi-omics and artificial intelligence have been playing a significant role in precision medicine research, but their implementation needs a large amount of clinical information and biomaterials. For these reasons, MB researchers urgently need to establish a large-sample database of MB that contains complete clinical data and sufficient biomaterials such as blood, cerebrospinal fluid (CSF), cancer tissue, and urine. Unfortunately, there are few biobanks of pediatric central nervous system (CNS) tumors worldwide, owing to limited specimens, scarce funds, inconsistent collection standards, and other factors. Moreover, China lags behind Western countries in this area. The present research set up a standard workflow to construct the Beijing Children's Hospital Medulloblastoma (BCH-MB) biobank. Clinical data from children with MB, together with records of biomaterial collection and storage and of regular follow-up, have been collected and recorded in this database. In the future, the BCH-MB biobank could make it possible to validate the promising biomarkers already identified, discover new MB biomarkers, develop novel therapies, and establish personalized prognostic models for children with MB, supported by its data and biomaterials, laying the foundation for individualized therapies for children with MB.
Affiliation(s)
- Wei Yang
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Wenping Ma
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Jiansong Huang
- Department of Neurosurgery, Peking University International Hospital, Peking University Health Science Center, Peking University, Beijing, 102200, China
- Yingjie Cai
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Xiaojiao Peng
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Fengmao Zhao
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Di Zhang
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Zhewei Zou
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Hailang Sun
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Xiang Qi
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Ming Ge
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
20
|
Calvo-Roitberg E, Daniels RF, Pai AA. Challenges in identifying mRNA transcript starts and ends from long-read sequencing data. bioRxiv 2023:2023.07.26.550536. [PMID: 37546743 PMCID: PMC10402045 DOI: 10.1101/2023.07.26.550536] [Indexed: 08/08/2023]
Abstract
Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology, especially by enabling the comprehensive identification and quantification of full-length mRNA isoforms. However, inherently high error rates make the analysis of long-read sequencing data challenging. While these error rates have been characterized for sequence and splice site identification, it is still unclear how accurately LRS reads represent transcript start and end sites. Here, we systematically assess the variability and accuracy of mRNA terminal ends identified by LRS reads across multiple sequencing platforms. We find substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. To address this challenge, we introduce an approach to condition reads based on empirically derived terminal ends and identify a subset of reads that are more likely to represent full-length transcripts. Our approach can improve transcriptome analyses by enhancing the fidelity of transcript terminal end identification, but may result in lower power to quantify genes or discover novel isoforms. Thus, it is necessary to be cautious when selecting sequencing approaches and/or interpreting data from long-read RNA sequencing.
Affiliation(s)
- Rachel F Daniels
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
- Athma A Pai
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
21
|
Audano PA, Beck CR. Small allelic variants are a source of ancestral bias in structural variant breakpoint placement. bioRxiv 2023:2023.06.25.546295. [PMID: 37425850 PMCID: PMC10327140 DOI: 10.1101/2023.06.25.546295] [Indexed: 07/11/2023]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥ 50 bp) has improved to near basepair precision. Despite these advances, many SVs in unique regions of the genome are subject to systematic bias that affects breakpoint location. This ambiguity leads to less accurate variant comparisons across samples, and it obscures true breakpoint features needed for mechanistic inferences. To understand why SVs are not consistently placed, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identified variable breakpoints for 882 SV insertions and 180 SV deletions not anchored in tandem repeats (TRs) or segmental duplications (SDs). While this is unexpectedly high for genome assemblies in unique loci, we find read-based callsets from the same sequencing data yielded 1,566 insertions and 986 deletions with inconsistent breakpoints also not anchored in TRs or SDs. When we investigated causes for breakpoint inaccuracy, we found sequence and assembly errors had minimal impact, but we observed a strong effect of ancestry. We confirmed that polymorphic mismatches and small indels are enriched at shifted breakpoints and that these polymorphisms are generally lost when breakpoints shift. Long tracts of homology, such as SVs mediated by transposable elements, increase the likelihood of imprecise SV calls and the distance they are shifted. Tandem Duplication (TD) breakpoints are the most heavily affected SV class with 14% of TDs placed at different locations across haplotypes. While graph genome methods normalize SV calls across many samples, the resulting breakpoints are sometimes incorrect, highlighting a need to tune graph methods for breakpoint accuracy. 
The breakpoint inconsistencies we characterize collectively affect ~5% of the SVs called in a human genome and underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoint placement, and increase the value of callsets for investigating mutational processes.
Affiliation(s)
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, USA