1
Qin Q, Popic V, Yu H, White E, Khorgade A, Shin A, Wienand K, Dondi A, Beerenwinkel N, Vazquez F, Al’Khafaji AM, Haas BJ. CTAT-LR-fusion: accurate fusion transcript identification from long and short read isoform sequencing at bulk or single cell resolution. bioRxiv 2024:2024.02.24.581862. [PMID: 38464114 PMCID: PMC10925146 DOI: 10.1101/2024.02.24.581862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Gene fusions are found as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential in cancer clinical diagnostics, prognostics, and for guiding therapeutic development. Most currently available methods for fusion transcript detection are compatible with Illumina RNA-seq involving highly accurate short read sequences. Recent advances in long read isoform sequencing enable the detection of fusion transcripts at unprecedented resolution in bulk and single cell samples. Here we developed a new computational tool CTAT-LR-fusion to detect fusion transcripts from long read RNA-seq with or without companion short reads, with applications to bulk or single cell transcriptomes. We demonstrate that CTAT-LR-fusion exceeds fusion detection accuracy of alternative methods as benchmarked with simulated and real long read RNA-seq. Using short and long read RNA-seq, we further apply CTAT-LR-fusion to bulk transcriptomes of nine tumor cell lines, and to tumor single cells derived from a melanoma sample and three metastatic high grade serous ovarian carcinoma samples. In both bulk and in single cell RNA-seq, long isoform reads yielded higher sensitivity for fusion detection than short reads with notable exceptions. By combining short and long reads in CTAT-LR-fusion, we are able to further maximize detection of fusion splicing isoforms and fusion-expressing tumor cells. CTAT-LR-fusion is available at https://github.com/TrinityCTAT/CTAT-LR-fusion/wiki.
Affiliation(s)
- Qian Qin
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Victoria Popic
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Houlin Yu
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Emily White
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Akanksha Khorgade
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Asa Shin
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Kirsty Wienand
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Arthur Dondi
  - ETH Zurich, Department of Biosystems Science and Engineering, Schanzenstrasse 44, 4056 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Schanzenstrasse 44, 4056 Basel, Switzerland
- Niko Beerenwinkel
  - ETH Zurich, Department of Biosystems Science and Engineering, Schanzenstrasse 44, 4056 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Schanzenstrasse 44, 4056 Basel, Switzerland
- Francisca Vazquez
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Aziz M. Al’Khafaji
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Brian J. Haas
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
2
Karaoğlanoğlu F, Orabi B, Flannigan R, Chauve C, Hach F. TKSM: highly modular, user-customizable, and scalable transcriptomic sequencing long-read simulator. Bioinformatics 2024; 40:btae051. [PMID: 38273664 PMCID: PMC10868325 DOI: 10.1093/bioinformatics/btae051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 01/10/2024] [Accepted: 01/23/2024] [Indexed: 01/27/2024]
Abstract
MOTIVATION Transcriptomic long-read (LR) sequencing is an increasingly cost-effective technology for probing various RNA features. Numerous tools have been developed to tackle various transcriptomic sequencing tasks (e.g. isoform and gene fusion detection). However, the lack of abundant gold-standard datasets hinders the benchmarking of such tools. Therefore, the simulation of LR sequencing is an important and practical alternative. While the existing LR simulators aim to imitate the sequencing machine noise and to target specific library protocols, they lack some important library preparation steps (e.g. PCR) and are difficult to modify to new and changing library preparation techniques (e.g. single-cell LRs). RESULTS We present TKSM, a modular and scalable LR simulator, designed so that each RNA modification step is targeted explicitly by a specific module. This allows the user to assemble a simulation pipeline as a combination of TKSM modules to emulate a specific sequencing design. Additionally, the input/output of all the core modules of TKSM follows the same simple format (Molecule Description Format) allowing the user to easily extend TKSM with new modules targeting new library preparation steps. AVAILABILITY AND IMPLEMENTATION TKSM is available as an open source software at https://github.com/vpc-ccg/tksm.
Affiliation(s)
- Fatih Karaoğlanoğlu
  - Computing Science Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Baraa Orabi
  - Department of Computer Science, the University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Ryan Flannigan
  - Department of Urologic Sciences, the University of British Columbia, Vancouver, BC V5Z 1M9, Canada
  - Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Faraz Hach
  - Department of Computer Science, the University of British Columbia, Vancouver, BC V6T 1Z4, Canada
  - Department of Urologic Sciences, the University of British Columbia, Vancouver, BC V5Z 1M9, Canada
  - Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
3
Shi Q, Li X, Liu Y, Chen Z, He X. FLIBase: a comprehensive repository of full-length isoforms across human cancers and tissues. Nucleic Acids Res 2024; 52:D124-D133. [PMID: 37697439 PMCID: PMC10767943 DOI: 10.1093/nar/gkad745] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/13/2023] [Revised: 08/14/2023] [Accepted: 08/31/2023] [Indexed: 09/13/2023]
Abstract
Regulatory processes at the RNA transcript level play a crucial role in generating transcriptome diversity and proteome composition in human cells, impacting both physiological and pathological states. This study introduces FLIBase (www.FLIBase.org), a specialized database that focuses on annotating full-length isoforms using long-read sequencing techniques. We collected and integrated long-read (351 samples) and short-read (12 469 samples) RNA sequencing data from diverse normal and cancerous human tissues and cells. The current version of FLIBase comprises a total of 983 789 full-length spliced isoforms, identified through long-read sequences and verified using short-read exon-exon splice junctions. Of these, 188 248 isoforms have been annotated, while 795 541 isoforms remain unannotated. By overcoming the limitations of short-read RNA sequencing methods, FLIBase provides an accurate and comprehensive representation of full-length transcripts. These comprehensive annotations empower researchers to undertake various downstream analyses and investigations. Importantly, FLIBase exhibits a significant advantage in identifying a substantial number of previously unannotated isoforms and tumor-specific RNA transcripts. These tumor-specific RNA transcripts have the potential to serve as a source of immunogenic recurrent neoantigens. This remarkable discovery holds tremendous promise for advancing the development of tailored RNA-based diagnostic and therapeutic strategies for various types of human cancer.
Affiliation(s)
- Qili Shi
  - Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Xinrong Li
  - Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Yizhe Liu
  - Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Zhiao Chen
  - Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai 200032, China
  - Key Laboratory of Breast Cancer in Shanghai, Fudan University Shanghai Cancer Center, Fudan University, Shanghai 200032, China
  - Shanghai Key Laboratory of Radiation Oncology, Fudan University Shanghai Cancer Center, Fudan University, Shanghai 200032, China
- Xianghuo He
  - Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai 200032, China
  - Key Laboratory of Breast Cancer in Shanghai, Fudan University Shanghai Cancer Center, Fudan University, Shanghai 200032, China
  - Shanghai Key Laboratory of Radiation Oncology, Fudan University Shanghai Cancer Center, Fudan University, Shanghai 200032, China
4
Kainth AS, Haddad GA, Hall JM, Ruthenburg AJ. Merging short and stranded long reads improves transcript assembly. PLoS Comput Biol 2023; 19:e1011576. [PMID: 37883581 PMCID: PMC10629667 DOI: 10.1371/journal.pcbi.1011576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 11/07/2023] [Accepted: 10/05/2023] [Indexed: 10/28/2023]
Abstract
Long-read RNA sequencing has arisen as a counterpart to short-read sequencing, with the potential to capture full-length isoforms, albeit at the cost of lower depth. Yet this potential is not fully realized due to inherent limitations of current long-read assembly methods and underdeveloped approaches to integrate short-read data. Here, we critically compare the existing methods and develop a new integrative approach to characterize a particularly challenging pool of low-abundance long noncoding RNA (lncRNA) transcripts from short- and long-read sequencing in two distinct cell lines. Our analysis reveals severe limitations in each of the sequencing platforms. For short-read assemblies, coverage declines at transcript termini resulting in ambiguous ends, and uneven low coverage results in segmentation of a single transcript into multiple transcripts. Conversely, long-read sequencing libraries lack depth and strand-of-origin information in cDNA-based methods, culminating in erroneous assembly and quantitation of transcripts. We also discover a cDNA synthesis artifact in long-read datasets that markedly impacts the identity and quantitation of assembled transcripts. Towards remediating these problems, we develop a computational pipeline to "strand" long-read cDNA libraries that rectifies inaccurate mapping and assembly of long-read transcripts. Leveraging the strengths of each platform and our computational stranding, we also present and benchmark a hybrid assembly approach that drastically increases the sensitivity and accuracy of full-length transcript assembly on the correct strand and improves detection of biological features of the transcriptome. When applied to a challenging set of under-annotated and cell-type variable lncRNA, our method resolves the segmentation problem of short-read sequencing and the depth problem of long-read sequencing, resulting in the assembly of coherent transcripts with precise 5' and 3' ends. 
Our workflow can be applied to existing datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.
Affiliation(s)
- Amoldeep S. Kainth
  - Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
- Gabriela A. Haddad
  - Committee on Genetics, Genomics and Systems Biology, The University of Chicago, Chicago, Illinois, United States of America
- Johnathon M. Hall
  - Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
- Alexander J. Ruthenburg
  - Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
  - Committee on Genetics, Genomics and Systems Biology, The University of Chicago, Chicago, Illinois, United States of America
  - Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois, United States of America
5
Sala-Torra O, Reddy S, Hung LH, Beppu L, Wu D, Radich J, Yeung KY, Yeung CCS. Rapid detection of myeloid neoplasm fusions using single-molecule long-read sequencing. PLOS Glob Public Health 2023; 3:e0002267. [PMID: 37699001 PMCID: PMC10497132 DOI: 10.1371/journal.pgph.0002267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 07/17/2023] [Indexed: 09/14/2023]
Abstract
Recurrent gene fusions are common drivers of disease pathophysiology in leukemias. Identifying these structural variants helps stratify disease by risk and assists with therapy choice. Precise molecular diagnosis in low-and-middle-income countries (LMIC) is challenging given the complexity of assays, trained technical support, and the availability of reliable electricity. Current fusion detection methods require a long turnaround time (7-10 days) or advance knowledge of the genes involved in the fusions. Recent technology developments have made sequencing possible without a sophisticated molecular laboratory, potentially making molecular diagnosis accessible to remote areas and low-income settings. We describe a long-read sequencing DNA assay designed with CRISPR guides to select and enrich for recurrent leukemia fusion genes, that does not need a priori knowledge of the abnormality present. By applying rapid sequencing technology based on nanopores, we sequenced long pieces of genomic DNA and successfully detected fusion genes in cell lines and primary specimens (e.g., BCR::ABL1, PML::RARA, CBFB::MYH11, KMT2A::AFF1) using cloud-based bioinformatics workflows with novel custom fusion finder software. We detected fusion genes in 100% of cell lines with the expected breakpoints and confirmed the presence or absence of a recurrent fusion gene in 12 of 14 patient cases. With our optimized assay and cloud-based bioinformatics workflow, these assays and analyses could be performed in under 8 hours. The platform's portability, potential for adaptation to lower-cost devices, and integrated cloud analysis make this assay a candidate to be placed in settings like LMIC to bridge the need of bedside rapid molecular diagnostics.
Affiliation(s)
- Olga Sala-Torra
  - Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
  - University of Washington, Seattle, Washington, United States of America
- Shishir Reddy
  - University of Washington, Seattle, Washington, United States of America
- Ling-Hong Hung
  - University of Washington, Seattle, Washington, United States of America
- Lan Beppu
  - Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
- David Wu
  - School of Engineering and Technology, University of Washington Tacoma, Tacoma, Washington, United States of America
- Jerald Radich
  - Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
  - School of Engineering and Technology, University of Washington Tacoma, Tacoma, Washington, United States of America
- Ka Yee Yeung
  - University of Washington, Seattle, Washington, United States of America
- Cecilia C. S. Yeung
  - Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
  - School of Engineering and Technology, University of Washington Tacoma, Tacoma, Washington, United States of America
6
van Belzen IAEM, Cai C, van Tuil M, Badloe S, Strengman E, Janse A, Verwiel ETP, van der Leest DFM, Kester L, Molenaar JJ, Meijerink J, Drost J, Peng WC, Kerstens HHD, Tops BBJ, Holstege FCP, Kemmeren P, Hehir-Kwa JY. Systematic discovery of gene fusions in pediatric cancer by integrating RNA-seq and WGS. BMC Cancer 2023; 23:618. [PMID: 37400763 DOI: 10.1186/s12885-023-11054-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Accepted: 03/08/2023] [Indexed: 07/05/2023]
Abstract
BACKGROUND Gene fusions are important cancer drivers in pediatric cancer and their accurate detection is essential for diagnosis and treatment. Clinical decision-making requires high confidence and precision of detection. Recent developments show RNA sequencing (RNA-seq) is promising for genome-wide detection of fusion products but hindered by many false positives that require extensive manual curation and impede discovery of pathogenic fusions. METHODS We developed Fusion-sq to overcome existing disadvantages of detecting gene fusions. Fusion-sq integrates and "fuses" evidence from RNA-seq and whole genome sequencing (WGS) using intron-exon gene structure to identify tumor-specific protein coding gene fusions. Fusion-sq was then applied to the data generated from a pediatric pan-cancer cohort of 128 patients by WGS and RNA sequencing. RESULTS In a pediatric pan-cancer cohort of 128 patients, we identified 155 high confidence tumor-specific gene fusions and their underlying structural variants (SVs). This includes all clinically relevant fusions known to be present in this cohort (30 patients). Fusion-sq distinguishes healthy-occurring from tumor-specific fusions and resolves fusions in amplified regions and copy number unstable genomes. A high gene fusion burden is associated with copy number instability. We identified 27 potentially pathogenic fusions involving oncogenes or tumor-suppressor genes characterized by underlying SVs, in some cases leading to expression changes indicative of activating or disruptive effects. CONCLUSIONS Our results indicate how clinically relevant and potentially pathogenic gene fusions can be identified and their functional effects investigated by combining WGS and RNA-seq. Integrating RNA fusion predictions with underlying SVs advances fusion detection beyond extensive manual filtering. Taken together, we developed a method for identifying candidate gene fusions that is suitable for precision oncology applications. 
Our method provides multi-omics evidence for assessing the pathogenicity of tumor-specific gene fusions for future clinical decision making.
Affiliation(s)
- Casey Cai
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Marc van Tuil
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Shashi Badloe
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Eric Strengman
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Alex Janse
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Lennart Kester
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jan J Molenaar
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
  - Department of Pharmaceutical Sciences, Utrecht University, Utrecht, The Netherlands
- Jules Meijerink
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jarno Drost
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- Weng Chuan Peng
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Bastiaan B J Tops
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Patrick Kemmeren
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
  - Center for Molecular Medicine, UMC Utrecht and Utrecht University, Utrecht, The Netherlands
- Jayne Y Hehir-Kwa
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
7
Haas BJ, Dobin A, Ghandi M, Van Arsdale A, Tickle T, Robinson JT, Gillani R, Kasif S, Regev A. Targeted in silico characterization of fusion transcripts in tumor and normal tissues via FusionInspector. Cell Rep Methods 2023; 3:100467. [PMID: 37323575 PMCID: PMC10261907 DOI: 10.1016/j.crmeth.2023.100467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 02/28/2023] [Accepted: 04/14/2023] [Indexed: 06/17/2023]
Abstract
Here, we present FusionInspector for in silico characterization and interpretation of candidate fusion transcripts from RNA sequencing (RNA-seq) and exploration of their sequence and expression characteristics. We applied FusionInspector to thousands of tumor and normal transcriptomes and identified statistical and experimental features enriched among biologically impactful fusions. Through clustering and machine learning, we identified large collections of fusions potentially relevant to tumor and normal biological processes. We show that biologically relevant fusions are enriched for relatively high expression of the fusion transcript, imbalanced fusion allelic ratios, and canonical splicing patterns, and are deficient in sequence microhomologies between partner genes. We demonstrate that FusionInspector accurately validates fusion transcripts in silico and helps characterize numerous understudied fusions in tumor and normal tissue samples. FusionInspector is freely available as open source for screening, characterization, and visualization of candidate fusions via RNA-seq, and facilitates transparent explanation and interpretation of machine-learning predictions and their experimental sources.
Affiliation(s)
- Brian J. Haas
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Anne Van Arsdale
  - Department of Obstetrics and Gynecology and Women’s Health, Albert Einstein Montefiore Medical Center, Bronx, NY 10461, USA
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Timothy Tickle
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- James T. Robinson
  - School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Riaz Gillani
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02215, USA
  - Boston Children’s Hospital, Boston, MA 02115, USA
- Simon Kasif
  - Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Aviv Regev
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
8
Chen Y, Wang Y, Chen W, Tan Z, Song Y, Chen H, Chong Z. Gene Fusion Detection and Characterization in Long-Read Cancer Transcriptome Sequencing Data with FusionSeeker. Cancer Res 2023; 83:28-33. [PMID: 36318117 PMCID: PMC9812290 DOI: 10.1158/0008-5472.can-22-1628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 09/07/2022] [Accepted: 10/28/2022] [Indexed: 01/05/2023]
Abstract
Gene fusions are prevalent in a wide array of cancer types with different frequencies. Long-read transcriptome sequencing technologies, such as PacBio, Iso-Seq, and Nanopore direct RNA sequencing, provide full-length transcript sequencing reads, which could facilitate detection of gene fusions. In this work, we developed a method, FusionSeeker, to comprehensively characterize gene fusions in long-read cancer transcriptome data and reconstruct accurate fused transcripts from raw reads. FusionSeeker identified gene fusions in both exonic and intronic regions, allowing comprehensive characterization of gene fusions in cancer transcriptomes. Fused transcript sequences were reconstructed with FusionSeeker by correcting sequencing errors in the raw reads through partial order alignment algorithm. Using these accurate transcript sequences, FusionSeeker refined gene fusion breakpoint positions and predicted breakpoints at single bp resolution. Overall, FusionSeeker will enable users to discover gene fusions accurately using long-read data, which can facilitate downstream functional analysis as well as improved cancer diagnosis and treatment. SIGNIFICANCE FusionSeeker is a new method to discover gene fusions and reconstruct fused transcript sequences in long-read cancer transcriptome sequencing data to help identify novel gene fusions important for tumorigenesis and progression.
Affiliation(s)
- Yu Chen
  - Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
  - Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
- Yiqing Wang
  - Department of Computer Science, College of Arts and Sciences, University of Alabama at Birmingham, AL 35294, USA
- Weisheng Chen
  - Department of Surgery, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
- Zhengzhi Tan
  - Department of Computer Science, College of Arts and Sciences, University of Alabama at Birmingham, AL 35294, USA
- Yuwei Song
  - Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
  - Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
- Herbert Chen
  - Department of Surgery, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
  - Department of Biomedical Engineering, School of Engineering, University of Alabama at Birmingham, AL 35294, USA
- Zechen Chong
  - Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
  - Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
  - Corresponding author: Zechen Chong, Ph.D. Mailing address: THT 134, 1720 2 Ave S, Birmingham AL 35226. Phone: 205-801-7590
9
Wu S, Schmitz U. Single-cell and long-read sequencing to enhance modelling of splicing and cell-fate determination. Comput Struct Biotechnol J 2023; 21:2373-2380. [PMID: 37066125 PMCID: PMC10091034 DOI: 10.1016/j.csbj.2023.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 03/13/2023] [Accepted: 03/13/2023] [Indexed: 04/03/2023]
Abstract
Single-cell sequencing technologies have revolutionised the life sciences and biomedical research. Single-cell sequencing provides high-resolution data on cell heterogeneity, allowing high-fidelity cell type identification, and lineage tracking. Computational algorithms and mathematical models have been developed to make sense of the data, compensate for errors and simulate the biological processes, which has led to breakthroughs in our understanding of cell differentiation, cell-fate determination and tissue cell composition. The development of long-read (a.k.a. third-generation) sequencing technologies has produced powerful tools for investigating alternative splicing, isoform expression (at the RNA level), genome assembly and the detection of complex structural variants (at the DNA level). In this review, we provide an overview of the recent advancements in single-cell and long-read sequencing technologies, with a particular focus on the computational algorithms that help in correcting, analysing, and interpreting the resulting data. Additionally, we review some mathematical models that use single-cell and long-read sequencing data to study cell-fate determination and alternative splicing, respectively. Moreover, we highlight the emerging opportunities in modelling cell-fate determination that result from the combination of single-cell and long-read sequencing technologies.
10
Dorney R, Dhungel BP, Rasko JEJ, Hebbard L, Schmitz U. Recent advances in cancer fusion transcript detection. Brief Bioinform 2022; 24:6918739. [PMID: 36527429 PMCID: PMC9851307 DOI: 10.1093/bib/bbac519] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/26/2022] [Revised: 10/11/2022] [Accepted: 10/31/2022] [Indexed: 12/23/2022]
Abstract
Extensive investigation of gene fusions in cancer has led to the discovery of novel biomarkers and therapeutic targets. To date, most studies have neglected chromosomal rearrangement-independent fusion transcripts and complex fusion structures such as double or triple-hop fusions, and fusion-circRNAs. In this review, we untangle fusion-related terminology and propose a classification system involving both gene and transcript fusions. We highlight the importance of RNA-level fusions and how long-read sequencing approaches can improve detection and characterization. Moreover, we discuss novel bioinformatic tools to identify fusions in long-read sequencing data and strategies to experimentally validate and functionally characterize fusion transcripts.
Affiliation(s)
- Ryley Dorney
  - Department of Molecular & Cell Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Douglas, QLD 4811, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Australia
- Bijay P Dhungel
  - Gene and Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, NSW 2050, Australia
  - Faculty of Medicine & Health, The University of Sydney, Camperdown, NSW 2006, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Australia
- John E J Rasko
  - Gene and Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, NSW 2050, Australia
  - Faculty of Medicine & Health, The University of Sydney, Camperdown, NSW 2006, Australia
- Lionel Hebbard
  - Department of Molecular & Cell Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Douglas, QLD 4811, Australia
  - Storr Liver Centre, Westmead Institute for Medical Research, Westmead Hospital and University of Sydney, Sydney, New South Wales, Australia
- Ulf Schmitz
  - Corresponding author. Ulf Schmitz, Department of Molecular and Cell Biology, College of Public Health, Medical and Vet Sciences, James Cook University, Douglas, QLD 4811, Australia.
11
|
Kiyose H, Nakagawa H, Ono A, Aikata H, Ueno M, Hayami S, Yamaue H, Chayama K, Shimada M, Wong JH, Fujimoto A. Comprehensive analysis of full-length transcripts reveals novel splicing abnormalities and oncogenic transcripts in liver cancer. PLoS Genet 2022; 18:e1010342. [PMID: 35926060 PMCID: PMC9380957 DOI: 10.1371/journal.pgen.1010342] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/19/2022] [Revised: 08/16/2022] [Accepted: 07/14/2022] [Indexed: 12/24/2022]
Abstract
Genes generate transcripts of various functions by alternative splicing. However, in most transcriptome studies, short-reads sequencing technologies (next-generation sequencers) have been used, leaving full-length transcripts unobserved directly. Although long-reads sequencing technologies would enable the sequencing of full-length transcripts, the data analysis is difficult. In this study, we developed an analysis pipeline named SPLICE and analyzed cDNA sequences from 42 pairs of hepatocellular carcinoma (HCC) and matched non-cancerous livers with an Oxford Nanopore sequencer. Our analysis detected 46,663 transcripts from the protein-coding genes in the HCCs and the matched non-cancerous livers, of which 5,366 (11.5%) were novel. A comparison of expression levels identified 9,933 differentially expressed transcripts (DETs) in 4,744 genes. Interestingly, 746 genes with DETs, including the LINE1-MET transcript, were not found by a gene-level analysis. We also found that fusion transcripts of transposable elements and hepatitis B virus (HBV) were overexpressed in HCCs. In vitro experiments on DETs showed that LINE1-MET and HBV-human transposable elements promoted cell growth. Furthermore, fusion gene detection showed novel recurrent fusion events that were not detected in the short-reads. These results suggest the efficiency of full-length transcriptome studies and the importance of splicing variants in carcinogenesis.
Genes generate transcripts of various functions by alternative splicing. However, in most transcriptome studies, short-reads sequencing technologies (next-generation sequencers) have been used, leaving full-length transcripts unobserved directly. In this study, we developed an analysis pipeline named SPLICE for long-read transcriptome sequencing and analyzed cDNA sequences from 42 pairs of hepatocellular carcinoma (HCC), and matched non-cancerous livers with an Oxford Nanopore sequencer. Our analysis detected 5,366 novel transcripts and 9,933 differentially expressed transcripts in 4,744 genes between HCCs and non-cancerous livers. An analysis of hepatitis B virus (HBV) transcripts showed that fusion transcripts of the HBV gene and human transposable elements were overexpressed in HBV-infected HCCs. We also identified fusion genes that were not found in the short-reads. These results suggest that long-reads sequencing technologies provide a fuller understanding of cancer transcripts and that our method contributes to the analysis of transcriptome sequences by such technologies.
Affiliation(s)
- Hiroki Kiyose
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Hidewaki Nakagawa
  - Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Atsushi Ono
  - Department of Gastroenterology and Metabolism, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- Hiroshi Aikata
  - Department of Gastroenterology and Metabolism, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- Masaki Ueno
  - Department of Surgery II, Wakayama Medical University, Wakayama, Japan
- Shinya Hayami
  - Department of Surgery II, Wakayama Medical University, Wakayama, Japan
- Hiroki Yamaue
  - Department of Surgery II, Wakayama Medical University, Wakayama, Japan
- Kazuaki Chayama
  - Collaborative Research Laboratory of Medical Innovation, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
  - Research Center for Hepatology and Gastroenterology, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
  - RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Mihoko Shimada
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Jing Hao Wong
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Akihiro Fujimoto
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
  - * E-mail:
12
Vo T, Brownmiller T, Hall K, Jones TL, Choudhari S, Grammatikakis I, Ludwig KR, Caplen NJ. HNRNPH1 destabilizes the G-quadruplex structures formed by G-rich RNA sequences that regulate the alternative splicing of an oncogenic fusion transcript. Nucleic Acids Res 2022; 50:6474-6496. [PMID: 35639772 DOI: 10.1093/nar/gkac409] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/28/2021] [Revised: 04/07/2022] [Accepted: 05/06/2022] [Indexed: 11/13/2022]
Abstract
In the presence of physiological monovalent cations, thousands of RNA G-rich sequences can form parallel G-quadruplexes (G4s) unless RNA-binding proteins inhibit, destabilize, or resolve the formation of such secondary RNA structures. Here, we have used a disease-relevant model system to investigate the biophysical properties of the RNA-binding protein HNRNPH1's interaction with G-rich sequences. We demonstrate the importance of two EWSR1-exon 8 G-rich regions in mediating the exclusion of this exon from the oncogenic EWS-FLI1 transcripts expressed in a subset of Ewing sarcomas, using complementary analysis of tumor data, long-read sequencing, and minigene studies. We determined that HNRNPH1 binds the EWSR1-exon 8 G-rich sequences with low nM affinities irrespective of whether in a non-G4 or G4 state but exhibits different kinetics depending on RNA structure. Specifically, HNRNPH1 associates and dissociates from G4-folded RNA faster than the identical sequences in a non-G4 state. Importantly, we demonstrate using gel shift and spectroscopic assays that HNRNPH1, particularly the qRRM1-qRRM2 domains, destabilizes the G4s formed by the EWSR1-exon 8 G-rich sequences in a non-catalytic fashion. Our results indicate that HNRNPH1's binding of G-rich sequences favors the accumulation of RNA in a non-G4 state and that this contributes to its regulation of RNA processing.
Affiliation(s)
- Tam Vo
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Tayvia Brownmiller
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Katherine Hall
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Tamara L Jones
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Sulbha Choudhari
  - CCR-SF Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
- Ioannis Grammatikakis
  - Regulatory RNAs and Cancer Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Katelyn R Ludwig
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Natasha J Caplen
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
13
Liu T, He J, Dong K, Wang X, Zhang L, Ren R, Huang S, Sun X, Pan W, Wang W, Yang P, Yang T, Zhang Z. Genome-wide identification of quantitative trait loci for morpho-agronomic and yield-related traits in foxtail millet (Setaria italica) across multi-environments. Mol Genet Genomics 2022; 297:873-888. [PMID: 35451683 PMCID: PMC9130181 DOI: 10.1007/s00438-022-01894-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Accepted: 03/31/2022] [Indexed: 11/21/2022]
Abstract
Foxtail millet (Setaria italica) is an ideal model of genetic system for functional genomics of the Panicoideae crop. Identification of QTL responsible for morpho-agronomic and yield-related traits facilitates dissection of genetic control and breeding in cereal crops. Here, based on a Yugu1 × Longgu7 RIL population and genome-wide resequencing data, an updated linkage map harboring 2297 bin and 74 SSR markers was constructed, spanning 1315.1 cM with an average distance of 0.56 cM between adjacent markers. A total of 221 QTL for 17 morpho-agronomic and yield-related traits explaining 5.5 ~ 36% of phenotypic variation were identified across multi-environments. Of these, 109 QTL were detected in two to nine environments, including the most stable qLMS6.1 harboring a promising candidate gene Seita.6G250500, of which 70 were repeatedly identified in different trials in the same geographic location, suggesting that foxtail millet has more identical genetic modules under the similar ecological environment. One hundred-thirty QTL with overlapping intervals formed 22 QTL clusters. Furthermore, six superior recombinant inbred lines, RIL35, RIL48, RIL77, RIL80, RIL115 and RIL125 with transgressive inheritance and enrichment of favorable alleles in plant height, tiller, panicle morphology and yield related-traits were screened by hierarchical cluster. These identified QTL, QTL clusters and superior lines lay ground for further gene-trait association studies and breeding practice in foxtail millet.
Affiliation(s)
- Tianpeng Liu
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Jihong He
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Kongjun Dong
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Xuewen Wang
  - Institute of Plant Breeding, Genetics, and Genomics, University of Georgia, Athens, GA, 30601, USA
- Lei Zhang
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Ruiyu Ren
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Sha Huang
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
- Xiaoting Sun
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
- Wanxiang Pan
  - College of Life Science and Technology, Gansu Agricultural University, Lanzhou, 730070, China
- Wenwen Wang
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
- Peng Yang
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
- Tianyu Yang
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Zhengsheng Zhang
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
14
Wang J, Bhakta N, Ayer Miller V, Revsine M, Litzow MR, Paietta E, Fedoriw Y, Roberts KG, Gu Z, Mullighan CG, Jones CD, Alexander TB. Acute Leukemia Classification Using Transcriptional Profiles From Low-Cost Nanopore mRNA Sequencing. JCO Precis Oncol 2022; 6:e2100326. [PMID: 35442720 PMCID: PMC9200386 DOI: 10.1200/po.21.00326] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/14/2023]
Abstract
PURPOSE: Most cases of pediatric acute leukemia occur in low- and middle-income countries, where health centers lack the tools required for accurate diagnosis and disease classification. Recent research shows the robustness of using unbiased short-read RNA sequencing to classify genomic subtypes of acute leukemia. Compared with short-read sequencing, nanopore sequencing has low capital and consumable costs, making it suitable for use in locations with limited health infrastructure.
MATERIALS AND METHODS: We show the feasibility of nanopore mRNA sequencing on 134 cryopreserved acute leukemia specimens (26 acute myeloid leukemia [AML], 73 B-lineage acute lymphoblastic leukemia [B-ALL], 34 T-lineage acute lymphoblastic leukemia, and one acute undifferentiated leukemia). Using multiple library preparation approaches, we generated long-read transcripts for each sample. We developed a novel composite classification approach to predict acute leukemia lineage and major B-ALL and AML molecular subtypes directly from gene expression profiles.
RESULTS: We demonstrate accurate classification of acute leukemia samples into AML, B-ALL, or T-lineage acute lymphoblastic leukemia (96.2% of cases are classifiable with a probability of > 0.8, with 100% accuracy) and further classification into clinically actionable genomic subtypes using shallow RNA nanopore sequencing, with 96.2% accuracy for major AML subtypes and 94.1% accuracy for major B-lineage acute lymphoblastic leukemia subtypes.
CONCLUSION: Transcriptional profiling of acute leukemia samples using nanopore technology for diagnostic classification is feasible and accurate, which has the potential to improve the accuracy of cancer diagnosis in low-resource settings.
Affiliation(s)
- Jeremy Wang
  - Department of Genetics, University of North Carolina, Chapel Hill, NC
- Nickhill Bhakta
  - Department of Global Pediatric Medicine, St Jude Children's Research Hospital, Memphis, TN
- Vanessa Ayer Miller
  - Office of Clinical Translational Research, University of North Carolina, Chapel Hill, NC
- Mahler Revsine
  - Department of Biology, University of North Carolina, Chapel Hill, NC
- Mark R. Litzow
  - Division of Hematology and Transplant Center, Mayo Clinic Rochester, Rochester, MN
- Yuri Fedoriw
  - Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC
- Kathryn G. Roberts
  - Department of Pathology, St Jude Children's Research Hospital, Memphis, TN
- Zhaohui Gu
  - Department of Computational and Quantitative Medicine & Systems Biology, Beckman Research Institute of City of Hope, Duarte, CA
- Corbin D. Jones
  - Department of Biology, University of North Carolina, Chapel Hill, NC
- Thomas B. Alexander
  - Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC
  - Department of Pediatrics, University of North Carolina, Chapel Hill, NC
  - Thomas B. Alexander, MD, MPH, Department of Pediatrics and Department of Pathology and Laboratory Medicine, University of North Carolina Chapel Hill, 170 Manning Dr, 1185A Houpt Building, CB#7236, Chapel Hill, NC 27599; e-mail:
15
Davidson NM, Chen Y, Sadras T, Ryland GL, Blombery P, Ekert PG, Göke J, Oshlack A. JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biol 2022; 23:10. [PMID: 34991664 PMCID: PMC8739696 DOI: 10.1186/s13059-021-02588-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 04/26/2021] [Accepted: 12/22/2021] [Indexed: 12/26/2022]
Abstract
In cancer, fusions are important diagnostic markers and targets for therapy. Long-read transcriptome sequencing allows the discovery of fusions with their full-length isoform structure. However, due to higher sequencing error rates, fusion finding algorithms designed for short reads do not work. Here we present JAFFAL, to identify fusions from long-read transcriptome sequencing. We validate JAFFAL using simulations, cell lines, and patient data from Nanopore and PacBio. We apply JAFFAL to single-cell data and find fusions spanning three genes demonstrating transcripts detected from complex rearrangements. JAFFAL is available at https://github.com/Oshlack/JAFFA/wiki.
Affiliation(s)
- Nadia M Davidson
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - School of BioSciences, University of Melbourne, Victoria, Australia
  - The Walter and Eliza Hall Institute, Victoria, Australia
- Ying Chen
  - Genome Institute of Singapore, Singapore, Singapore
- Teresa Sadras
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Victoria, Australia
- Georgina L Ryland
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Victoria, Australia
  - Centre for Cancer Research, University of Melbourne, Victoria, Australia
- Piers Blombery
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Victoria, Australia
- Paul G Ekert
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Victoria, Australia
  - Children's Cancer Institute, Lowy Cancer Centre, UNSW, Sydney, NSW, Australia
  - School of Women's and Children's Health, UNSW, Sydney, NSW, Australia
  - Murdoch Children's Research Institute, Victoria, Australia
- Jonathan Göke
  - Genome Institute of Singapore, Singapore, Singapore
  - National Cancer Centre Singapore, Singapore, Singapore
- Alicia Oshlack
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - School of BioSciences, University of Melbourne, Victoria, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Victoria, Australia
16
Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol 2021; 39:1348-1365. [PMID: 34750572 PMCID: PMC8988251 DOI: 10.1038/s41587-021-01108-x] [Citation(s) in RCA: 409] [Impact Index Per Article: 136.3] [Received: 12/09/2019] [Accepted: 09/22/2021] [Indexed: 12/13/2022]
Abstract
Rapid advances in nanopore technologies for sequencing single long DNA and RNA molecules have led to substantial improvements in accuracy, read length and throughput. These breakthroughs have required extensive development of experimental and bioinformatics methods to fully exploit nanopore long reads for investigations of genomes, transcriptomes, epigenomes and epitranscriptomes. Nanopore sequencing is being applied in genome assembly, full-length transcript detection and base modification detection and in more specialized areas, such as rapid clinical diagnoses and outbreak surveillance. Many opportunities remain for improving data quality and analytical approaches through the development of new nanopores, base-calling methods and experimental protocols tailored to particular applications.