1. Su X, Lin Q, Liu B, Zhou C, Lu L, Lin Z, Si J, Ding Y, Duan S. The promising role of nanopore sequencing in cancer diagnostics and treatment. Cell Insight 2025; 4:100229. [PMID: 39995512; PMCID: PMC11849079; DOI: 10.1016/j.cellin.2025.100229] [Received: 09/14/2024] [Revised: 01/13/2025] [Accepted: 01/14/2025] [Indexed: 02/26/2025]
Abstract
Cancer arises from genetic alterations that impact both the genome and transcriptome. The utilization of nanopore sequencing offers a powerful means of detecting these alterations due to its unique capacity for long single-molecule sequencing. In the context of DNA analysis, nanopore sequencing excels in identifying structural variations (SVs), copy number variations (CNVs), gene fusions within SVs, and mutations in specific genes, including those involving DNA modifications and DNA adducts. In the field of RNA research, nanopore sequencing proves invaluable in discerning differentially expressed transcripts, uncovering novel elements linked to transcriptional regulation, and identifying alternative splicing events and RNA modifications at the single-molecule level. Furthermore, nanopore sequencing extends its reach to detecting microorganisms, encompassing bacteria and viruses, that are intricately associated with tumorigenesis and the development of cancer. Consequently, the application prospects of nanopore sequencing in tumor diagnosis and personalized treatment are expansive, encompassing tasks such as tumor identification and classification, the tailoring of treatment strategies, and the screening of prospective patients. In essence, this technology stands poised to unearth novel mechanisms underlying tumorigenesis while providing dependable support for the diagnosis and treatment of cancer.
Affiliation(s)
- Xinming Su
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Qingyuan Lin
  - The Second Clinical Medical College, Zhejiang Chinese Medicine University BinJiang College, Hangzhou 310053, Zhejiang, China
- Bin Liu
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Chuntao Zhou
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Liuyi Lu
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Zihao Lin
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Jiahua Si
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Yuemin Ding
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Institute of Translational Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Shiwei Duan
  - Department of Clinical Medicine, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Institute of Translational Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, Hangzhou City University, Hangzhou 310015, Zhejiang, China
2. Monzó C, Liu T, Conesa A. Transcriptomics in the era of long-read sequencing. Nat Rev Genet 2025:10.1038/s41576-025-00828-z. [PMID: 40155769; DOI: 10.1038/s41576-025-00828-z] [Accepted: 02/20/2025] [Indexed: 04/01/2025]
Abstract
Transcriptome sequencing revolutionized the analysis of gene expression, providing an unbiased approach to gene detection and quantification that enabled the discovery of novel isoforms, alternative splicing events and fusion transcripts. However, although short-read sequencing technologies have surpassed the limited dynamic range of previous technologies such as microarrays, they have limitations, for example, in resolving full-length transcripts and complex isoforms. Over the past 5 years, long-read sequencing technologies have matured considerably, with improvements in instrumentation and analytical methods, enabling their application to RNA sequencing (RNA-seq). Benchmarking studies are beginning to identify the strengths and limitations of long-read RNA-seq, although there remains a need for comprehensive resources to guide newcomers through the intricacies of this approach. In this Review, we provide a comprehensive overview of the long-read RNA-seq workflow, from library preparation and sequencing challenges to core data processing, downstream analyses and emerging developments. We present an extensive inventory of experimental and analytical methods and discuss current challenges and prospects.
Affiliation(s)
- Carolina Monzó
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
- Tianyuan Liu
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
3. Ament IH, DeBruyne N, Wang F, Lin L. Long-read RNA sequencing: A transformative technology for exploring transcriptome complexity in human diseases. Mol Ther 2025; 33:883-894. [PMID: 39563027; PMCID: PMC11897757; DOI: 10.1016/j.ymthe.2024.11.025] [Received: 08/29/2024] [Revised: 10/30/2024] [Accepted: 11/15/2024] [Indexed: 11/21/2024]
Abstract
Long-read RNA sequencing (RNA-seq) is emerging as a powerful and versatile technology for studying human transcriptomes. By enabling the end-to-end sequencing of full-length transcripts, long-read RNA-seq opens up avenues for investigating various RNA species and features that cannot be reliably interrogated by standard short-read RNA-seq methods. In this review, we present an overview of long-read RNA-seq, delineating its strengths over short-read RNA-seq, as well as summarizing recent advances in experimental and computational approaches to boost the power of long-read-based transcriptomics. We describe a wide range of applications of long-read RNA-seq, and highlight its expanding role as a foundational technology for exploring transcriptome variations in human diseases.
Affiliation(s)
- Nicole DeBruyne
  - Graduate Group in Cell and Molecular Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Feng Wang
  - Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Lan Lin
  - Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
4. Dorney R, Reis-das-Mercês L, Schmitz U. Architects and Partners: The Dual Roles of Non-coding RNAs in Gene Fusion Events. Methods Mol Biol 2025; 2883:231-255. [PMID: 39702711; DOI: 10.1007/978-1-0716-4290-0_10] [Indexed: 12/21/2024]
Abstract
Extensive research into gene fusions in cancer and other diseases has led to the discovery of novel biomarkers and therapeutic targets. Concurrently, various bioinformatics tools have been developed for fusion detection in RNA sequencing data, which, in the age of increasing affordability of sequencing, have delivered a large-scale identification of transcriptomic abnormalities. Historically, the focus of fusion transcript research was predominantly on coding RNAs and their resultant proteins, often overlooking non-coding RNAs (ncRNAs). This chapter discusses how ncRNAs are integral players in the landscape of gene fusions, detailing their contributions to the formation of gene fusions and their presence in chimeric transcripts. We delve into both linear and the more recently identified circular fusion RNAs, providing a comprehensive overview of the computational methodologies used to detect ncRNA-involved gene fusions. Additionally, we examine the inherent biases and limitations of these bioinformatics approaches, offering insights into the challenges and future directions in this dynamic field.
Affiliation(s)
- Ryley Dorney
  - Biomedical Sciences and Molecular Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Douglas, QLD, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia
- Laís Reis-das-Mercês
  - Laboratory of Human and Medical Genetics, Institute of Biological Sciences, Federal University of Pará, Belem, PA, Brazil
- Ulf Schmitz
  - Biomedical Sciences and Molecular Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Douglas, QLD, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia
  - Computational BioMedicine Lab, Centenary Institute, The University of Sydney, Camperdown, NSW, Australia
  - Faculty of Medicine & Health, The University of Sydney, Camperdown, NSW, Australia
5. Masuda K, Sota Y, Matsuda H. Gene Fusion Detection in Long-Read Transcriptome Datasets from Multiple Cancer Cell Lines. Front Biosci-Landmrk 2024; 29:413. [PMID: 39735992; DOI: 10.31083/j.fbl2912413] [Received: 06/16/2024] [Revised: 10/19/2024] [Accepted: 10/30/2024] [Indexed: 12/31/2024]
Abstract
BACKGROUND: Fusion genes are important biomarkers in cancer research because their expression can produce abnormal proteins with oncogenic properties. Long-read RNA sequencing (long-read RNA-seq), which can sequence full-length mRNA transcripts, facilitates the detection of such fusion genes. Several tools have been proposed for detecting fusion genes in long-read RNA-seq datasets derived from cancer cells. However, the high sequencing error rate in long-read RNA-seq makes fusion gene detection challenging.
METHODS: To address this issue, additional steps were incorporated into the fusion detection tool to improve detection accuracy. These steps include anchoring breakpoints to exon boundaries, realigning unaligned regions, and clustering breakpoints. To evaluate the accuracy of our tool in detecting fusion genes, we compared its detection accuracy with two representative existing tools, JAFFAL and FusionSeeker.
RESULTS: Our tool outperformed the two existing tools in detecting fusion genes, as demonstrated in long-read RNA-seq datasets. We also identified potentially novel fusion genes consistently detected across multiple tools or datasets.
CONCLUSIONS: The application of our tool to the detection of fusion genes in long-read RNA-seq datasets from two different cancer cell lines demonstrated the detection effectiveness of this tool.
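Two of the steps named in the abstract above, anchoring breakpoints to exon boundaries and clustering breakpoints, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `max_shift` and `max_gap` tolerances, and the greedy one-dimensional clustering are all assumptions chosen only to show the general idea of absorbing small positional errors from noisy long reads.

```python
from bisect import bisect_left


def snap_to_exon_boundary(pos, boundaries, max_shift=10):
    """Move a breakpoint to the nearest exon boundary if it lies within
    max_shift bases; otherwise return it unchanged.

    `boundaries` must be a sorted list of genomic coordinates.
    (Hypothetical helper; the tolerance value is an assumption.)
    """
    i = bisect_left(boundaries, pos)
    # Only the boundaries immediately left and right of pos can be nearest.
    candidates = boundaries[max(i - 1, 0):i + 1]
    if not candidates:
        return pos
    best = min(candidates, key=lambda b: abs(b - pos))
    return best if abs(best - pos) <= max_shift else pos


def cluster_breakpoints(positions, max_gap=20):
    """Greedily group sorted breakpoints: a position joins the current
    cluster when it is within max_gap of that cluster's last member.
    (Hypothetical helper; the gap value is an assumption.)
    """
    clusters = []
    for p in sorted(positions):
        if clusters and p - clusters[-1][-1] <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters
```

For example, with exon boundaries at 500, 1000 and 2000, a breakpoint reported at 1003 would be snapped to 1000, while one at 1500 would be left where it is because no boundary is close enough.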
Affiliation(s)
- Keigo Masuda
  - Graduate School of Information Science and Technology, Osaka University, 565-0871 Suita, Osaka, Japan
- Yoshiaki Sota
  - Graduate School of Medicine, Osaka University, 565-0871 Suita, Osaka, Japan
- Hideo Matsuda
  - Graduate School of Information Science and Technology, Osaka University, 565-0871 Suita, Osaka, Japan
6. Zhu XT, Sanz-Jimenez P, Ning XT, Tahir Ul Qamar M, Chen LL. Direct RNA sequencing in plants: Practical applications and future perspectives. Plant Communications 2024; 5:101064. [PMID: 39155503; PMCID: PMC11589328; DOI: 10.1016/j.xplc.2024.101064] [Received: 05/13/2024] [Revised: 07/17/2024] [Accepted: 08/14/2024] [Indexed: 08/20/2024]
Abstract
The transcriptome serves as a bridge that links genomic variation to phenotypic diversity. A vast number of studies using next-generation RNA sequencing (RNA-seq) over the last 2 decades have emphasized the essential roles of the plant transcriptome in response to developmental and environmental conditions, providing numerous insights into the dynamic changes, evolutionary traces, and elaborate regulation of the plant transcriptome. With substantial improvement in accuracy and throughput, direct RNA sequencing (DRS) has emerged as a new and powerful sequencing platform for precise detection of native and full-length transcripts, overcoming many limitations such as read length and PCR bias that are inherent to short-read RNA-seq. Here, we review recent advances in dissecting the complexity and diversity of plant transcriptomes using DRS as the main technological approach, covering many aspects of RNA metabolism, including novel isoforms, poly(A) tails, and RNA modification, and we propose a comprehensive workflow for processing of plant DRS data. Many challenges to the application of DRS in plants, such as the need for machine learning tools tailored to plant transcriptomes, remain to be overcome, and together we outline future biological questions that can be addressed by DRS, such as allele-specific RNA modification. This technology provides convenient support on which the connection of distinct RNA features is tightly built, sustainably refining our understanding of the biological functions of the plant transcriptome.
Affiliation(s)
- Xi-Tong Zhu
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
- Pablo Sanz-Jimenez
  - National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xiao-Tong Ning
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
- Muhammad Tahir Ul Qamar
  - Integrative Omics and Molecular Modeling Laboratory, Department of Bioinformatics and Biotechnology, Government College University Faisalabad (GCUF), Faisalabad 38000, Pakistan
- Ling-Ling Chen
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
7. Yang L, Zhang X, Wang F, Zhang L, Li J, Yue JX. NanoTrans: an integrated computational framework for comprehensive transcriptome analysis with nanopore direct RNA sequencing. J Genet Genomics 2024; 51:1300-1309. [PMID: 39004399; DOI: 10.1016/j.jgg.2024.07.007] [Received: 01/11/2024] [Revised: 07/03/2024] [Accepted: 07/04/2024] [Indexed: 07/16/2024]
Abstract
Nanopore direct RNA sequencing (DRS) provides direct access to native RNA strands with full-length information, shedding light on rich qualitative and quantitative properties of gene expression profiles. Here with NanoTrans, we present an integrated computational framework that comprehensively covers all major DRS-based application scopes, including isoform clustering and quantification, poly(A) tail length estimation, RNA modification profiling, and fusion gene detection. In addition to its merit in providing such a streamlined one-stop solution, NanoTrans also shines in its workflow-oriented modular design, batch processing capability, all-in-one tabular and graphic report output, as well as automatic installation and configuration support. Finally, by applying NanoTrans to real DRS datasets of yeast, Arabidopsis, as well as human embryonic kidney and cancer cell lines, we further demonstrate its utility, effectiveness, and efficacy across a wide range of DRS-based application settings.
Affiliation(s)
- Ludong Yang
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Xinxin Zhang
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Fan Wang
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
  - Department of Medical Oncology, The Affiliated Huai'an No.1 People's Hospital of Nanjing Medical University, Huai'an, Jiangsu 223200, China
- Li Zhang
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Jing Li
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Jia-Xing Yue
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
8. Chen S, Wang H, Zhang D, Chen R, Luo J. Readon: a novel algorithm to identify read-through transcripts with long-read sequencing data. Bioinformatics 2024; 40:btae336. [PMID: 38808568; PMCID: PMC11162696; DOI: 10.1093/bioinformatics/btae336] [Received: 01/11/2024] [Revised: 04/30/2024] [Accepted: 05/26/2024] [Indexed: 05/30/2024]
Abstract
MOTIVATION: There are many clustered transcriptionally active regions in the human genome, in which the transcription complex cannot immediately terminate transcription at the upstream gene termination site, but instead continues to transcribe intergenic regions and downstream genes, resulting in read-through transcripts. Several studies have demonstrated the regulatory roles of read-through transcripts in tumorigenesis and development. However, limited by the read length of next-generation sequencing, discovery of read-through transcripts has been slow. For long but also erroneous third-generation sequencing data, this study developed a novel minimizer sketch algorithm to accurately and quickly identify read-through transcripts.
RESULTS: Readon initially splits the reference sequence into distinct active regions. It employs a sliding window approach within each region, calculates minimizers, and constructs the specialized structured arrays for query indexing. Following initial alignment anchor screening of candidate read-through transcripts, further confirmation steps are executed. Comparative assessments against existing software reveal Readon's superior performance on both simulated and validated real data. Additionally, two downstream tools are provided: one for predicting whether a read-through transcript is likely to undergo nonsense-mediated decay or encodes a protein, and another for visualizing splicing patterns.
AVAILABILITY AND IMPLEMENTATION: Readon is freely available on GitHub (https://github.com/Bulabula45/Readon).
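The sliding-window minimizer step mentioned in the abstract above can be sketched generically as follows. This is a textbook-style illustration, not Readon's code: the function name, the default `k` and `w`, and the use of plain lexicographic order (real tools typically order k-mers by a hash) are assumptions made for clarity.

```python
def minimizers(seq, k=5, w=4):
    """Return the sorted list of (position, k-mer) minimizers of seq.

    Every window of w consecutive k-mers contributes its lexicographically
    smallest k-mer, taking the leftmost one on ties. Adjacent windows often
    pick the same k-mer, so the result is a sparse sketch of the sequence.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    # If there are fewer than w k-mers, the range is empty and no
    # minimizer is reported.
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        selected.add((best, kmers[best]))
    return sorted(selected)


if __name__ == "__main__":
    for pos, kmer in minimizers("ACGTACGTAC", k=3, w=3):
        print(pos, kmer)
```

Because two sequences that share a long enough exact substring also share the minimizers inside it, an index keyed on minimizers lets candidate matches be found without comparing every k-mer.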
Affiliation(s)
- Siang Chen
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Hao Wang
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Dongdong Zhang
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Runsheng Chen
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jianjun Luo
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
9. Qin Q, Popic V, Yu H, White E, Khorgade A, Shin A, Wienand K, Dondi A, Beerenwinkel N, Vazquez F, Al’Khafaji AM, Haas BJ. CTAT-LR-fusion: accurate fusion transcript identification from long and short read isoform sequencing at bulk or single cell resolution. bioRxiv 2024:2024.02.24.581862. [PMID: 38464114; PMCID: PMC10925146; DOI: 10.1101/2024.02.24.581862] [Indexed: 03/12/2024]
Abstract
Gene fusions are found as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential in cancer clinical diagnostics, prognostics, and for guiding therapeutic development. Most currently available methods for fusion transcript detection are compatible with Illumina RNA-seq involving highly accurate short read sequences. Recent advances in long read isoform sequencing enable the detection of fusion transcripts at unprecedented resolution in bulk and single cell samples. Here we developed a new computational tool CTAT-LR-fusion to detect fusion transcripts from long read RNA-seq with or without companion short reads, with applications to bulk or single cell transcriptomes. We demonstrate that CTAT-LR-fusion exceeds fusion detection accuracy of alternative methods as benchmarked with simulated and real long read RNA-seq. Using short and long read RNA-seq, we further apply CTAT-LR-fusion to bulk transcriptomes of nine tumor cell lines, and to tumor single cells derived from a melanoma sample and three metastatic high grade serous ovarian carcinoma samples. In both bulk and in single cell RNA-seq, long isoform reads yielded higher sensitivity for fusion detection than short reads with notable exceptions. By combining short and long reads in CTAT-LR-fusion, we are able to further maximize detection of fusion splicing isoforms and fusion-expressing tumor cells. CTAT-LR-fusion is available at https://github.com/TrinityCTAT/CTAT-LR-fusion/wiki.
Affiliation(s)
- Qian Qin
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Victoria Popic
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Houlin Yu
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Emily White
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Akanksha Khorgade
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Asa Shin
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Kirsty Wienand
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Arthur Dondi
  - ETH Zurich, Department of Biosystems Science and Engineering, Schanzenstrasse 44, 4056 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Schanzenstrasse 44, 4056 Basel, Switzerland
- Niko Beerenwinkel
  - ETH Zurich, Department of Biosystems Science and Engineering, Schanzenstrasse 44, 4056 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Schanzenstrasse 44, 4056 Basel, Switzerland
- Francisca Vazquez
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Aziz M. Al’Khafaji
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Brian J. Haas
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
10. Karaoğlanoğlu F, Orabi B, Flannigan R, Chauve C, Hach F. TKSM: highly modular, user-customizable, and scalable transcriptomic sequencing long-read simulator. Bioinformatics 2024; 40:btae051. [PMID: 38273664; PMCID: PMC10868325; DOI: 10.1093/bioinformatics/btae051] [Received: 06/14/2023] [Revised: 01/10/2024] [Accepted: 01/23/2024] [Indexed: 01/27/2024]
Abstract
MOTIVATION: Transcriptomic long-read (LR) sequencing is an increasingly cost-effective technology for probing various RNA features. Numerous tools have been developed to tackle various transcriptomic sequencing tasks (e.g. isoform and gene fusion detection). However, the lack of abundant gold-standard datasets hinders the benchmarking of such tools. Therefore, the simulation of LR sequencing is an important and practical alternative. While the existing LR simulators aim to imitate the sequencing machine noise and to target specific library protocols, they lack some important library preparation steps (e.g. PCR) and are difficult to modify to new and changing library preparation techniques (e.g. single-cell LRs).
RESULTS: We present TKSM, a modular and scalable LR simulator, designed so that each RNA modification step is targeted explicitly by a specific module. This allows the user to assemble a simulation pipeline as a combination of TKSM modules to emulate a specific sequencing design. Additionally, the input/output of all the core modules of TKSM follows the same simple format (Molecule Description Format) allowing the user to easily extend TKSM with new modules targeting new library preparation steps.
AVAILABILITY AND IMPLEMENTATION: TKSM is available as an open source software at https://github.com/vpc-ccg/tksm.
Affiliation(s)
- Fatih Karaoğlanoğlu
  - Computing Science Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Baraa Orabi
  - Department of Computer Science, the University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Ryan Flannigan
  - Department of Urologic Sciences, the University of British Columbia, Vancouver, BC V5Z 1M9, Canada
  - Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Faraz Hach
  - Department of Computer Science, the University of British Columbia, Vancouver, BC V6T 1Z4, Canada
  - Department of Urologic Sciences, the University of British Columbia, Vancouver, BC V5Z 1M9, Canada
  - Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
11. Shi Q, Li X, Liu Y, Chen Z, He X. FLIBase: a comprehensive repository of full-length isoforms across human cancers and tissues. Nucleic Acids Res 2024; 52:D124-D133. [PMID: 37697439; PMCID: PMC10767943; DOI: 10.1093/nar/gkad745] [Received: 07/13/2023] [Revised: 08/14/2023] [Accepted: 08/31/2023] [Indexed: 09/13/2023]
Abstract
Regulatory processes at the RNA transcript level play a crucial role in generating transcriptome diversity and proteome composition in human cells, impacting both physiological and pathological states. This study introduces FLIBase (www.FLIBase.org), a specialized database that focuses on annotating full-length isoforms using long-read sequencing techniques. We collected and integrated long-read (351 samples) and short-read (12 469 samples) RNA sequencing data from diverse normal and cancerous human tissues and cells. The current version of FLIBase comprises a total of 983 789 full-length spliced isoforms, identified through long-read sequences and verified using short-read exon-exon splice junctions. Of these, 188 248 isoforms have been annotated, while 795 541 isoforms remain unannotated. By overcoming the limitations of short-read RNA sequencing methods, FLIBase provides an accurate and comprehensive representation of full-length transcripts. These comprehensive annotations empower researchers to undertake various downstream analyses and investigations. Importantly, FLIBase exhibits a significant advantage in identifying a substantial number of previously unannotated isoforms and tumor-specific RNA transcripts. These tumor-specific RNA transcripts have the potential to serve as a source of immunogenic recurrent neoantigens. This remarkable discovery holds tremendous promise for advancing the development of tailored RNA-based diagnostic and therapeutic strategies for various types of human cancer.
Affiliation(s)
- Qili Shi
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Xinrong Li
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Yizhe Liu
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Zhiao Chen
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Key Laboratory of Breast Cancer in Shanghai, Fudan University Shanghai Cancer Center, Fudan University, Shanghai 200032, China
- Shanghai Key Laboratory of Radiation Oncology, Fudan University Shanghai Cancer Center, Fudan University, Shanghai 200032, China
| | - Xianghuo He
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Key Laboratory of Breast Cancer in Shanghai, Fudan University Shanghai Cancer Center, Fudan University, Shanghai 200032, China
- Shanghai Key Laboratory of Radiation Oncology, Fudan University Shanghai Cancer Center, Fudan University, Shanghai 200032, China
| |
|
12
|
Zong L, Zhu Y, Jiang Y, Xia Y, Liu Q, Wang J, Gao S, Luo B, Yuan Y, Zhou J, Jiang S. An optimized workflow of full-length transcriptome sequencing for accurate fusion transcript identification. RNA Biol 2024; 21:122-131. [PMID: 39540613 PMCID: PMC11572239 DOI: 10.1080/15476286.2024.2425527] [Revised: 10/23/2024] [Accepted: 10/25/2024] [Indexed: 11/16/2024]
Abstract
Next-generation sequencing has revolutionized cancer genomics by enabling high-throughput mutation screening, yet reliable detection of fusion genes remains challenging. Long-read sequencing offers potential for accurate fusion transcript identification, though challenges persist. In this study, we present an optimized workflow using nanopore sequencing technology to precisely identify fusion transcripts. Our approach encompasses a tailored library preparation protocol, data processing, and a fusion gene analysis pipeline. We evaluated its performance using Universal Human Reference RNA and human adenocarcinoma cell lines. Our optimized nanopore sequencing workflow generated high-quality full-length transcriptome data characterized by an extended length distribution and comprehensive transcript coverage. Validation experiments confirmed novel fusion events with potential clinical relevance. Our protocol aims to mitigate biases and enhance accuracy, facilitating increased adoption in clinical diagnostics. Continued advancements in long-read sequencing promise deeper insights into fusion gene biology and improved cancer diagnostics.
Affiliation(s)
- Liang Zong
- Department of Biology and Genetics, College of Life Sciences and Health, Wuhan University of Science and Technology, Wuhan, China
- Wuhan BGI Technology Service Co. Ltd., BGI-Wuhan, Wuhan, China
| | - Yabing Zhu
- BGI Tech Solutions Co. Ltd., BGI-Shenzhen, Shenzhen, China
| | - Yuan Jiang
- Wuhan BGI Technology Service Co. Ltd., BGI-Wuhan, Wuhan, China
| | - Ying Xia
- Wuhan BGI Technology Service Co. Ltd., BGI-Wuhan, Wuhan, China
| | - Qun Liu
- Wuhan BGI Technology Service Co. Ltd., BGI-Wuhan, Wuhan, China
| | - Jing Wang
- Wuhan BGI Technology Service Co. Ltd., BGI-Wuhan, Wuhan, China
| | - Song Gao
- Wuhan BGI Technology Service Co. Ltd., BGI-Wuhan, Wuhan, China
| | - Bei Luo
- Wuhan BGI Technology Service Co. Ltd., BGI-Wuhan, Wuhan, China
| | - Yongxian Yuan
- BGI Tech Solutions Co. Ltd., BGI-Shenzhen, Shenzhen, China
| | - Jingjiao Zhou
- Department of Biology and Genetics, College of Life Sciences and Health, Wuhan University of Science and Technology, Wuhan, China
| | - Sanjie Jiang
- BGI Tech Solutions Co. Ltd., BGI-Shenzhen, Shenzhen, China
| |
|
13
|
Kainth AS, Haddad GA, Hall JM, Ruthenburg AJ. Merging short and stranded long reads improves transcript assembly. PLoS Comput Biol 2023; 19:e1011576. [PMID: 37883581 PMCID: PMC10629667 DOI: 10.1371/journal.pcbi.1011576] [Received: 12/08/2022] [Revised: 11/07/2023] [Accepted: 10/05/2023] [Indexed: 10/28/2023]
Abstract
Long-read RNA sequencing has arisen as a counterpart to short-read sequencing, with the potential to capture full-length isoforms, albeit at the cost of lower depth. Yet this potential is not fully realized due to inherent limitations of current long-read assembly methods and underdeveloped approaches to integrate short-read data. Here, we critically compare the existing methods and develop a new integrative approach to characterize a particularly challenging pool of low-abundance long noncoding RNA (lncRNA) transcripts from short- and long-read sequencing in two distinct cell lines. Our analysis reveals severe limitations in each of the sequencing platforms. For short-read assemblies, coverage declines at transcript termini resulting in ambiguous ends, and uneven low coverage results in segmentation of a single transcript into multiple transcripts. Conversely, long-read sequencing libraries lack depth and strand-of-origin information in cDNA-based methods, culminating in erroneous assembly and quantitation of transcripts. We also discover a cDNA synthesis artifact in long-read datasets that markedly impacts the identity and quantitation of assembled transcripts. Towards remediating these problems, we develop a computational pipeline to "strand" long-read cDNA libraries that rectifies inaccurate mapping and assembly of long-read transcripts. Leveraging the strengths of each platform and our computational stranding, we also present and benchmark a hybrid assembly approach that drastically increases the sensitivity and accuracy of full-length transcript assembly on the correct strand and improves detection of biological features of the transcriptome. When applied to a challenging set of under-annotated and cell-type variable lncRNA, our method resolves the segmentation problem of short-read sequencing and the depth problem of long-read sequencing, resulting in the assembly of coherent transcripts with precise 5' and 3' ends. 
Our workflow can be applied to existing datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.
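To make the idea of computationally "stranding" unstranded long-read cDNA libraries concrete, the sketch below uses one common heuristic: a read ending in a poly(A) run is taken to be in the transcript orientation, while a read starting with a poly(T) run is taken to be its reverse complement. This heuristic, the function names, and the `window` and `min_frac` parameters are assumptions chosen for illustration; the abstract does not specify how the authors' pipeline assigns strand.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def infer_strand(read: str, window: int = 30, min_frac: float = 0.8) -> str:
    """Guess read orientation from a 3' poly(A) tail or a 5' poly(T) head.

    Returns "+" (transcript orientation), "-" (reverse complement),
    or "." when neither signal is strong enough.
    """
    read = read.upper()
    head, tail = read[:window], read[-window:]
    a_tail = tail.count("A") / len(tail) if tail else 0.0
    t_head = head.count("T") / len(head) if head else 0.0
    if a_tail >= min_frac and a_tail > t_head:
        return "+"
    if t_head >= min_frac and t_head > a_tail:
        return "-"
    return "."


def orient(read: str) -> str:
    """Return the read in transcript orientation when the strand can be inferred."""
    return reverse_complement(read.upper()) if infer_strand(read) == "-" else read
```

Reads that return `"."` are left unchanged here; in practice they could be rescued with other evidence such as splice-site motifs or adapter orientation.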
Affiliation(s)
- Amoldeep S. Kainth
- Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
| | - Gabriela A. Haddad
- Committee on Genetics, Genomics and Systems Biology, The University of Chicago, Chicago, Illinois, United States of America
| | - Johnathon M. Hall
- Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
| | - Alexander J. Ruthenburg
- Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
- Committee on Genetics, Genomics and Systems Biology, The University of Chicago, Chicago, Illinois, United States of America
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois, United States of America
| |
|
14
|
Sala-Torra O, Reddy S, Hung LH, Beppu L, Wu D, Radich J, Yeung KY, Yeung CCS. Rapid detection of myeloid neoplasm fusions using single-molecule long-read sequencing. PLOS Glob Public Health 2023; 3:e0002267. [PMID: 37699001 PMCID: PMC10497132 DOI: 10.1371/journal.pgph.0002267] [Received: 10/31/2022] [Accepted: 07/17/2023] [Indexed: 09/14/2023]
Abstract
Recurrent gene fusions are common drivers of disease pathophysiology in leukemias. Identifying these structural variants helps stratify disease by risk and assists with therapy choice. Precise molecular diagnosis in low- and middle-income countries (LMIC) is challenging given the complexity of the assays and their dependence on trained technical support and reliable electricity. Current fusion detection methods require a long turnaround time (7-10 days) or advance knowledge of the genes involved in the fusions. Recent technology developments have made sequencing possible without a sophisticated molecular laboratory, potentially making molecular diagnosis accessible to remote areas and low-income settings. We describe a long-read DNA sequencing assay that uses CRISPR guides to select and enrich for recurrent leukemia fusion genes and does not need a priori knowledge of the abnormality present. By applying rapid sequencing technology based on nanopores, we sequenced long pieces of genomic DNA and successfully detected fusion genes in cell lines and primary specimens (e.g., BCR::ABL1, PML::RARA, CBFB::MYH11, KMT2A::AFF1) using cloud-based bioinformatics workflows with novel custom fusion finder software. We detected fusion genes in 100% of cell lines with the expected breakpoints and confirmed the presence or absence of a recurrent fusion gene in 12 of 14 patient cases. With our optimized assay and cloud-based bioinformatics workflow, these assays and analyses could be performed in under 8 hours. The platform's portability, potential for adaptation to lower-cost devices, and integrated cloud analysis make this assay a candidate for deployment in settings such as LMIC to meet the need for rapid bedside molecular diagnostics.
Affiliation(s)
- Olga Sala-Torra
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
- University of Washington, Seattle, Washington, United States of America
| | - Shishir Reddy
- University of Washington, Seattle, Washington, United States of America
| | - Ling-Hong Hung
- University of Washington, Seattle, Washington, United States of America
| | - Lan Beppu
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
| | - David Wu
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, Washington, United States of America
| | - Jerald Radich
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, Washington, United States of America
| | - Ka Yee Yeung
- University of Washington, Seattle, Washington, United States of America
| | - Cecilia C. S. Yeung
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, Washington, United States of America
| |
|
15
|
van Belzen IAEM, Cai C, van Tuil M, Badloe S, Strengman E, Janse A, Verwiel ETP, van der Leest DFM, Kester L, Molenaar JJ, Meijerink J, Drost J, Peng WC, Kerstens HHD, Tops BBJ, Holstege FCP, Kemmeren P, Hehir-Kwa JY. Systematic discovery of gene fusions in pediatric cancer by integrating RNA-seq and WGS. BMC Cancer 2023; 23:618. [PMID: 37400763 DOI: 10.1186/s12885-023-11054-3] [Received: 06/13/2022] [Accepted: 03/08/2023] [Indexed: 07/05/2023]
Abstract
BACKGROUND: Gene fusions are important cancer drivers in pediatric cancer and their accurate detection is essential for diagnosis and treatment. Clinical decision-making requires high confidence and precision of detection. Recent developments show RNA sequencing (RNA-seq) is promising for genome-wide detection of fusion products but hindered by many false positives that require extensive manual curation and impede discovery of pathogenic fusions.
METHODS: We developed Fusion-sq to overcome existing disadvantages of detecting gene fusions. Fusion-sq integrates and "fuses" evidence from RNA-seq and whole genome sequencing (WGS) using intron-exon gene structure to identify tumor-specific protein coding gene fusions. Fusion-sq was then applied to the data generated from a pediatric pan-cancer cohort of 128 patients by WGS and RNA sequencing.
RESULTS: In a pediatric pan-cancer cohort of 128 patients, we identified 155 high confidence tumor-specific gene fusions and their underlying structural variants (SVs). This includes all clinically relevant fusions known to be present in this cohort (30 patients). Fusion-sq distinguishes healthy-occurring from tumor-specific fusions and resolves fusions in amplified regions and copy number unstable genomes. A high gene fusion burden is associated with copy number instability. We identified 27 potentially pathogenic fusions involving oncogenes or tumor-suppressor genes characterized by underlying SVs, in some cases leading to expression changes indicative of activating or disruptive effects.
CONCLUSIONS: Our results indicate how clinically relevant and potentially pathogenic gene fusions can be identified and their functional effects investigated by combining WGS and RNA-seq. Integrating RNA fusion predictions with underlying SVs advances fusion detection beyond extensive manual filtering. Taken together, we developed a method for identifying candidate gene fusions that is suitable for precision oncology applications. Our method provides multi-omics evidence for assessing the pathogenicity of tumor-specific gene fusions for future clinical decision making.
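The core idea of pairing an RNA-level fusion prediction with an underlying genomic structural variant can be sketched as a proximity match between breakpoints. The sketch below is a simplification under stated assumptions: the class names, the flat `max_dist` window, and the coordinates in the usage note are hypothetical, and it does not reproduce the intron-exon logic that the abstract attributes to Fusion-sq.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Breakend:
    chrom: str
    pos: int


@dataclass(frozen=True)
class RnaFusion:
    gene5: str          # 5' partner gene
    gene3: str          # 3' partner gene
    bp5: Breakend       # RNA-level breakpoint in the 5' partner
    bp3: Breakend       # RNA-level breakpoint in the 3' partner


@dataclass(frozen=True)
class StructuralVariant:
    a: Breakend
    b: Breakend


def near(x: Breakend, y: Breakend, max_dist: int) -> bool:
    """True when both breakends are on the same chromosome within max_dist bp."""
    return x.chrom == y.chrom and abs(x.pos - y.pos) <= max_dist


def supporting_svs(fusion: RnaFusion, svs: List[StructuralVariant],
                   max_dist: int = 100_000) -> List[StructuralVariant]:
    """Return SVs whose two breakends lie near the fusion's two RNA breakpoints."""
    hits = []
    for sv in svs:
        forward = near(fusion.bp5, sv.a, max_dist) and near(fusion.bp3, sv.b, max_dist)
        reverse = near(fusion.bp5, sv.b, max_dist) and near(fusion.bp3, sv.a, max_dist)
        if forward or reverse:
            hits.append(sv)
    return hits
```

A fusion prediction with no supporting SV would be flagged as lacking genomic evidence rather than discarded outright, since RNA-level fusions can also arise without a chromosomal rearrangement.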
Affiliation(s)
| | - Casey Cai
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | - Marc van Tuil
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | - Shashi Badloe
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | - Eric Strengman
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | - Alex Janse
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | | | | | - Lennart Kester
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | - Jan J Molenaar
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Department of Pharmaceutical Sciences, Utrecht University, Utrecht, The Netherlands
| | - Jules Meijerink
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | - Jarno Drost
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
| | - Weng Chuan Peng
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | | | - Bastiaan B J Tops
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | | | - Patrick Kemmeren
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Center for Molecular Medicine, UMC Utrecht and Utrecht University, Utrecht, The Netherlands
| | - Jayne Y Hehir-Kwa
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| |
|
16
|
Haas BJ, Dobin A, Ghandi M, Van Arsdale A, Tickle T, Robinson JT, Gillani R, Kasif S, Regev A. Targeted in silico characterization of fusion transcripts in tumor and normal tissues via FusionInspector. Cell Rep Methods 2023; 3:100467. [PMID: 37323575 PMCID: PMC10261907 DOI: 10.1016/j.crmeth.2023.100467] [Received: 05/18/2022] [Revised: 02/28/2023] [Accepted: 04/14/2023] [Indexed: 06/17/2023]
Abstract
Here, we present FusionInspector for in silico characterization and interpretation of candidate fusion transcripts from RNA sequencing (RNA-seq) and exploration of their sequence and expression characteristics. We applied FusionInspector to thousands of tumor and normal transcriptomes and identified statistical and experimental features enriched among biologically impactful fusions. Through clustering and machine learning, we identified large collections of fusions potentially relevant to tumor and normal biological processes. We show that biologically relevant fusions are enriched for relatively high expression of the fusion transcript, imbalanced fusion allelic ratios, and canonical splicing patterns, and are deficient in sequence microhomologies between partner genes. We demonstrate that FusionInspector accurately validates fusion transcripts in silico and helps characterize numerous understudied fusions in tumor and normal tissue samples. FusionInspector is freely available as open source for screening, characterization, and visualization of candidate fusions via RNA-seq, and facilitates transparent explanation and interpretation of machine-learning predictions and their experimental sources.
Affiliation(s)
- Brian J. Haas
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
| | | | | | - Anne Van Arsdale
- Department of Obstetrics and Gynecology and Women’s Health, Albert Einstein Montefiore Medical Center, Bronx, NY 10461, USA
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Timothy Tickle
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - James T. Robinson
- School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Riaz Gillani
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA 02215, USA
- Boston Children’s Hospital, Boston, MA 02115, USA
| | - Simon Kasif
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
| | - Aviv Regev
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
| |
|
17
|
Wu S, Schmitz U. Single-cell and long-read sequencing to enhance modelling of splicing and cell-fate determination. Comput Struct Biotechnol J 2023; 21:2373-2380. [PMID: 37066125 PMCID: PMC10091034 DOI: 10.1016/j.csbj.2023.03.023] [Received: 11/08/2022] [Revised: 03/13/2023] [Accepted: 03/13/2023] [Indexed: 04/03/2023]
Abstract
Single-cell sequencing technologies have revolutionised the life sciences and biomedical research. Single-cell sequencing provides high-resolution data on cell heterogeneity, allowing high-fidelity cell type identification, and lineage tracking. Computational algorithms and mathematical models have been developed to make sense of the data, compensate for errors and simulate the biological processes, which has led to breakthroughs in our understanding of cell differentiation, cell-fate determination and tissue cell composition. The development of long-read (a.k.a. third-generation) sequencing technologies has produced powerful tools for investigating alternative splicing, isoform expression (at the RNA level), genome assembly and the detection of complex structural variants (at the DNA level). In this review, we provide an overview of the recent advancements in single-cell and long-read sequencing technologies, with a particular focus on the computational algorithms that help in correcting, analysing, and interpreting the resulting data. Additionally, we review some mathematical models that use single-cell and long-read sequencing data to study cell-fate determination and alternative splicing, respectively. Moreover, we highlight the emerging opportunities in modelling cell-fate determination that result from the combination of single-cell and long-read sequencing technologies.
Affiliation(s)
- Siyuan Wu
- Department of Molecular & Cell Biology, James Cook University, Townsville 4811, Queensland, Australia
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns 4870, Queensland, Australia
- School of Mathematics, Monash University, Melbourne 3800, Victoria, Australia
| | - Ulf Schmitz
- Department of Molecular & Cell Biology, James Cook University, Townsville 4811, Queensland, Australia
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns 4870, Queensland, Australia
| |
|
18
|
Chen Y, Wang Y, Chen W, Tan Z, Song Y, Human Genome Structural Variation Consortium, Chen H, Chong Z. Gene Fusion Detection and Characterization in Long-Read Cancer Transcriptome Sequencing Data with FusionSeeker. Cancer Res 2023; 83:28-33. [PMID: 36318117 PMCID: PMC9812290 DOI: 10.1158/0008-5472.can-22-1628] [Received: 05/24/2022] [Revised: 09/07/2022] [Accepted: 10/28/2022] [Indexed: 01/05/2023]
Abstract
Gene fusions are prevalent in a wide array of cancer types with different frequencies. Long-read transcriptome sequencing technologies, such as PacBio Iso-Seq and Nanopore direct RNA sequencing, provide full-length transcript sequencing reads, which could facilitate detection of gene fusions. In this work, we developed a method, FusionSeeker, to comprehensively characterize gene fusions in long-read cancer transcriptome data and reconstruct accurate fused transcripts from raw reads. FusionSeeker identified gene fusions in both exonic and intronic regions, allowing comprehensive characterization of gene fusions in cancer transcriptomes. Fused transcript sequences were reconstructed with FusionSeeker by correcting sequencing errors in the raw reads with a partial order alignment algorithm. Using these accurate transcript sequences, FusionSeeker refined gene fusion breakpoint positions and predicted breakpoints at single-bp resolution. Overall, FusionSeeker will enable users to discover gene fusions accurately using long-read data, which can facilitate downstream functional analysis as well as improved cancer diagnosis and treatment. SIGNIFICANCE: FusionSeeker is a new method to discover gene fusions and reconstruct fused transcript sequences in long-read cancer transcriptome sequencing data to help identify novel gene fusions important for tumorigenesis and progression.
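A general way to call fusions from long reads is to collect reads whose alignment is split between two genes, group them by gene pair, and cluster nearby breakpoints. The sketch below illustrates that general technique only; the input tuple layout, the `tol` and `min_support` parameters, and the use of a median as the reported breakpoint are assumptions, and it omits the partial order alignment step that the abstract describes for FusionSeeker.

```python
from collections import defaultdict
from statistics import median
from typing import Iterable, List, Tuple

# Hypothetical input record: (5' gene, 3' gene, 5' breakpoint, 3' breakpoint)
SplitRead = Tuple[str, str, int, int]
# Output record: (5' gene, 3' gene, 5' breakpoint, 3' breakpoint, supporting reads)
FusionCall = Tuple[str, str, int, int, int]


def _summarise(pair, cluster, min_support) -> List[FusionCall]:
    """Collapse one breakpoint cluster into a call if it has enough reads."""
    if len(cluster) < min_support:
        return []
    bp5 = int(median(p[0] for p in cluster))
    bp3 = int(median(p[1] for p in cluster))
    return [(pair[0], pair[1], bp5, bp3, len(cluster))]


def call_fusions(split_reads: Iterable[SplitRead],
                 tol: int = 20, min_support: int = 2) -> List[FusionCall]:
    """Cluster split-read breakpoints per gene pair and report supported fusions."""
    by_pair = defaultdict(list)
    for gene5, gene3, pos5, pos3 in split_reads:
        by_pair[(gene5, gene3)].append((pos5, pos3))

    calls: List[FusionCall] = []
    for pair, points in sorted(by_pair.items()):
        points.sort()
        cluster = [points[0]]
        for pt in points[1:]:
            prev = cluster[-1]
            # Extend the cluster while both breakpoints stay within tolerance.
            if pt[0] - prev[0] <= tol and abs(pt[1] - prev[1]) <= tol:
                cluster.append(pt)
            else:
                calls.extend(_summarise(pair, cluster, min_support))
                cluster = [pt]
        calls.extend(_summarise(pair, cluster, min_support))
    return calls
```

The tolerance absorbs the few-base breakpoint jitter that raw long-read errors introduce; refining the breakpoint to single-base resolution would require a consensus sequence built from the clustered reads.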
Affiliation(s)
- Yu Chen
- Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA; Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
| | - Yiqing Wang
- Department of Computer Science, College of Arts and Sciences, University of Alabama at Birmingham, AL 35294, USA
| | - Weisheng Chen
- Department of Surgery, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
| | - Zhengzhi Tan
- Department of Computer Science, College of Arts and Sciences, University of Alabama at Birmingham, AL 35294, USA
| | - Yuwei Song
- Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA; Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
| | | | - Herbert Chen
- Department of Surgery, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA; Department of Biomedical Engineering, School of Engineering, University of Alabama at Birmingham, AL 35294, USA
| | - Zechen Chong
- Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA; Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA; Corresponding author: Zechen Chong, Ph.D. Mailing address: THT 134, 1720 2 Ave S, Birmingham AL 35226. Phone: 205-801-7590
| |
|
19
|
Dorney R, Dhungel BP, Rasko JEJ, Hebbard L, Schmitz U. Recent advances in cancer fusion transcript detection. Brief Bioinform 2022; 24:6918739. [PMID: 36527429 PMCID: PMC9851307 DOI: 10.1093/bib/bbac519] [Received: 07/26/2022] [Revised: 10/11/2022] [Accepted: 10/31/2022] [Indexed: 12/23/2022]
Abstract
Extensive investigation of gene fusions in cancer has led to the discovery of novel biomarkers and therapeutic targets. To date, most studies have neglected chromosomal rearrangement-independent fusion transcripts and complex fusion structures such as double or triple-hop fusions, and fusion-circRNAs. In this review, we untangle fusion-related terminology and propose a classification system involving both gene and transcript fusions. We highlight the importance of RNA-level fusions and how long-read sequencing approaches can improve detection and characterization. Moreover, we discuss novel bioinformatic tools to identify fusions in long-read sequencing data and strategies to experimentally validate and functionally characterize fusion transcripts.
Affiliation(s)
- Ryley Dorney
- Department of Molecular & Cell Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Douglas, QLD 4811, Australia; Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Australia
| | - Bijay P Dhungel
- Gene and Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, NSW 2050, Australia; Faculty of Medicine & Health, The University of Sydney, Camperdown, NSW 2006, Australia; Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Australia
| | - John E J Rasko
- Gene and Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, NSW 2050, Australia; Faculty of Medicine & Health, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Lionel Hebbard
- Department of Molecular & Cell Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Douglas, QLD 4811, Australia; Storr Liver Centre, Westmead Institute for Medical Research, Westmead Hospital and University of Sydney, Sydney, New South Wales, Australia
| | - Ulf Schmitz
- Corresponding author. Ulf Schmitz, Department of Molecular and Cell Biology, College of Public Health, Medical and Vet Sciences, James Cook University, Douglas, QLD 4811, Australia. E-mail:
| |
|
20
|
Kiyose H, Nakagawa H, Ono A, Aikata H, Ueno M, Hayami S, Yamaue H, Chayama K, Shimada M, Wong JH, Fujimoto A. Comprehensive analysis of full-length transcripts reveals novel splicing abnormalities and oncogenic transcripts in liver cancer. PLoS Genet 2022; 18:e1010342. [PMID: 35926060 PMCID: PMC9380957 DOI: 10.1371/journal.pgen.1010342] [Received: 04/19/2022] [Revised: 08/16/2022] [Accepted: 07/14/2022] [Indexed: 12/24/2022]
Abstract
Genes generate transcripts of various functions by alternative splicing. However, in most transcriptome studies, short-reads sequencing technologies (next-generation sequencers) have been used, leaving full-length transcripts unobserved directly. Although long-reads sequencing technologies would enable the sequencing of full-length transcripts, the data analysis is difficult. In this study, we developed an analysis pipeline named SPLICE and analyzed cDNA sequences from 42 pairs of hepatocellular carcinoma (HCC) and matched non-cancerous livers with an Oxford Nanopore sequencer. Our analysis detected 46,663 transcripts from the protein-coding genes in the HCCs and the matched non-cancerous livers, of which 5,366 (11.5%) were novel. A comparison of expression levels identified 9,933 differentially expressed transcripts (DETs) in 4,744 genes. Interestingly, 746 genes with DETs, including the LINE1-MET transcript, were not found by a gene-level analysis. We also found that fusion transcripts of transposable elements and hepatitis B virus (HBV) were overexpressed in HCCs. In vitro experiments on DETs showed that LINE1-MET and HBV-human transposable elements promoted cell growth. Furthermore, fusion gene detection showed novel recurrent fusion events that were not detected in the short-reads. These results suggest the efficiency of full-length transcriptome studies and the importance of splicing variants in carcinogenesis.

Genes generate transcripts of various functions by alternative splicing. However, in most transcriptome studies, short-reads sequencing technologies (next-generation sequencers) have been used, leaving full-length transcripts unobserved directly. In this study, we developed an analysis pipeline named SPLICE for long-read transcriptome sequencing and analyzed cDNA sequences from 42 pairs of hepatocellular carcinoma (HCC), and matched non-cancerous livers with an Oxford Nanopore sequencer. Our analysis detected 5,366 novel transcripts and 9,933 differentially expressed transcripts in 4,744 genes between HCCs and non-cancerous livers. An analysis of hepatitis B virus (HBV) transcripts showed that fusion transcripts of the HBV gene and human transposable elements were overexpressed in HBV-infected HCCs. We also identified fusion genes that were not found in the short-reads. These results suggest that long-reads sequencing technologies provide a fuller understanding of cancer transcripts and that our method contributes to the analysis of transcriptome sequences by such technologies.
Affiliation(s)
- Hiroki Kiyose
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Hidewaki Nakagawa
- Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Atsushi Ono
- Department of Gastroenterology and Metabolism, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
| | - Hiroshi Aikata
- Department of Gastroenterology and Metabolism, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
| | - Masaki Ueno
- Department of Surgery II, Wakayama Medical University, Wakayama, Japan
| | - Shinya Hayami
- Department of Surgery II, Wakayama Medical University, Wakayama, Japan
| | - Hiroki Yamaue
- Department of Surgery II, Wakayama Medical University, Wakayama, Japan
| | - Kazuaki Chayama
- Collaborative Research Laboratory of Medical Innovation, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- Research Center for Hepatology and Gastroenterology, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Mihoko Shimada
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Jing Hao Wong
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Akihiro Fujimoto
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| |
|
21
|
Vo T, Brownmiller T, Hall K, Jones TL, Choudhari S, Grammatikakis I, Ludwig K, Caplen N. HNRNPH1 destabilizes the G-quadruplex structures formed by G-rich RNA sequences that regulate the alternative splicing of an oncogenic fusion transcript. Nucleic Acids Res 2022; 50:6474-6496. [PMID: 35639772 PMCID: PMC9226515 DOI: 10.1093/nar/gkac409] [Received: 11/28/2021] [Revised: 04/07/2022] [Accepted: 05/06/2022] [Indexed: 11/13/2022]
Abstract
In the presence of physiological monovalent cations, thousands of RNA G-rich sequences can form parallel G-quadruplexes (G4s) unless RNA-binding proteins inhibit, destabilize, or resolve the formation of such secondary RNA structures. Here, we have used a disease-relevant model system to investigate the biophysical properties of the RNA-binding protein HNRNPH1's interaction with G-rich sequences. We demonstrate the importance of two EWSR1-exon 8 G-rich regions in mediating the exclusion of this exon from the oncogenic EWS-FLI1 transcripts expressed in a subset of Ewing sarcomas, using complementary analysis of tumor data, long-read sequencing, and minigene studies. We determined that HNRNPH1 binds the EWSR1-exon 8 G-rich sequences with low nM affinities irrespective of whether in a non-G4 or G4 state but exhibits different kinetics depending on RNA structure. Specifically, HNRNPH1 associates and dissociates from G4-folded RNA faster than the identical sequences in a non-G4 state. Importantly, we demonstrate using gel shift and spectroscopic assays that HNRNPH1, particularly the qRRM1-qRRM2 domains, destabilizes the G4s formed by the EWSR1-exon 8 G-rich sequences in a non-catalytic fashion. Our results indicate that HNRNPH1's binding of G-rich sequences favors the accumulation of RNA in a non-G4 state and that this contributes to its regulation of RNA processing.
Affiliation(s)
- Tam Vo
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Tayvia Brownmiller
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Katherine Hall
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Tamara L Jones
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Sulbha Choudhari
  - CCR-SF Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
- Ioannis Grammatikakis
  - Regulatory RNAs and Cancer Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Katelyn R Ludwig
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Natasha J Caplen
  - Functional Genetics Section, Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
|
22
|
Liu T, He J, Dong K, Wang X, Zhang L, Ren R, Huang S, Sun X, Pan W, Wang W, Yang P, Yang T, Zhang Z. Genome-wide identification of quantitative trait loci for morpho-agronomic and yield-related traits in foxtail millet (Setaria italica) across multi-environments. Mol Genet Genomics 2022; 297:873-888. [PMID: 35451683 PMCID: PMC9130181 DOI: 10.1007/s00438-022-01894-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Accepted: 03/31/2022] [Indexed: 11/21/2022]
Abstract
Foxtail millet (Setaria italica) is an ideal model of genetic system for functional genomics of the Panicoideae crop. Identification of QTL responsible for morpho-agronomic and yield-related traits facilitates dissection of genetic control and breeding in cereal crops. Here, based on a Yugu1 × Longgu7 RIL population and genome-wide resequencing data, an updated linkage map harboring 2297 bin and 74 SSR markers was constructed, spanning 1315.1 cM with an average distance of 0.56 cM between adjacent markers. A total of 221 QTL for 17 morpho-agronomic and yield-related traits explaining 5.5–36% of phenotypic variation were identified across multi-environments. Of these, 109 QTL were detected in two to nine environments, including the most stable qLMS6.1 harboring a promising candidate gene Seita.6G250500, of which 70 were repeatedly identified in different trials in the same geographic location, suggesting that foxtail millet has more identical genetic modules under the similar ecological environment. One hundred thirty QTL with overlapping intervals formed 22 QTL clusters. Furthermore, six superior recombinant inbred lines, RIL35, RIL48, RIL77, RIL80, RIL115 and RIL125 with transgressive inheritance and enrichment of favorable alleles in plant height, tiller, panicle morphology and yield-related traits were screened by hierarchical cluster. These identified QTL, QTL clusters and superior lines lay ground for further gene-trait association studies and breeding practice in foxtail millet.
Affiliation(s)
- Tianpeng Liu
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Jihong He
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Kongjun Dong
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Xuewen Wang
  - Institute of Plant Breeding, Genetics, and Genomics, University of Georgia, Athens, GA, 30601, USA
- Lei Zhang
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Ruiyu Ren
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Sha Huang
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
- Xiaoting Sun
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
- Wanxiang Pan
  - College of Life Science and Technology, Gansu Agricultural University, Lanzhou, 730070, China
- Wenwen Wang
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
- Peng Yang
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
- Tianyu Yang
  - Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, 730070, China
- Zhengsheng Zhang
  - College of Agronomy and Biotechnology, Southwest University, Chongqing, 400716, China
|
23
|
Wang J, Bhakta N, Ayer Miller V, Revsine M, Litzow MR, Paietta E, Fedoriw Y, Roberts KG, Gu Z, Mullighan CG, Jones CD, Alexander TB. Acute Leukemia Classification Using Transcriptional Profiles From Low-Cost Nanopore mRNA Sequencing. JCO Precis Oncol 2022; 6:e2100326. [PMID: 35442720 PMCID: PMC9200386 DOI: 10.1200/po.21.00326] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/19/2021] [Revised: 02/03/2022] [Accepted: 03/09/2022] [Indexed: 01/14/2023] Open
Abstract
PURPOSE: Most cases of pediatric acute leukemia occur in low- and middle-income countries, where health centers lack the tools required for accurate diagnosis and disease classification. Recent research shows the robustness of using unbiased short-read RNA sequencing to classify genomic subtypes of acute leukemia. Compared with short-read sequencing, nanopore sequencing has low capital and consumable costs, making it suitable for use in locations with limited health infrastructure.
MATERIALS AND METHODS: We show the feasibility of nanopore mRNA sequencing on 134 cryopreserved acute leukemia specimens (26 acute myeloid leukemia [AML], 73 B-lineage acute lymphoblastic leukemia [B-ALL], 34 T-lineage acute lymphoblastic leukemia, and one acute undifferentiated leukemia). Using multiple library preparation approaches, we generated long-read transcripts for each sample. We developed a novel composite classification approach to predict acute leukemia lineage and major B-ALL and AML molecular subtypes directly from gene expression profiles.
RESULTS: We demonstrate accurate classification of acute leukemia samples into AML, B-ALL, or T-lineage acute lymphoblastic leukemia (96.2% of cases are classifiable with a probability of > 0.8, with 100% accuracy) and further classification into clinically actionable genomic subtypes using shallow RNA nanopore sequencing, with 96.2% accuracy for major AML subtypes and 94.1% accuracy for major B-lineage acute lymphoblastic leukemia subtypes.
CONCLUSION: Transcriptional profiling of acute leukemia samples using nanopore technology for diagnostic classification is feasible and accurate, which has the potential to improve the accuracy of cancer diagnosis in low-resource settings.
Affiliation(s)
- Jeremy Wang
  - Department of Genetics, University of North Carolina, Chapel Hill, NC
- Nickhill Bhakta
  - Department of Global Pediatric Medicine, St Jude Children's Research Hospital, Memphis, TN
- Vanessa Ayer Miller
  - Office of Clinical Translational Research, University of North Carolina, Chapel Hill, NC
- Mahler Revsine
  - Department of Biology, University of North Carolina, Chapel Hill, NC
- Mark R. Litzow
  - Division of Hematology and Transplant Center, Mayo Clinic Rochester, Rochester, MN
- Yuri Fedoriw
  - Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC
- Kathryn G. Roberts
  - Department of Pathology, St Jude Children's Research Hospital, Memphis, TN
- Zhaohui Gu
  - Department of Computational and Quantitative Medicine & Systems Biology, Beckman Research Institute of City of Hope, Duarte, CA
- Corbin D. Jones
  - Department of Biology, University of North Carolina, Chapel Hill, NC
- Thomas B. Alexander
  - Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC
  - Department of Pediatrics, University of North Carolina, Chapel Hill, NC
|
24
|
Davidson NM, Chen Y, Sadras T, Ryland GL, Blombery P, Ekert PG, Göke J, Oshlack A. JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biol 2022; 23:10. [PMID: 34991664 PMCID: PMC8739696 DOI: 10.1186/s13059-021-02588-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 04/26/2021] [Accepted: 12/22/2021] [Indexed: 12/26/2022] Open
Abstract
In cancer, fusions are important diagnostic markers and targets for therapy. Long-read transcriptome sequencing allows the discovery of fusions with their full-length isoform structure. However, due to higher sequencing error rates, fusion finding algorithms designed for short reads do not work. Here we present JAFFAL, to identify fusions from long-read transcriptome sequencing. We validate JAFFAL using simulations, cell lines, and patient data from Nanopore and PacBio. We apply JAFFAL to single-cell data and find fusions spanning three genes demonstrating transcripts detected from complex rearrangements. JAFFAL is available at https://github.com/Oshlack/JAFFA/wiki.
Affiliation(s)
- Nadia M Davidson
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - School of BioSciences, University of Melbourne, Victoria, Australia
  - The Walter and Eliza Hall Institute, Victoria, Australia
- Ying Chen
  - Genome Institute of Singapore, Singapore, Singapore
- Teresa Sadras
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Victoria, Australia
- Georgina L Ryland
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Victoria, Australia
  - Centre for Cancer Research, University of Melbourne, Victoria, Australia
- Piers Blombery
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Victoria, Australia
- Paul G Ekert
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Victoria, Australia
  - Children's Cancer Institute, Lowy Cancer Centre, UNSW, Sydney, NSW, Australia
  - School of Women's and Children's Health, UNSW, Sydney, NSW, Australia
  - Murdoch Children's Research Institute, Victoria, Australia
- Jonathan Göke
  - Genome Institute of Singapore, Singapore, Singapore
  - National Cancer Centre Singapore, Singapore, Singapore
- Alicia Oshlack
  - Peter MacCallum Cancer Centre, Victoria, Australia
  - School of BioSciences, University of Melbourne, Victoria, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Victoria, Australia
|
25
|
Wang Y, Zhao Y, Bollas A, Wang Y, Au KF. Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol 2021; 39:1348-1365. [PMID: 34750572 PMCID: PMC8988251 DOI: 10.1038/s41587-021-01108-x] [Citation(s) in RCA: 804] [Impact Index Per Article: 201.0] [Received: 12/09/2019] [Accepted: 09/22/2021] [Indexed: 12/13/2022]
Abstract
Rapid advances in nanopore technologies for sequencing single long DNA and RNA molecules have led to substantial improvements in accuracy, read length and throughput. These breakthroughs have required extensive development of experimental and bioinformatics methods to fully exploit nanopore long reads for investigations of genomes, transcriptomes, epigenomes and epitranscriptomes. Nanopore sequencing is being applied in genome assembly, full-length transcript detection and base modification detection and in more specialized areas, such as rapid clinical diagnoses and outbreak surveillance. Many opportunities remain for improving data quality and analytical approaches through the development of new nanopores, base-calling methods and experimental protocols tailored to particular applications.
Affiliation(s)
- Yunhao Wang
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Yue Zhao
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
  - Biomedical Informatics Shared Resources, The Ohio State University, Columbus, OH, USA
- Audrey Bollas
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Yuru Wang
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Kin Fai Au
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
  - Biomedical Informatics Shared Resources, The Ohio State University, Columbus, OH, USA
|