1
Dorney R, Reis-das-Mercês L, Schmitz U. Architects and Partners: The Dual Roles of Non-coding RNAs in Gene Fusion Events. Methods Mol Biol 2025; 2883:231-255. [PMID: 39702711 DOI: 10.1007/978-1-0716-4290-0_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2024]
Abstract
Extensive research into gene fusions in cancer and other diseases has led to the discovery of novel biomarkers and therapeutic targets. Concurrently, various bioinformatics tools have been developed for fusion detection in RNA sequencing data, which, in the age of increasing affordability of sequencing, have delivered a large-scale identification of transcriptomic abnormalities. Historically, the focus of fusion transcript research was predominantly on coding RNAs and their resultant proteins, often overlooking non-coding RNAs (ncRNAs). This chapter discusses how ncRNAs are integral players in the landscape of gene fusions, detailing their contributions to the formation of gene fusions and their presence in chimeric transcripts. We delve into both linear and the more recently identified circular fusion RNAs, providing a comprehensive overview of the computational methodologies used to detect ncRNA-involved gene fusions. Additionally, we examine the inherent biases and limitations of these bioinformatics approaches, offering insights into the challenges and future directions in this dynamic field.
Affiliation(s)
- Ryley Dorney
  - Biomedical Sciences and Molecular Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Douglas, QLD, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia
- Laís Reis-das-Mercês
  - Laboratory of Human and Medical Genetics, Institute of Biological Sciences, Federal University of Pará, Belem, PA, Brazil
- Ulf Schmitz
  - Biomedical Sciences and Molecular Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Douglas, QLD, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia
  - Computational BioMedicine Lab, Centenary Institute, The University of Sydney, Camperdown, NSW, Australia
  - Faculty of Medicine & Health, The University of Sydney, Camperdown, NSW, Australia
2
Tan Y, Mohanty V, Liang S, Dou J, Ma J, Kim KH, Bonder MJ, Shi X, Lee C, Chong Z, Chen K. novoRNABreak: Local Assembly for Novel Splice Junction and Fusion Transcript Detection from RNA-Seq Data. Journal of Bioinformatics and Systems Biology: Open Access 2023; 6:74-81. [PMID: 39301431 PMCID: PMC11412692 DOI: 10.26502/jbsb.5107050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2024]
Abstract
We present novoRNABreak, a unified framework for cancer specific novel splice junction and fusion transcript detection in RNA-seq data obtained from human cancer samples. novoRNABreak is based on a local assembly model, which offers a tradeoff between the alignment-based and de novo whole transcriptome assembly (WTA) methods. This approach is accurate and sensitive in assembling novel junctions that are difficult to directly align or have multiple alignments. Additionally, it is more efficient due to the strategy that focuses on junctions rather than full length transcripts. The performance of novoRNABreak is demonstrated by a comprehensive set of experiments using synthetic data generated based on genome reference, as well as real RNA-seq data from breast cancer and prostate cancer samples. The results show that our tool has a better performance by fully utilizing unmapped reads and precisely identifying the junctions where short reads or small exons have multiple alignments. novoRNABreak is a fully-fledged program available on GitHub (https://github.com/KChen-lab/novoRNABreak).
Affiliation(s)
- Yukun Tan
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
- Vakul Mohanty
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
- Shaoheng Liang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
- Jinzhuang Dou
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
- Jun Ma
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
- Kun Hee Kim
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
- Marc Jan Bonder
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- Xinghua Shi
  - Department of Computer & Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, 19122, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Zechen Chong
  - Department of Genetics, the University of Alabama at Birmingham, Birmingham, AL, 35233, USA
- Ken Chen
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
3
Jin Z, Huang W, Shen N, Li J, Wang X, Dong J, Park PJ, Xi R. Single-cell gene fusion detection by scFusion. Nat Commun 2022; 13:1084. [PMID: 35228538 PMCID: PMC8885711 DOI: 10.1038/s41467-022-28661-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/23/2021] [Accepted: 02/03/2022] [Indexed: 11/09/2022]
Abstract
Gene fusions can play important roles in tumor initiation and progression. While fusion detection so far has been from bulk samples, full-length single-cell RNA sequencing (scRNA-seq) offers the possibility of detecting gene fusions at the single-cell level. However, scRNA-seq data have a high noise level and contain various technical artifacts that can lead to spurious fusion discoveries. Here, we present a computational tool, scFusion, for gene fusion detection based on scRNA-seq. We evaluate the performance of scFusion using simulated and five real scRNA-seq datasets and find that scFusion can efficiently and sensitively detect fusions with a low false discovery rate. In a T cell dataset, scFusion detects the invariant TCR gene recombinations in mucosal-associated invariant T cells that many methods developed for bulk data fail to detect; in a multiple myeloma dataset, scFusion detects the known recurrent fusion IgH-WHSC1, which is associated with overexpression of the WHSC1 oncogene. Our results demonstrate that scFusion can be used to investigate cellular heterogeneity of gene fusions and their transcriptional impact at the single-cell level.
Affiliation(s)
- Zijie Jin
  - School of Mathematical Sciences, Peking University, Beijing, 100871, China
- Wenjian Huang
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Ning Shen
  - Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, 311121, China
  - Department of Biomedical Informatics, Harvard Medical School, Boston, 02115, MA, USA
- Juan Li
  - Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
- Xiaochen Wang
  - School of Mathematical Sciences, Peking University, Beijing, 100871, China
- Jiqiao Dong
  - GeneX Health Co. Ltd, Beijing, 100195, China
- Peter J Park
  - Department of Biomedical Informatics, Harvard Medical School, Boston, 02115, MA, USA
- Ruibin Xi
  - School of Mathematical Sciences, Peking University, Beijing, 100871, China
  - Center for Statistical Science, Peking University, Beijing, 100871, China
4
Hussen BM, Abdullah ST, Salihi A, Sabir DK, Sidiq KR, Rasul MF, Hidayat HJ, Ghafouri-Fard S, Taheri M, Jamali E. The emerging roles of NGS in clinical oncology and personalized medicine. Pathol Res Pract 2022; 230:153760. [PMID: 35033746 DOI: 10.1016/j.prp.2022.153760] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 11/24/2021] [Revised: 12/29/2021] [Accepted: 01/06/2022] [Indexed: 02/07/2023]
Abstract
Next-generation sequencing (NGS) has become increasingly popular in genomics studies over the last decade, as new sequencing technology has been created and improved. Recently, NGS started to be used in clinical oncology to improve cancer therapy through diverse modalities, ranging from finding novel and rare cancer mutations and discovering cancer mutation carriers to reaching specific therapeutic approaches known as personalized medicine (PM). PM has the potential to minimize medical expenses by shifting the current traditional medical approach of treating cancer and other diseases to an individualized preventive and predictive approach. Currently, NGS can speed up the early diagnosis of diseases and discover pharmacogenetic markers that help in personalizing therapies. Despite the tremendous growth in our understanding of genetics, NGS holds the added advantage of providing a more comprehensive picture of the cancer landscape and uncovering cancer development pathways. In this review, we provide a complete overview of potential NGS applications in scientific and clinical oncology, with a particular emphasis on pharmacogenomics in the direction of precision medicine treatment options.
Affiliation(s)
- Bashdar Mahmud Hussen
  - Department Pharmacognosy, College of Pharmacy, Hawler Medical University, Kurdistan Region, Erbil, Iraq
  - Center of Research and Strategic Studies, Lebanese French University, Kurdistan Region, Erbil, Iraq
- Sara Tharwat Abdullah
  - Department of Pharmacology and Toxicology, College of Pharmacy, Hawler Medical University, Erbil, Iraq
- Abbas Salihi
  - Center of Research and Strategic Studies, Lebanese French University, Kurdistan Region, Erbil, Iraq
  - Department of Biology, College of Science, Salahaddin University, Kurdistan Region, Erbil, Iraq
- Dana Khdr Sabir
  - Department of Medical Laboratory Sciences, Charmo University, Kurdistan Region, Iraq
- Karzan R Sidiq
  - Department of Biology, College of Education, University of Sulaimani, Sulaimani 334, Kurdistan, Iraq
- Mohammed Fatih Rasul
  - Department of Medical Analysis, Faculty of Applied Science, Tishk International University, Kurdistan Region, Erbil, Iraq
- Hazha Jamal Hidayat
  - Department of Biology, College of Education, Salahaddin University, Kurdistan Region, Erbil, Iraq
- Soudeh Ghafouri-Fard
  - Department of Medical Genetics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mohammad Taheri
  - Institute of Human Genetics, Jena University Hospital, Jena, Germany
  - Urology and Nephrology Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Elena Jamali
  - Skull Base Research Center, Loghman Hakim Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
5
Detroja R, Gorohovski A, Giwa O, Baum G, Frenkel-Morgenstern M. ChiTaH: a fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data. NAR Genom Bioinform 2021; 3:lqab112. [PMID: 34859212 PMCID: PMC8633610 DOI: 10.1093/nargab/lqab112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/23/2021] [Revised: 10/21/2021] [Accepted: 11/22/2021] [Indexed: 12/16/2022]
Abstract
Fusion genes or chimeras typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq or RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, the testing of over 20 computational methods showed these to be limited in terms of chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first ‘reference-based’ approach termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to map sequencing reads and to accurately identify chimeric reads. We benchmarked ChiTaH and four other methods to identify human chimeras, leveraging both simulated and real sequencing datasets. ChiTaH was found to be the most accurate and fastest method for identifying known human chimeras from simulated and sequencing datasets. Moreover, ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk and single cells of the K-562 cell line, which was confirmed experimentally.
Affiliation(s)
- Rajesh Detroja
  - Cancer Genomics and BioComputing of Complex Diseases Lab, Azrieli Faculty of Medicine, Bar-Ilan University, Safed 1311502, Israel
- Alessandro Gorohovski
  - Cancer Genomics and BioComputing of Complex Diseases Lab, Azrieli Faculty of Medicine, Bar-Ilan University, Safed 1311502, Israel
- Olawumi Giwa
  - Cancer Genomics and BioComputing of Complex Diseases Lab, Azrieli Faculty of Medicine, Bar-Ilan University, Safed 1311502, Israel
- Gideon Baum
  - Cancer Genomics and BioComputing of Complex Diseases Lab, Azrieli Faculty of Medicine, Bar-Ilan University, Safed 1311502, Israel
- Milana Frenkel-Morgenstern
  - Cancer Genomics and BioComputing of Complex Diseases Lab, Azrieli Faculty of Medicine, Bar-Ilan University, Safed 1311502, Israel
6
Jilani M, Haspel N. Computational Methods for Detecting Large-Scale Structural Rearrangements in Chromosomes. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
7
Liu Q, Hu Y, Stucky A, Fang L, Zhong JF, Wang K. LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing. BMC Genomics 2020; 21:793. [PMID: 33372596 PMCID: PMC7771079 DOI: 10.1186/s12864-020-07207-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 10/21/2020] [Accepted: 10/29/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND: Long-read RNA-Seq techniques can generate reads that encompass a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-Seq techniques, which typically generate < 150 bp reads. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors. RESULTS: In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets, and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF performed better than existing computational tools in detecting known gene fusions. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset on acute myeloid leukemia, and pinpointed the exact location of a translocation (previously known in cytogenetic resolution) in base resolution, which was further validated by Sanger sequencing. CONCLUSIONS: In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.
Affiliation(s)
- Qian Liu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Yu Hu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Andres Stucky
  - Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Jiang F Zhong
  - Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
8
Yang X, Saito Y, Rao A, Kim HJ, Singh P, Scott E, Larson M, Pan W, Desai M, Hubbell E. Alignment-free filtering for cfNA fusion fragments. Bioinformatics 2020; 35:i225-i232. [PMID: 31510681 PMCID: PMC6612805 DOI: 10.1093/bioinformatics/btz346] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022]
Abstract
Motivation: Cell-free nucleic acid (cfNA) sequencing data require improvements to existing fusion detection methods along multiple axes: high depth of sequencing, low allele fractions, short fragment lengths and specialized barcodes, such as unique molecular identifiers. Results: AF4 was developed to address these challenges. It uses a novel alignment-free kmer-based method to detect candidate fusion fragments with high sensitivity and orders of magnitude faster than existing tools. Candidate fragments are then filtered using a max-cover criterion that significantly reduces spurious matches while retaining authentic fusion fragments. This efficient first stage reduces the data sufficiently that commonly used criteria can process the remaining information, or sophisticated filtering policies that may not scale to the raw reads can be used. AF4 provides both targeted and de novo fusion detection modes. We demonstrate both modes in benchmark simulated and real RNA-seq data as well as clinical and cell-line cfNA data. Availability and implementation: AF4 is open sourced, licensed under Apache License 2.0, and is available at: https://github.com/grailbio/bio/tree/master/fusion.
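The general idea of alignment-free, k-mer-based candidate detection mentioned in this abstract can be illustrated with a minimal Python sketch. This is not AF4's implementation: the function names, the `min_hits` threshold, and the toy sequences are all assumptions made for illustration, and the max-cover filtering stage is not modelled. A fragment is flagged as a candidate when it carries gene-specific k-mers from exactly two genes.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(transcripts, k):
    """Map each k-mer to the set of genes whose transcript contains it."""
    index = defaultdict(set)
    for gene, seq in transcripts.items():
        for km in kmers(seq, k):
            index[km].add(gene)
    return index


def candidate_fusion(read, index, k, min_hits=2):
    """Return a sorted gene pair if the read has at least min_hits
    gene-specific k-mers from each of exactly two genes, else None."""
    hits = defaultdict(int)
    for km in kmers(read, k):
        genes = index.get(km, ())
        if len(genes) == 1:  # ignore k-mers shared between genes
            hits[next(iter(genes))] += 1
    supported = sorted(g for g, n in hits.items() if n >= min_hits)
    return tuple(supported) if len(supported) == 2 else None


# Toy data: hypothetical sequences, not real genes.
transcripts = {
    "GENE_A": "ACGTACGTTAGCCGATAGGCTA",
    "GENE_B": "TTGACCAGTCAGGATCCATGCA",
}
index = build_index(transcripts, k=5)
fusion_read = transcripts["GENE_A"][:10] + transcripts["GENE_B"][-10:]
normal_read = transcripts["GENE_A"][2:20]

print(candidate_fusion(fusion_read, index, k=5))
print(candidate_fusion(normal_read, index, k=5))
```

In this toy setup the chimeric read is reported as a gene pair while the single-gene read is not; a real tool would additionally need to handle sequencing errors, reverse complements, and paralogous sequence.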
9
Kim P, Jang YE, Lee S. FusionScan: accurate prediction of fusion genes from RNA-Seq data. Genomics Inform 2019; 17:e26. [PMID: 31610622 PMCID: PMC6808644 DOI: 10.5808/gi.2019.17.3.e26] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/26/2019] [Accepted: 03/21/2019] [Indexed: 01/10/2023]
Abstract
Identification of fusion genes is of prominent importance in the cancer research field because of their potential as carcinogenic drivers. RNA sequencing (RNA-Seq) data have been the most useful source for identification of fusion transcripts. Although a number of algorithms have been developed thus far, most programs produce too many false-positives, thus making experimental confirmation almost impossible. We still lack a reliable program that achieves high precision with a reasonable recall rate. Here, we present FusionScan, a highly optimized tool for predicting fusion transcripts from RNA-Seq data. We specifically search for split reads composed of intact exons at the fusion boundaries. Using 269 known fusion cases as the reference, we have implemented various mapping and filtering strategies to remove false-positives without discarding genuine fusions. In the performance test using three cell line datasets with validated fusion cases (NCI-H660, K562, and MCF-7), FusionScan outperformed other existing programs by a considerable margin, achieving precision and recall rates of 60% and 79%, respectively. A simulation test also demonstrated that FusionScan recovered most of the true positives without producing an overwhelming number of false-positives regardless of sequencing depth and read length. The computation time was comparable to other leading tools. We also provide several curative means to help users investigate the details of fusion candidates easily. We believe that FusionScan would be a reliable, efficient and convenient program for detecting fusion transcripts that meets the requirements of the clinical and experimental community. FusionScan is freely available at http://fusionscan.ewha.ac.kr/.
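For readers unfamiliar with the precision and recall figures quoted in this abstract, the following Python sketch shows how such rates are typically computed from a predicted fusion set and a validated set. The fusion names and counts are hypothetical placeholders chosen for illustration; they are not the benchmark data from the paper.

```python
def precision_recall(predicted, validated):
    """Precision and recall of a predicted fusion set against a validated set."""
    predicted, validated = set(predicted), set(validated)
    true_pos = len(predicted & validated)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(validated) if validated else 0.0
    return precision, recall


# Hypothetical call sets, for illustration only.
validated = {"G1--G2", "G3--G4", "G5--G6", "G7--G8"}
predicted = {"G1--G2", "G3--G4", "G5--G6", "G9--G10", "G11--G12"}

precision, recall = precision_recall(predicted, validated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision penalises false-positive calls (predictions absent from the validated set), while recall penalises validated fusions that the tool missed.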
Affiliation(s)
- Pora Kim
  - Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul 03760, Korea
- Ye Eun Jang
  - Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul 03760, Korea
  - Department of Bio-Information Science, Ewha Womans University, Seoul 03760, Korea
- Sanghyuk Lee
  - Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul 03760, Korea
  - Department of Bio-Information Science, Ewha Womans University, Seoul 03760, Korea
  - Department of Life Science, Ewha Womans University, Seoul 03760, Korea
10
Tang Y, Ma S, Wang X, Xing Q, Huang T, Liu H, Li Q, Zhang Y, Zhang K, Yao M, Yang GL, Li H, Zang X, Yang B, Guan F. Identification of chimeric RNAs in human infant brains and their implications in neural differentiation. Int J Biochem Cell Biol 2019; 111:19-26. [DOI: 10.1016/j.biocel.2019.03.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/31/2018] [Revised: 03/06/2019] [Accepted: 03/30/2019] [Indexed: 02/07/2023]
11
Vu TN, Deng W, Trac QT, Calza S, Hwang W, Pawitan Y. A fast detection of fusion genes from paired-end RNA-seq data. BMC Genomics 2018; 19:786. [PMID: 30382840 PMCID: PMC6211471 DOI: 10.1186/s12864-018-5156-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 11/22/2017] [Accepted: 10/10/2018] [Indexed: 01/03/2023]
Abstract
Background: Fusion genes are known to be drivers of many common cancers, so they are potential markers for diagnosis, prognosis or therapy response. The advent of paired-end RNA sequencing enhances our ability to discover fusion genes. While there are available methods, routine analyses of large numbers of samples are still limited due to high computational demands. Results: We develop FuSeq, a fast and accurate method to discover fusion genes based on quasi-mapping to quickly map the reads, extract initial candidates from split reads and fusion equivalence classes of mapped reads, and finally apply multiple filters and statistical tests to get the final candidates. We apply FuSeq to four validated datasets: breast cancer, melanoma and glioma datasets, and one spike-in dataset. The results reveal high sensitivity and specificity in all datasets, and compare well against other methods such as FusionMap, TRUP, TopHat-Fusion, SOAPfuse and JAFFA. In terms of computational time, FuSeq is two-fold faster than FusionMap and orders of magnitude faster than the other methods. Conclusions: With this advantage of lower computational demands, FuSeq makes it practical to investigate fusion genes in large numbers of samples. FuSeq is implemented in C++ and R, and available at https://github.com/nghiavtr/FuSeq for non-commercial uses. Electronic supplementary material: The online version of this article (10.1186/s12864-018-5156-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Trung Nghia Vu
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels väg 12A, Stockholm, 17177, Sweden
- Wenjiang Deng
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels väg 12A, Stockholm, 17177, Sweden
- Quang Thinh Trac
  - Department of Computational Sciences and Engineering, VNU University of Engineering and Technology, Xuan Thuy, 144, Hanoi, 84024, Vietnam
- Stefano Calza
  - Department of Molecular and Translational Medicine, University of Brescia, Viale Europa, 11, Brescia, 25125, Italy
- Woochang Hwang
  - Data Science for Knowledge Creation Research Center, Seoul National University, Seoul, 151-747, South Korea
- Yudi Pawitan
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels väg 12A, Stockholm, 17177, Sweden
12
Yu Y, Liu J, Liu X, Zhang Y, Magner E, Lehnert E, Qian C, Liu J. SeqOthello: querying RNA-seq experiments at scale. Genome Biol 2018; 19:167. [PMID: 30340508 PMCID: PMC6194578 DOI: 10.1186/s13059-018-1535-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/18/2018] [Accepted: 09/11/2018] [Indexed: 01/31/2023]
Abstract
We present SeqOthello, an ultra-fast and memory-efficient indexing structure to support arbitrary sequence query against large collections of RNA-seq experiments. It takes SeqOthello only 5 min and 19.1 GB memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets. The query recovers 92.7% of tier-1 fusions curated by TCGA Fusion Gene Database and reveals 270 novel occurrences, all of which are present as tumor-specific. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs.
Affiliation(s)
- Ye Yu
  - Department of Computer Science, University of Kentucky, 301 Rose St, Lexington, KY, 40508, USA
- Jinpeng Liu
  - Department of Computer Science, University of Kentucky, 301 Rose St, Lexington, KY, 40508, USA
- Xinan Liu
  - Department of Computer Science, University of Kentucky, 301 Rose St, Lexington, KY, 40508, USA
- Yi Zhang
  - Department of Computer Science, University of Kentucky, 301 Rose St, Lexington, KY, 40508, USA
- Eamonn Magner
  - Department of Computer Science, University of Kentucky, 301 Rose St, Lexington, KY, 40508, USA
- Erik Lehnert
  - Seven Bridges Genomics Inc, 1 Main St, 5th Floor, Suite 500, Cambridge, MA, 02142, USA
- Chen Qian
  - Department of Computer Engineering, University of California Santa Cruz, 1156 High Street, Santa Cruz, CA, 95064, USA
- Jinze Liu
  - Department of Computer Science, University of Kentucky, 301 Rose St, Lexington, KY, 40508, USA
13
Clinical cancer genomic profiling by three-platform sequencing of whole genome, whole exome and transcriptome. Nat Commun 2018; 9:3962. [PMID: 30262806 PMCID: PMC6160438 DOI: 10.1038/s41467-018-06485-7] [Citation(s) in RCA: 147] [Impact Index Per Article: 21.0] [Received: 04/24/2017] [Accepted: 08/24/2018] [Indexed: 12/17/2022]
Abstract
To evaluate the potential of an integrated clinical test to detect diverse classes of somatic and germline mutations relevant to pediatric oncology, we performed three-platform whole-genome (WGS), whole exome (WES) and transcriptome (RNA-Seq) sequencing of tumors and normal tissue from 78 pediatric cancer patients in a CLIA-certified, CAP-accredited laboratory. Our analysis pipeline achieves high accuracy by cross-validating variants between sequencing types, thereby removing the need for confirmatory testing, and facilitates comprehensive reporting in a clinically-relevant timeframe. Three-platform sequencing has a positive predictive value of 97–99, 99, and 91% for somatic SNVs, indels and structural variations, respectively, based on independent experimental verification of 15,225 variants. We report 240 pathogenic variants across all cases, including 84 of 86 known from previous diagnostic testing (98% sensitivity). Combined WES and RNA-Seq, the current standard for precision oncology, achieved only 78% sensitivity. These results emphasize the critical need for incorporating WGS in pediatric oncology testing. Clinical oncology is rapidly adopting next-generation sequencing technology for nucleotide variant and indel detection. Here the authors present a three-platform approach (whole-genome, whole-exome, and whole-transcriptome) in pediatric patients for the detection of diverse types of germline and somatic variants.
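The 98% sensitivity quoted in this abstract follows directly from the reported counts, assuming sensitivity is taken as the fraction of previously known pathogenic variants that the pipeline recovered:

```latex
\[
\text{sensitivity}
  = \frac{\text{known variants recovered}}{\text{known variants}}
  = \frac{84}{86}
  \approx 0.977
  \approx 98\%
\]
```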
14
Chwalenia K, Facemire L, Li H. Chimeric RNAs in cancer and normal physiology. Wiley Interdisciplinary Reviews: RNA 2017; 8. [DOI: 10.1002/wrna.1427] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 12/07/2016] [Revised: 04/27/2017] [Accepted: 04/28/2017] [Indexed: 12/20/2022]
Affiliation(s)
- Katarzyna Chwalenia
  - Department of Pathology, School of Medicine; University of Virginia; Charlottesville VA USA
- Loryn Facemire
  - Department of Pathology, School of Medicine; University of Virginia; Charlottesville VA USA
- Hui Li
  - Department of Pathology, School of Medicine; University of Virginia; Charlottesville VA USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine; University of Virginia; Charlottesville VA USA
15
Okonechnikov K, Imai-Matsushima A, Paul L, Seitz A, Meyer TF, Garcia-Alcalde F. InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data. PLoS One 2016; 11:e0167417. [PMID: 27907167 PMCID: PMC5132003 DOI: 10.1371/journal.pone.0167417] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 07/05/2016] [Accepted: 11/14/2016] [Indexed: 12/21/2022]
Abstract
Analysis of fusion transcripts has become increasingly important due to their link with cancer development. Since high-throughput sequencing approaches survey fusion events exhaustively, several computational methods for the detection of gene fusions from RNA-seq data have been developed. This kind of analysis, however, is complicated by native trans-splicing events, the splicing-induced complexity of the transcriptome and biases and artefacts introduced in experiments and data analysis. There are a number of tools available for the detection of fusions from RNA-seq data; however, certain differences in specificity and sensitivity between commonly used approaches have been found. The ability to detect gene fusions of different types, including isoform fusions and fusions involving non-coding regions, has not been thoroughly studied yet. Here, we propose a novel computational toolkit called InFusion for fusion gene detection from RNA-seq data. InFusion introduces several unique features, such as discovery of fusions involving intergenic regions, and detection of anti-sense transcription in chimeric RNAs based on strand-specificity. Our approach demonstrates superior detection accuracy on simulated data and several public RNA-seq datasets. This improved performance was also evident when evaluating data from RNA deep-sequencing of two well-established prostate cancer cell lines. InFusion identified 26 novel fusion events that were validated in vitro, including alternatively spliced gene fusion isoforms and chimeric transcripts that include intergenic regions. The toolkit is freely available to download from http:/bitbucket.org/kokonech/infusion.
Affiliation(s)
- Konstantin Okonechnikov
  - Department of Molecular Biology, Max Planck Institute for Infection Biology, Berlin, Germany
- Aki Imai-Matsushima
  - Department of Molecular Biology, Max Planck Institute for Infection Biology, Berlin, Germany
- Lukas Paul
  - Lexogen GmbH, Campus Vienna Biocenter 5, Vienna, Austria
- Thomas F. Meyer
  - Department of Molecular Biology, Max Planck Institute for Infection Biology, Berlin, Germany
  - Corresponding author
- Fernando Garcia-Alcalde
  - Department of Molecular Biology, Max Planck Institute for Infection Biology, Berlin, Germany
  - Corresponding author
16
Kumar S, Razzaq SK, Vo AD, Gautam M, Li H. Identifying fusion transcripts using next generation sequencing. Wiley Interdiscip Rev RNA 2016; 7:811-823. [PMID: 27485475 PMCID: PMC5065767 DOI: 10.1002/wrna.1382] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 05/20/2016] [Revised: 07/05/2016] [Accepted: 07/07/2016] [Indexed: 01/14/2023]
Abstract
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions have been used successfully for cancer diagnosis, prognosis, and therapeutic applications. In addition, many fusion transcripts are found in normal human cell lines and tissues, with some data supporting their role in normal physiology. Besides chromosomal rearrangement, intergenic splicing can generate them. Global identification of fusion transcripts becomes possible with the help of next generation sequencing technology like RNA-Seq. In the past decade, major advancements have been made for chimeric RNA discovery due to the development of advanced sequencing platform and software packages. However, current software tools behave differently in terms of specificity, sensitivity, time, and computational memory usage. Recent benchmarking studies showed that none of the tools are inclusive. The development of high performance (accurate and fast), and user-friendly fusion detection tool/pipeline is still an open quest. In this article, we review the existing software packages for fusion detection. We explain the methods of the tools, and discuss various factors that affect fusion detection. We summarize conclusions drawn from several comparative studies, and then discuss some of the pitfalls of these studies. We also describe the limitations of current tools, and suggest directions for future development.
Affiliation(s)
- Shailesh Kumar
  - Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Sundus Khalid Razzaq
  - Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Angie Duy Vo
  - Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Mamta Gautam
  - Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Hui Li
  - Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA, USA
17
Shlien A, Raine K, Fuligni F, Arnold R, Nik-Zainal S, Dronov S, Mamanova L, Rosic A, Ju YS, Cooke SL, Ramakrishna M, Papaemmanuil E, Davies HR, Tarpey PS, Van Loo P, Wedge DC, Jones DR, Martin S, Marshall J, Anderson E, Hardy C, Barbashina V, Aparicio SAJR, Sauer T, Garred Ø, Vincent-Salomon A, Mariani O, Boyault S, Fatima A, Langerød A, Borg Å, Thomas G, Richardson AL, Børresen-Dale AL, Polyak K, Stratton MR, Campbell PJ. Direct Transcriptional Consequences of Somatic Mutation in Breast Cancer. Cell Rep 2016; 16:2032-46. [PMID: 27498871 PMCID: PMC4987284 DOI: 10.1016/j.celrep.2016.07.028] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 03/14/2014] [Revised: 06/03/2016] [Accepted: 07/14/2016] [Indexed: 12/02/2022] Open
Abstract
Disordered transcriptomes of cancer encompass direct effects of somatic mutation on transcription, coordinated secondary pathway alterations, and increased transcriptional noise. To catalog the rules governing how somatic mutation exerts direct transcriptional effects, we developed an exhaustive pipeline for analyzing RNA sequencing data, which we integrated with whole genomes from 23 breast cancers. Using X-inactivation analyses, we found that cancer cells are more transcriptionally active than intermixed stromal cells. This is especially true in estrogen receptor (ER)-negative tumors. Overall, 59% of substitutions were expressed. Nonsense mutations showed lower expression levels than expected, with patterns characteristic of nonsense-mediated decay. 14% of 4,234 rearrangements caused transcriptional abnormalities, including exon skips, exon reusage, fusions, and premature polyadenylation. We found productive, stable transcription from sense-to-antisense gene fusions and gene-to-intergenic rearrangements, suggesting that these mutation classes drive more transcriptional disruption than previously suspected. Systematic integration of transcriptome with genome data reveals the rules by which transcriptional machinery interprets somatic mutation. Highlights: Greater transcriptional activity in cancer than stromal cells, particularly when ER-negative. Intron mutations only infrequently affect splicing, even at essential splice sites. Distinctive RNA effects of sense-to-antisense and gene-to-intergenic rearrangements. Exhaustive pipeline for identifying aberrant transcripts from RNA-sequencing data.
Affiliation(s)
- Adam Shlien
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Keiran Raine
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Fabio Fuligni
  - Department of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada
- Roland Arnold
  - Department of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada
- Serena Nik-Zainal
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Serge Dronov
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Lira Mamanova
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Andrej Rosic
  - Department of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada
- Young Seok Ju
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Susanna L Cooke
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Manasa Ramakrishna
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Elli Papaemmanuil
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Helen R Davies
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Patrick S Tarpey
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Peter Van Loo
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK; Department of Human Genetics, University of Leuven, 3000 Leuven, Belgium
- David C Wedge
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- David R Jones
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Sancha Martin
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- John Marshall
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Elizabeth Anderson
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Claire Hardy
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Violetta Barbashina
  - Breakthrough Breast Cancer, The Institute of Cancer Research, London SM2 5NG, UK
- Torill Sauer
  - Department of Pathology, Oslo University Hospital, 0450 Oslo, Norway
- Øystein Garred
  - Department of Pathology, Oslo University Hospital, 0450 Oslo, Norway
- Anita Langerød
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0379 Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0316 Oslo, Norway
- Åke Borg
  - Department of Oncology, Lund University, SE-221 00 Lund, Sweden
- Gilles Thomas
  - Synergie Lyon Cancer, Centre Léon Bérard, 69008 Lyon, France
- Anne-Lise Børresen-Dale
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0379 Oslo, Norway; K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0316 Oslo, Norway
- Kornelia Polyak
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Michael R Stratton
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Peter J Campbell
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK; Department of Haematology, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK; Department of Haematology, University of Cambridge, Cambridge CB2 1TN, UK
18
Latysheva NS, Babu MM. Discovering and understanding oncogenic gene fusions through data intensive computational approaches. Nucleic Acids Res 2016; 44:4487-503. [PMID: 27105842 PMCID: PMC4889949 DOI: 10.1093/nar/gkw282] [Citation(s) in RCA: 121] [Impact Index Per Article: 13.4] [Received: 01/12/2016] [Accepted: 03/24/2016] [Indexed: 12/21/2022] Open
Abstract
Although gene fusions have been recognized as important drivers of cancer for decades, our understanding of the prevalence and function of gene fusions has been revolutionized by the rise of next-generation sequencing, advances in bioinformatics theory and an increasing capacity for large-scale computational biology. The computational work on gene fusions has been vastly diverse, and the present state of the literature is fragmented. It will be fruitful to merge three camps of gene fusion bioinformatics that appear to rarely cross over: (i) data-intensive computational work characterizing the molecular biology of gene fusions; (ii) development research on fusion detection tools, candidate fusion prioritization algorithms and dedicated fusion databases and (iii) clinical research that seeks to either therapeutically target fusion transcripts and proteins or leverages advances in detection tools to perform large-scale surveys of gene fusion landscapes in specific cancer types. In this review, we unify these different, yet highly complementary and symbiotic, approaches with the view that increased synergy will catalyze advancements in gene fusion identification, characterization and significance evaluation.
Affiliation(s)
- Natasha S Latysheva
  - MRC Laboratory of Molecular Biology, Francis Crick Ave, Cambridge CB2 0QH, United Kingdom
- M Madan Babu
  - MRC Laboratory of Molecular Biology, Francis Crick Ave, Cambridge CB2 0QH, United Kingdom
19
Chuang TJ, Wu CS, Chen CY, Hung LY, Chiang TW, Yang MY. NCLscan: accurate identification of non-co-linear transcripts (fusion, trans-splicing and circular RNA) with a good balance between sensitivity and precision. Nucleic Acids Res 2016; 44:e29. [PMID: 26442529 PMCID: PMC4756807 DOI: 10.1093/nar/gkv1013] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 05/20/2015] [Revised: 09/23/2015] [Accepted: 09/24/2015] [Indexed: 12/19/2022] Open
Abstract
Analysis of RNA-seq data often detects numerous 'non-co-linear' (NCL) transcripts, which comprise sequence segments that are topologically inconsistent with their corresponding DNA sequences in the reference genome. However, detection of NCL transcripts involves two major challenges: removal of false positives arising from alignment artifacts and discrimination between different types of NCL transcripts (trans-spliced, circular or fusion transcripts). Here, we developed a new NCL-transcript-detecting method ('NCLscan'), which utilized a stepwise alignment strategy to almost completely eliminate false calls (>98% precision) without sacrificing true positives, enabling NCLscan to outperform 18 other publicly available tools (including fusion- and circular-RNA-detecting tools) in terms of sensitivity and precision, regardless of the generation strategy of simulated dataset, type of intragenic or intergenic NCL event, read depth of coverage, read length or expression level of NCL transcript. With this high accuracy, NCLscan was applied to distinguishing between trans-spliced, circular and fusion transcripts on the basis of poly(A)- and nonpoly(A)-selected RNA-seq data. We showed that circular RNAs were expressed more ubiquitously, more abundantly and less cell type-specifically than trans-spliced and fusion transcripts. Our study thus describes a robust pipeline for the discovery of NCL transcripts, and sheds light on the fundamental biology of these non-canonical RNA events in the human transcriptome.
Affiliation(s)
- Trees-Juen Chuang
  - Division of Physical and Computational Genomics, Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Chan-Shuo Wu
  - Division of Physical and Computational Genomics, Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Chia-Ying Chen
  - Division of Physical and Computational Genomics, Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Li-Yuan Hung
  - Division of Physical and Computational Genomics, Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Tai-Wei Chiang
  - Division of Physical and Computational Genomics, Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Min-Yu Yang
  - Division of Physical and Computational Genomics, Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
20
Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data. Sci Rep 2016; 6:21597. [PMID: 26862001 PMCID: PMC4748267 DOI: 10.1038/srep21597] [Citation(s) in RCA: 115] [Impact Index Per Article: 12.8] [Received: 10/08/2015] [Accepted: 01/27/2016] [Indexed: 12/12/2022] Open
Abstract
RNA-Seq made possible the global identification of fusion transcripts, i.e. "chimeric RNAs". Even though various software packages have been developed to serve this purpose, they behave differently in different datasets provided by different developers. It is important for both users, and developers to have an unbiased assessment of the performance of existing fusion detection tools. Toward this goal, we compared the performance of 12 well-known fusion detection software packages. We evaluated the sensitivity, false discovery rate, computing time, and memory usage of these tools in four different datasets (positive, negative, mixed, and test). We conclude that some tools are better than others in terms of sensitivity, positive prediction value, time consumption and memory usage. We also observed small overlaps of the fusions detected by different tools in the real dataset (test dataset). This could be due to false discoveries by various tools, but could also be due to the reason that none of the tools are inclusive. We have found that the performance of the tools depends on the quality, read length, and number of reads of the RNA-Seq data. We recommend that users choose the proper tools for their purpose based on the properties of their RNA-Seq data.
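The metrics named in this abstract (sensitivity, false discovery rate, positive predictive value) can be illustrated with a minimal Python sketch that scores one tool's fusion calls against a truth set. The function name `evaluate_calls`, the representation of a fusion as a (5' gene, 3' gene) pair, and the example data are assumptions made for illustration only; they are not taken from the study.

```python
from typing import Dict, Iterable, Tuple

# A fusion is represented here as an ordered (5' partner, 3' partner) pair.
# This representation is an assumption for the sketch, not the study's format.
Fusion = Tuple[str, str]


def evaluate_calls(predicted: Iterable[Fusion], truth: Iterable[Fusion]) -> Dict[str, float]:
    """Score predicted fusions against a known truth set."""
    pred_set = set(predicted)
    truth_set = set(truth)

    tp = len(pred_set & truth_set)   # called and true
    fp = len(pred_set - truth_set)   # called but not true
    fn = len(truth_set - pred_set)   # true but missed

    # Guard against empty sets so the function never divides by zero.
    sensitivity = tp / (tp + fn) if truth_set else float("nan")
    ppv = tp / (tp + fp) if pred_set else float("nan")
    fdr = fp / (tp + fp) if pred_set else float("nan")

    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "ppv": ppv, "fdr": fdr}


# Hypothetical example data.
truth_fusions = {("BCR", "ABL1"), ("TMPRSS2", "ERG"),
                 ("EML4", "ALK"), ("KIAA1549", "BRAF")}
tool_calls = [("BCR", "ABL1"), ("TMPRSS2", "ERG"), ("GENE_X", "GENE_Y")]

print(evaluate_calls(tool_calls, truth_fusions))
```

Exact pair matching is the simplest possible criterion; a real benchmark might also need to handle gene aliases, reciprocal fusions, or breakpoint tolerance.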
21
Izuogu OG, Alhasan AA, Alafghani HM, Santibanez-Koref M, Elliott DJ, Jackson MS. PTESFinder: a computational method to identify post-transcriptional exon shuffling (PTES) events. BMC Bioinformatics 2016; 17:31. [PMID: 26758031 PMCID: PMC4711006 DOI: 10.1186/s12859-016-0881-4] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 02/11/2015] [Accepted: 01/06/2016] [Indexed: 12/24/2022] Open
Abstract
Background: Transcripts, which have been subject to post-transcriptional exon shuffling (PTES), have an exon order inconsistent with the underlying genomic sequence. These have been identified in a wide variety of tissues and cell types from many eukaryotes, and are now known to be mostly circular, cytoplasmic, and non-coding. Although there is no uniformly ascribed function, several have been shown to be involved in gene regulation. Accurate identification of these transcripts can, however, be difficult due to artefacts from a wide variety of sources. Results: Here, we present a computational method, PTESFinder, to identify these transcripts from high throughput RNAseq data. Uniquely, it systematically excludes potential artefacts emanating from pseudogenes, segmental duplications, and template switching, and outputs both PTES and canonical exon junction counts to facilitate comparative analyses. In comparison with four existing methods, PTESFinder achieves highest specificity and comparable sensitivity at a variety of read depths. PTESFinder also identifies between 13% and 41.6% more structures, compared to publicly available methods recently used to identify human circular RNAs. Conclusions: With high sensitivity and specificity, user-adjustable filters that target known sources of false positives, and tailored output to facilitate comparison of transcript levels, PTESFinder will facilitate the discovery and analysis of these poorly understood transcripts.
Affiliation(s)
- Osagie G Izuogu
  - Institute of Genetic Medicine, Newcastle University, Newcastle Upon Tyne, UK
- Abd A Alhasan
  - Institute of Genetic Medicine, Newcastle University, Newcastle Upon Tyne, UK
- Hani M Alafghani
  - Security Forces Hospital, P. O. Box 2748-24268-8541, Makkah, Kingdom of Saudi Arabia
- David J Elliott
  - Institute of Genetic Medicine, Newcastle University, Newcastle Upon Tyne, UK
- Michael S Jackson
  - Institute of Genetic Medicine, Newcastle University, Newcastle Upon Tyne, UK
22
Arsenijevic V, Davis-Dusenbery BN. Reproducible, Scalable Fusion Gene Detection from RNA-Seq. Methods Mol Biol 2016; 1381:223-37. [PMID: 26667464 DOI: 10.1007/978-1-4939-3204-7_13] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/07/2023]
Abstract
Chromosomal rearrangements resulting in the creation of novel gene products, termed fusion genes, have been identified as driving events in the development of multiple types of cancer. As these gene products typically do not exist in normal cells, they represent valuable prognostic and therapeutic targets. Advances in next-generation sequencing and computational approaches have greatly improved our ability to detect and identify fusion genes. Nevertheless, these approaches require significant computational resources. Here we describe an approach which leverages cloud computing technologies to perform fusion gene detection from RNA sequencing data at any scale. We additionally highlight methods to enhance reproducibility of bioinformatics analyses which may be applied to any next-generation sequencing experiment.
Affiliation(s)
- Vladan Arsenijevic
  - Department of Bioinformatics, Seven Bridges Genomics, One Broadway, 14th Floor, Cambridge, MA, 02142, USA
- Brandi N Davis-Dusenbery
  - Department of Bioinformatics, Seven Bridges Genomics, One Broadway, 14th Floor, Cambridge, MA, 02142, USA
23
Liu S, Tsai WH, Ding Y, Chen R, Fang Z, Huo Z, Kim S, Ma T, Chang TY, Priedigkeit NM, Lee AV, Luo J, Wang HW, Chung IF, Tseng GC. Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data. Nucleic Acids Res 2015; 44:e47. [PMID: 26582927 PMCID: PMC4797269 DOI: 10.1093/nar/gkv1234] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Received: 04/27/2015] [Accepted: 10/24/2015] [Indexed: 12/31/2022] Open
Abstract
Background: Fusion transcripts are formed by either fusion genes (DNA level) or trans-splicing events (RNA level). They have been recognized as a promising tool for diagnosing, subtyping and treating cancers. RNA-seq has become a precise and efficient standard for genome-wide screening of such aberration events. Many fusion transcript detection algorithms have been developed for paired-end RNA-seq data but their performance has not been comprehensively evaluated to guide practitioners. In this paper, we evaluated 15 popular algorithms by their precision and recall trade-off, accuracy of supporting reads and computational cost. We further combine top-performing methods for improved ensemble detection. Results: Fifteen fusion transcript detection tools were compared using three synthetic data sets under different coverage, read length, insert size and background noise, and three real data sets with selected experimental validations. No single method dominantly performed the best but SOAPfuse generally performed well, followed by FusionCatcher and JAFFA. We further demonstrated the potential of a meta-caller algorithm by combining top performing methods to re-prioritize candidate fusion transcripts with high confidence that can be followed by experimental validation. Conclusion: Our result provides insightful recommendations when applying individual tool or combining top performers to identify fusion transcript candidates.
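The idea of combining several callers to re-prioritize candidates can be sketched in Python as a simple vote count. This is not the meta-caller described in the paper, whose scoring is not given in the abstract; the function name `meta_rank`, the `min_tools` threshold, the tie-breaking by summed supporting reads, and the tool names and data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Fusion = Tuple[str, str]  # (5' partner, 3' partner); an assumed representation


def meta_rank(calls_by_tool: Dict[str, Dict[Fusion, int]],
              min_tools: int = 2) -> List[Tuple[Fusion, int, int]]:
    """Rank fusions by the number of tools reporting them.

    calls_by_tool maps a tool name to {fusion: supporting read count}.
    Returns (fusion, n_tools, total_reads) tuples, best candidates first.
    """
    voters = defaultdict(set)      # fusion -> set of tools that called it
    total_reads = defaultdict(int)  # fusion -> summed supporting reads

    for tool, calls in calls_by_tool.items():
        for fusion, n_reads in calls.items():
            voters[fusion].add(tool)
            total_reads[fusion] += n_reads

    ranked = [(fusion, len(tools), total_reads[fusion])
              for fusion, tools in voters.items()
              if len(tools) >= min_tools]
    # More supporting tools first, then more reads, then name for stable order.
    ranked.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return ranked


# Hypothetical calls from three unnamed tools.
example_calls = {
    "tool_a": {("BCR", "ABL1"): 30, ("GENE_A", "GENE_B"): 2},
    "tool_b": {("BCR", "ABL1"): 25, ("TMPRSS2", "ERG"): 10},
    "tool_c": {("BCR", "ABL1"): 12, ("TMPRSS2", "ERG"): 8},
}

for fusion, n_tools, reads in meta_rank(example_calls):
    print(fusion, n_tools, reads)
```

Requiring agreement between at least two tools trades some sensitivity for fewer false positives; the threshold would need tuning against validated fusions.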
Affiliation(s)
- Silvia Liu
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Biomedical Science Tower 3, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
- Wei-Hsiang Tsai
  - Institute of Biomedical Informatics, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
- Ying Ding
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Biomedical Science Tower 3, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
- Rui Chen
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
- Zhou Fang
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
- Zhiguang Huo
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
- SungHwan Kim
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
- Tianzhou Ma
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
- Ting-Yu Chang
  - Institute of Microbiology and Immunology, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
- Nolan Michael Priedigkeit
  - Molecular Pharmacology, School of Medicine, University of Pittsburgh, 3550 Terrace Street, Pittsburgh, PA 15261, USA
- Adrian V Lee
  - Magee-Women's Research Institute, 204 Craft Avenue, Pittsburgh, PA 15213, USA
- Jianhua Luo
  - Department of Pathology, School of Medicine, University of Pittsburgh, 3550 Terrace Street, Pittsburgh, PA 15261, USA
- Hsei-Wei Wang
  - Institute of Biomedical Informatics, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
  - Institute of Microbiology and Immunology, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
  - Center for Systems and Synthetic Biology, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
- I-Fang Chung
  - Institute of Biomedical Informatics, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
  - Center for Systems and Synthetic Biology, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
- George C Tseng
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Biomedical Science Tower 3, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
24
Zhu X, Leung HCM, Wang R, Chin FYL, Yiu SM, Quan G, Li Y, Zhang R, Jiang Q, Liu B, Dong Y, Zhou G, Wang Y. misFinder: identify mis-assemblies in an unbiased manner using reference and paired-end reads. BMC Bioinformatics 2015; 16:386. [PMID: 26573684 PMCID: PMC4647709 DOI: 10.1186/s12859-015-0818-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 04/25/2015] [Accepted: 11/06/2015] [Indexed: 11/10/2022] Open
Abstract
Background: Because of the short read length of high throughput sequencing data, assembly errors are introduced in genome assembly, which may have an adverse impact on downstream data analysis. Several tools have been developed to eliminate these errors by either 1) comparing the assembled sequences with some similar reference genome, or 2) analyzing paired-end reads aligned to the assembled sequences and determining inconsistent features along mis-assembled sequences. However, the former approach cannot distinguish real structural variations between the target genome and the reference genome, while the latter approach could have many false positive detections (correctly assembled sequence being considered as mis-assembled sequence). Results: We present misFinder, a tool that aims to identify the assembly errors with high accuracy in an unbiased way and correct these errors at their mis-assembled positions to improve the assembly accuracy for downstream analysis. It combines the information of a reference (or closely related reference) genome and paired-end reads aligned to the assembled sequence. Assembly errors and correct assemblies corresponding to structural variations can be detected by comparing the genome reference and assembled sequence. Different types of assembly errors can then be distinguished from the mis-assembled sequence by analyzing the aligned paired-end reads using multiple features derived from coverage and consistency of insert distance to obtain high-confidence error calls. Conclusions: We tested the performance of misFinder on both simulated and real paired-end reads data, and misFinder gave accurate error calls with only very few miscalls. We further compared misFinder with QUAST and REAPR. misFinder outperformed QUAST and REAPR by 1) identifying more true positive mis-assemblies with very few false positives and false negatives, and 2) distinguishing the correct assemblies corresponding to structural variations from mis-assembled sequence. misFinder can be freely downloaded from https://github.com/hitbio/misFinder.
Affiliation(s)
- Xiao Zhu
  - College of Computer Sciences and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China
  - Center for Bioinformatics, School of Computer Sciences and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Henry C M Leung
  - Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong, China
- Rongjie Wang
  - Center for Bioinformatics, School of Computer Sciences and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Francis Y L Chin
  - Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong, China
- Siu Ming Yiu
  - Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong, China
- Guangri Quan
  - Center for Bioinformatics, School of Computer Sciences and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yajie Li
  - The Fourth Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China
- Rui Zhang
  - The Fourth Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China
- Qinghua Jiang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Bo Liu
  - Center for Bioinformatics, School of Computer Sciences and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yucui Dong
  - Department of Immunology, Harbin Medical University, Harbin, Heilongjiang, China
- Guohui Zhou
  - College of Computer Sciences and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China
- Yadong Wang
  - Center for Bioinformatics, School of Computer Sciences and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
25
Zhang J, White NM, Schmidt HK, Fulton RS, Tomlinson C, Warren WC, Wilson RK, Maher CA. INTEGRATE: gene fusion discovery using whole genome and transcriptome data. Genome Res 2015; 26:108-18. [PMID: 26556708 PMCID: PMC4691743 DOI: 10.1101/gr.186114.114] [Citation(s) in RCA: 96] [Impact Index Per Article: 9.6] [Received: 10/20/2014] [Accepted: 11/09/2015] [Indexed: 12/13/2022]
Abstract
While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating both data could generate a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing quantity of patients with both data available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE only missed six out of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use.
Affiliation(s)
- Jin Zhang
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Internal Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Nicole M White
  - Department of Internal Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Heather K Schmidt
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Robert S Fulton
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Chad Tomlinson
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Wesley C Warren
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Richard K Wilson
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Christopher A Maher
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Internal Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Biomedical Engineering, Washington University School of Medicine, St. Louis, Missouri 63110, USA
|
26
|
Luthra R, Chen H, Roy-Chowdhuri S, Singh RR. Next-Generation Sequencing in Clinical Molecular Diagnostics of Cancer: Advantages and Challenges. Cancers (Basel) 2015; 7:2023-36. [PMID: 26473927 PMCID: PMC4695874 DOI: 10.3390/cancers7040874] [Citation(s) in RCA: 103] [Impact Index Per Article: 10.3] [Received: 06/24/2015] [Revised: 09/21/2015] [Accepted: 10/01/2015] [Indexed: 11/16/2022]
Abstract
The application of next-generation sequencing (NGS) to characterize cancer genomes has resulted in the discovery of numerous genetic markers. Consequently, the number of markers that warrant routine screening in molecular diagnostic laboratories, often from limited tumor material, has increased. This increased demand has been difficult to manage by traditional low- and/or medium-throughput sequencing platforms. Massively parallel sequencing capabilities of NGS provide a much-needed alternative for mutation screening in multiple genes with a single low investment of DNA. However, implementation of NGS technologies, most of which are for research use only (RUO), in a diagnostic laboratory, needs extensive validation in order to establish Clinical Laboratory Improvement Amendments (CLIA) and College of American Pathologists (CAP)-compliant performance characteristics. Here, we have reviewed approaches for validation of NGS technology for routine screening of tumors. We discuss the criteria for selecting gene markers to include in the NGS panel and the deciding factors for selecting target capture approaches and sequencing platforms. We also discuss challenges in result reporting, storage and retrieval of the voluminous sequencing data and the future potential of clinical NGS.
Affiliation(s)
- Rajyalakshmi Luthra
  - Department of Hematopathology, The University of Texas MD Anderson Cancer Center, 8515 Fannin Street, Houston, TX 77054, USA.
- Hui Chen
  - Department of Pathology, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd, Houston, TX 77030, USA.
- Sinchita Roy-Chowdhuri
  - Department of Pathology, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd, Houston, TX 77030, USA.
- R Rajesh Singh
  - Department of Hematopathology, The University of Texas MD Anderson Cancer Center, 8515 Fannin Street, Houston, TX 77054, USA.
|
27
|
Spaced Seed Data Structures for De Novo Assembly. Int J Genomics 2015; 2015:196591. [PMID: 26539459 PMCID: PMC4619942 DOI: 10.1155/2015/196591] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/06/2015] [Accepted: 03/30/2015] [Indexed: 01/18/2023]
Abstract
De novo assembly of the genome of a species is essential in the absence of a reference genome sequence. Many scalable assembly algorithms use the de Bruijn graph (DBG) paradigm to reconstruct genomes, where a table of subsequences of a certain length is derived from the reads, and their overlaps are analyzed to assemble sequences. Despite longer subsequences unlocking longer genomic features for assembly, associated increase in compute resources limits the practicability of DBG over other assembly archetypes already designed for longer reads. Here, we revisit the DBG paradigm to adapt it to the changing sequencing technology landscape and introduce three data structure designs for spaced seeds in the form of paired subsequences. These data structures address memory and run time constraints imposed by longer reads. We observe that when a fixed distance separates seed pairs, it provides increased sequence specificity with increased gap length. Further, we note that Bloom filters would be suitable to implicitly store spaced seeds and be tolerant to sequencing errors. Building on this concept, we describe a data structure for tracking the frequencies of observed spaced seeds. These data structure designs will have applications in genome, transcriptome and metagenome assemblies, and read error correction.
|
28
|
Thangam M, Gopal RK. CRCDA--Comprehensive resources for cancer NGS data analysis. Database (Oxford) 2015; 2015:bav092. [PMID: 26450948 PMCID: PMC4597977 DOI: 10.1093/database/bav092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 08/31/2015] [Indexed: 12/24/2022]
Abstract
Next generation sequencing (NGS) innovations set a compelling landmark in life science and changed the direction of research in clinical oncology through their capacity to help diagnose and treat cancer. The aim of our portal, comprehensive resources for cancer NGS data analysis (CRCDA), is to provide a collection of different NGS tools and pipelines under diverse classes, together with cancer pathways and databases and, furthermore, literature information from PubMed. The literature data was constrained to the 18 most common cancer types, such as breast cancer, colon cancer and other cancers that occur in the worldwide population. For convenience, NGS-cancer tools have been categorized into cancer genomics, cancer transcriptomics, cancer epigenomics, quality control and visualization. Pipelines for variant detection, quality control and data analysis were listed to provide an out-of-the-box solution for NGS data analysis, which may help researchers to overcome challenges in selecting and configuring individual tools for analysing exome, whole genome and transcriptome data. An extensive search page was developed that can be queried by using (i) type of data [literature, gene data and sequence read archive (SRA) data] and (ii) type of cancer (selected based on global incidence and accessibility of data). For each category of analysis, a variety of tools is available, and the biggest challenge is in searching for and using the right tool for the right application. The objective of the work is to collect the tools in each category that are available at various places and to arrange the tools and other data in a simple and user-friendly manner so that biologists and oncologists can find information more easily. To the best of our knowledge, we have collected and presented a comprehensive package of most of the resources available in cancer for NGS data analysis. Given these factors, we believe that this website will be a useful resource to the NGS research community working on cancer.
Database URL: http://bioinfo.au-kbc.org.in/ngs/ngshome.html.
Affiliation(s)
- Manonanthini Thangam
  - AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
- Ramesh Kumar Gopal
  - AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
|
29
|
Detection of a Distinctive Genomic Signature in Rhabdoid Glioblastoma, A Rare Disease Entity Identified by Whole Exome Sequencing and Whole Transcriptome Sequencing. Transl Oncol 2015; 8:279-87. [PMID: 26310374 PMCID: PMC4562980 DOI: 10.1016/j.tranon.2015.05.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 11/03/2014] [Revised: 05/10/2015] [Accepted: 05/20/2015] [Indexed: 12/23/2022]
Abstract
We analyzed the genome of a rhabdoid glioblastoma (R-GBM) tumor, a very rare variant of GBM. A surgical specimen of R-GBM from a 20-year-old woman was analyzed using whole exome sequencing (WES), whole transcriptome sequencing (WTS), single nucleotide polymorphism array, and array comparative genomic hybridization. The status of gene expression in R-GBM tissue was compared with that of normal brain tissue and conventional GBM tumor tissue. We identified 23 somatic non-synonymous small nucleotide variants with WES. We identified the BRAF V600E mutation and possible functional changes in the mutated genes, ISL1 and NDRG2. Copy number alteration analysis revealed gains of chromosomes 3, 7, and 9. We found loss of heterozygosity and focal homozygous deletion on 9q21, which includes CDKN2A and CDKN2B. In addition, WTS revealed that CDK6, MET, EZH2, EGFR, and NOTCH1, which are located on chromosomes 7 and 9, were over-expressed, whereas CDKN2A/2B were minimally expressed. Fusion gene analysis showed 14 candidate genes that may be functionally involved in R-GBM, including TWIST2, and UPK3BL. The BRAF V600E mutation, CDKN2A/2B deletion, and EGFR/MET copy number gain were observed. These simultaneous alterations are very rarely found in GBM. Moreover, the NDRG2 mutation was first identified in this study as it has never been reported in GBM. We observed a unique genomic signature in R-GBM compared to conventional GBM, which may provide insight regarding R-GBM as a distinct disease entity among the larger group of GBMs.
|
30
|
Weirather JL, Afshar PT, Clark TA, Tseng E, Powers LS, Underwood JG, Zabner J, Korlach J, Wong WH, Au KF. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing. Nucleic Acids Res 2015; 43:e116. [PMID: 26040699 PMCID: PMC4605286 DOI: 10.1093/nar/gkv562] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 03/19/2015] [Accepted: 05/15/2015] [Indexed: 12/19/2022]
Abstract
We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes.
Affiliation(s)
- Jason L Weirather
  - Department of Internal Medicine, University of Iowa, 200 Hawkins Dr, Iowa City, IA 52242, USA
- Pegah Tootoonchi Afshar
  - Department of Electrical Engineering, School of Engineering, Stanford University, Stanford, CA 94305, USA
- Tyson A Clark
  - Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025, USA
- Elizabeth Tseng
  - Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025, USA
- Linda S Powers
  - Department of Internal Medicine, University of Iowa, 200 Hawkins Dr, Iowa City, IA 52242, USA
- Jason G Underwood
  - Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle WA 98195-5065, USA
- Joseph Zabner
  - Department of Internal Medicine, University of Iowa, 200 Hawkins Dr, Iowa City, IA 52242, USA
- Jonas Korlach
  - Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025, USA
- Wing Hung Wong
  - Department of Statistics and Department of Health Research & Policy, 390 Serra Mall, Stanford University, Stanford, CA 94305, USA
- Kin Fai Au
  - Department of Internal Medicine, University of Iowa, 200 Hawkins Dr, Iowa City, IA 52242, USA
|
31
|
Abyzov A, Li S, Kim DR, Mohiyuddin M, Stütz AM, Parrish NF, Mu XJ, Clark W, Chen K, Hurles M, Korbel JO, Lam HYK, Lee C, Gerstein MB. Analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms. Nat Commun 2015; 6:7256. [PMID: 26028266 PMCID: PMC4451611 DOI: 10.1038/ncomms8256] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 10/03/2014] [Accepted: 04/21/2015] [Indexed: 02/07/2023]
Abstract
Investigating genomic structural variants at basepair resolution is crucial for understanding their formation mechanisms. We identify and analyze 8,943 deletion breakpoints in 1,092 samples from the 1000 Genomes Project. We find breakpoints have more nearby SNPs and indels than the genomic average, likely a consequence of relaxed selection. By investigating the correlation of breakpoints with DNA methylation, Hi-C interactions, and histone marks and the substitution patterns of nucleotides near them, we find that breakpoints with the signature of non-allelic homologous recombination (NAHR) are associated with open chromatin. We hypothesize that some NAHR deletions occur without DNA replication and cell division, in embryonic and germline cells. In contrast, breakpoints associated with non-homologous (NH) mechanisms often have sequence micro-insertions, templated from later replicating genomic sites, spaced at two characteristic distances from the breakpoint. These micro-insertions are consistent with template-switching events and suggest a particular spatiotemporal configuration for DNA during the events.
Affiliation(s)
- Alexej Abyzov
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, 200 1st Street SW, Rochester, Minnesota 55905, USA
- Shantao Li
  - [1] Program in Computational Biology and Bioinformatics, Yale University, 266 Whitney Avenue, New Haven, Connecticut 06520, USA [2] Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA
- Daniel Rhee Kim
  - Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA
- Adrian M Stütz
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg 69117, Germany
- Xinmeng Jasmine Mu
  - [1] Program in Computational Biology and Bioinformatics, Yale University, 266 Whitney Avenue, New Haven, Connecticut 06520, USA [2] Department of Molecular Biophysics and Biochemistry, School of Medicine, Yale University, New Haven, Connecticut 06520, USA
- Wyatt Clark
  - [1] Program in Computational Biology and Bioinformatics, Yale University, 266 Whitney Avenue, New Haven, Connecticut 06520, USA [2] Department of Molecular Biophysics and Biochemistry, School of Medicine, Yale University, New Haven, Connecticut 06520, USA
- Ken Chen
  - The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Matthew Hurles
  - Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Jan O Korbel
  - [1] European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg 69117, Germany [2] European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Hugo Y K Lam
  - Bina Technologies, Roche Sequencing, Redwood City, California 94065, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06030, USA
- Mark B Gerstein
  - [1] Program in Computational Biology and Bioinformatics, Yale University, 266 Whitney Avenue, New Haven, Connecticut 06520, USA [2] Department of Molecular Biophysics and Biochemistry, School of Medicine, Yale University, New Haven, Connecticut 06520, USA [3] Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA
|
32
|
Engle EK, Fisher DAC, Miller CA, McLellan MD, Fulton RS, Moore DM, Wilson RK, Ley TJ, Oh ST. Clonal evolution revealed by whole genome sequencing in a case of primary myelofibrosis transformed to secondary acute myeloid leukemia. Leukemia 2015; 29:869-76. [PMID: 25252869 PMCID: PMC4374044 DOI: 10.1038/leu.2014.289] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 08/27/2014] [Revised: 09/15/2014] [Accepted: 09/18/2014] [Indexed: 12/16/2022]
Abstract
Clonal architecture in myeloproliferative neoplasms (MPNs) is poorly understood. Here we report genomic analyses of a patient with primary myelofibrosis (PMF) transformed to secondary acute myeloid leukemia (sAML). Whole genome sequencing (WGS) was performed on PMF and sAML diagnosis samples, with skin included as a germline surrogate. Deep sequencing validation was performed on the WGS samples and an additional sample obtained during sAML remission/relapsed PMF. Clustering analysis of 649 validated somatic single-nucleotide variants revealed four distinct clonal groups, each including putative driver mutations. The first group (including JAK2 and U2AF1), representing the founding clone, included mutations with high frequency at all three disease stages. The second clonal group (including MYB) was present only in PMF, suggesting the presence of a clone that was dispensable for transformation. The third group (including ASXL1) contained mutations with low frequency in PMF and high frequency in subsequent samples, indicating evolution of the dominant clone with disease progression. The fourth clonal group (including IDH1 and RUNX1) was acquired at sAML transformation and was predominantly absent at sAML remission/relapsed PMF. Taken together, these findings illustrate the complex clonal dynamics associated with disease evolution in MPNs and sAML.
Affiliation(s)
- E K Engle
  - Division of Hematology, Washington University School of Medicine, St Louis, MO, USA
- D A C Fisher
  - Division of Hematology, Washington University School of Medicine, St Louis, MO, USA
- C A Miller
  - The Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- M D McLellan
  - The Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- R S Fulton
  - The Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- D M Moore
  - Division of Hematology, Washington University School of Medicine, St Louis, MO, USA
- R K Wilson
  - The Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- T J Ley
  - The Genome Institute, Division of Oncology, Washington University School of Medicine, St Louis, MO, USA
- S T Oh
  - Division of Hematology, Washington University School of Medicine, St Louis, MO, USA
|
33
|
Zhou W, Zhao H, Chong Z, Mark RJ, Eterovic AK, Meric-Bernstam F, Chen K. ClinSeK: a targeted variant characterization framework for clinical sequencing. Genome Med 2015; 7:34. [PMID: 25918555 PMCID: PMC4410453 DOI: 10.1186/s13073-015-0155-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/28/2014] [Accepted: 03/10/2015] [Indexed: 02/12/2023]
Abstract
Applying genomics to patient care demands sensitive, unambiguous and rapid characterization of a known set of clinically relevant variants in patients' samples, an objective substantially different from the standard discovery process, in which every base in every sequenced read must be examined. Further, the approach must be sufficiently robust as to be able to detect multiple and potentially rare variants from heterogeneous samples. To meet this critical objective, we developed a novel variant characterization framework, ClinSeK, which performs targeted analysis of relevant reads from high-throughput sequencing data. ClinSeK is designed for efficient targeted short read alignment and is capable of characterizing a wide spectrum of genetic variants from single nucleotide variation to large-scale genomic rearrangement breakpoints. Applying ClinSeK to over a thousand cancer patients demonstrated substantively better performance, in terms of accuracy, runtime and disk storage, for clinical applications than existing variant discovery tools. ClinSeK is freely available for academic use at http://bioinformatics.mdanderson.org/main/clinsek.
Affiliation(s)
- Wanding Zhou
  - Department of Bioinformatics and Computational Biology, the University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
- Hao Zhao
  - Department of Bioinformatics and Computational Biology, the University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
- Zechen Chong
  - Department of Bioinformatics and Computational Biology, the University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
- Routbort J Mark
  - Department of Hematopathology, the University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
- Agda K Eterovic
  - Department of Systems Biology, the University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
  - Institute of Personalized Cancer Therapy, the University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
- Funda Meric-Bernstam
  - Institute of Personalized Cancer Therapy, the University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
  - Department of Investigational Cancer Therapy, the University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
- Ken Chen
  - Department of Bioinformatics and Computational Biology, the University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
|
34
|
Fernandez-Cuesta L, Sun R, Menon R, George J, Lorenz S, Meza-Zepeda LA, Peifer M, Plenker D, Heuckmann JM, Leenders F, Zander T, Dahmen I, Koker M, Schöttle J, Ullrich RT, Altmüller J, Becker C, Nürnberg P, Seidel H, Böhm D, Göke F, Ansén S, Russell PA, Wright GM, Wainer Z, Solomon B, Petersen I, Clement JH, Sänger J, Brustugun OT, Helland Å, Solberg S, Lund-Iversen M, Buettner R, Wolf J, Brambilla E, Vingron M, Perner S, Haas SA, Thomas RK. Identification of novel fusion genes in lung cancer using breakpoint assembly of transcriptome sequencing data. Genome Biol 2015; 16:7. [PMID: 25650807 PMCID: PMC4300615 DOI: 10.1186/s13059-014-0558-0] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 03/24/2014] [Accepted: 12/03/2014] [Indexed: 02/08/2023]
Abstract
Genomic translocation events frequently underlie cancer development through generation of gene fusions with oncogenic properties. Identification of such fusion transcripts by transcriptome sequencing might help to discover new potential therapeutic targets. We developed TRUP (Tumor-specimen suited RNA-seq Unified Pipeline) (https://github.com/ruping/TRUP), a computational approach that combines split-read and read-pair analysis with de novo assembly for the identification of chimeric transcripts in cancer specimens. We apply TRUP to RNA-seq data of different tumor types, and find it to be more sensitive than alternative tools in detecting chimeric transcripts, such as secondary rearrangements in EML4-ALK-positive lung tumors, or recurrent inactivating rearrangements affecting RASSF8.
|
35
|
Abstract
High-throughput DNA sequencing has revolutionized the study of cancer genomics with numerous discoveries that are relevant to cancer diagnosis and treatment. The latest sequencing and analysis methods have successfully identified somatic alterations, including single-nucleotide variants, insertions and deletions, copy-number aberrations, structural variants and gene fusions. Additional computational techniques have proved useful for defining the mutations, genes and molecular networks that drive diverse cancer phenotypes and that determine clonal architectures in tumour samples. Collectively, these tools have advanced the study of genomic, transcriptomic and epigenomic alterations in cancer, and their association to clinical properties. Here, we review cancer genomics software and the insights that have been gained from their application.
|
36
|
Teer JK. An improved understanding of cancer genomics through massively parallel sequencing. Transl Cancer Res 2014; 3:243-259. [PMID: 26146607 PMCID: PMC4486294 DOI: 10.3978/j.issn.2218-676x.2014.05.05] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/21/2022]
Abstract
DNA sequencing technology advances have enabled genetic investigation of more samples in a shorter time than has previously been possible. Furthermore, the ability to analyze and understand large sequencing datasets has improved due to concurrent advances in sequence data analysis methods and software tools. Constant improvements to both technology and analytic approaches in this fast moving field are evidenced by many recent publications of computational methods, as well as biological results linking genetic events to human disease. Cancer in particular has been the subject of intense investigation, owing to the genetic underpinnings of this complex collection of diseases. New massively-parallel sequencing (MPS) technologies have enabled the investigation of thousands of samples, divided across tens of different tumor types, resulting in new driver gene identification, mutagenic pattern characterization, and other newly uncovered features of tumor biology. This review will focus both on methods and recent results: current analytical approaches to DNA and RNA sequencing will be presented followed by a review of recent pan-cancer sequencing studies. This overview of methods and results will not only highlight the recent advances in cancer genomics, but also the methods and tools used to accomplish these advancements in a constantly and rapidly improving field.
Affiliation(s)
- Jamie K Teer
  - H. Lee Moffitt Cancer Center and Research Institute, 12902 Magnolia Dr., Tampa, FL 33612, Tel: 813-745-2650
|
37
|
Kühn MWM, Bullinger L, Gröschel S, Krönke J, Edelmann J, Rücker FG, Eiwen K, Paschka P, Gaidzik VI, Holzmann K, Schlenk RF, Döhner H, Döhner K. Genome-wide genotyping of acute myeloid leukemia with translocation t(9;11)(p22;q23) reveals novel recurrent genomic alterations. Haematologica 2014; 99:e133-5. [PMID: 24859875 DOI: 10.3324/haematol.2014.105544] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]
Affiliation(s)
- Lars Bullinger
  - Department of Internal Medicine III, University of Ulm, Germany
- Stefan Gröschel
  - Department of Internal Medicine III, University of Ulm, Germany
- Jan Krönke
  - Department of Internal Medicine III, University of Ulm, Germany
- Frank G Rücker
  - Department of Internal Medicine III, University of Ulm, Germany
- Karina Eiwen
  - Department of Internal Medicine III, University of Ulm, Germany
- Peter Paschka
  - Department of Internal Medicine III, University of Ulm, Germany
- Hartmut Döhner
  - Department of Internal Medicine III, University of Ulm, Germany
|
38
|
Gao J, Ciriello G, Sander C, Schultz N. Collection, integration and analysis of cancer genomic profiles: from data to insight. Curr Opin Genet Dev 2014; 24:92-8. [PMID: 24584084 PMCID: PMC4084973 DOI: 10.1016/j.gde.2013.12.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 10/08/2013] [Revised: 12/03/2013] [Accepted: 12/04/2013] [Indexed: 12/20/2022]
Abstract
The recent deluge of cancer genomics data provides a tremendous opportunity for the discovery of detailed mechanisms of tumorigenesis and the development of therapeutics. However, identifying the functionally relevant genomic alterations ('drivers') among the many non-oncogenic events ('passengers') presents a major challenge. Several new methods have been developed over the past few years that identify recurrently altered genes. Mapping the recurrent genomic alterations, such as somatic mutations and focal DNA copy-number alterations, onto individual tumor samples as tumor-specific event calls facilitates the identification of altered processes and pathways. The resulting reduction in complexity makes cancer genomics data more easily interpretable by cancer researchers and is now driving the development of powerful yet intuitive web-based analysis tools.
Affiliation(s)
- Jianjiong Gao
  - Computational Biology Center, Memorial Sloan-Kettering Cancer Center, Box 460, New York, NY 10065, USA
- Giovanni Ciriello
  - Computational Biology Center, Memorial Sloan-Kettering Cancer Center, Box 460, New York, NY 10065, USA
- Chris Sander
  - Computational Biology Center, Memorial Sloan-Kettering Cancer Center, Box 460, New York, NY 10065, USA
- Nikolaus Schultz
  - Computational Biology Center, Memorial Sloan-Kettering Cancer Center, Box 460, New York, NY 10065, USA.
|
39
|
Chen K, Chen L, Fan X, Wallis J, Ding L, Weinstock G. TIGRA: a targeted iterative graph routing assembler for breakpoint assembly. Genome Res 2014; 24:310-7. [PMID: 24307552 PMCID: PMC3912421 DOI: 10.1101/gr.162883.113] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.7] [Received: 07/01/2013] [Accepted: 12/03/2013] [Indexed: 12/20/2022]
Abstract
Recent progress in next-generation sequencing has greatly facilitated our study of genomic structural variation. Unlike single nucleotide variants and small indels, many structural variants have not been completely characterized at nucleotide resolution. Deriving the complete sequences underlying such breakpoints is crucial for not only accurate discovery, but also for the functional characterization of altered alleles. However, our current ability to determine such breakpoint sequences is limited because of challenges in aligning and assembling short reads. To address this issue, we developed a targeted iterative graph routing assembler, TIGRA, which implements a set of novel data analysis routines to achieve effective breakpoint assembly from next-generation sequencing data. In our assessment using data from the 1000 Genomes Project, TIGRA was able to accurately assemble the majority of deletion and mobile element insertion breakpoints, with a substantively better success rate and accuracy than other algorithms. TIGRA has been applied in the 1000 Genomes Project and other projects and is freely available for academic use.
Affiliation(s)
- Ken Chen
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
  - Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Lei Chen
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Xian Fan
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
  - Department of Computer Science, Rice University, Houston, Texas 77005, USA
- John Wallis
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Li Ding
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
  - Department of Medicine, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- George Weinstock
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
|
40
|
Chen K, Navin NE, Wang Y, Schmidt HK, Wallis JW, Niu B, Fan X, Zhao H, McLellan MD, Hoadley KA, Mardis ER, Ley TJ, Perou CM, Wilson RK, Ding L. BreakTrans: uncovering the genomic architecture of gene fusions. Genome Biol 2013; 14:R87. [PMID: 23972288 PMCID: PMC4054677 DOI: 10.1186/gb-2013-14-8-r87] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 03/12/2013] [Accepted: 08/23/2013] [Indexed: 01/18/2023]
Abstract
Producing gene fusions through genomic structural rearrangements is a major mechanism for tumor evolution. Therefore, accurately detecting gene fusions and the originating rearrangements is of great importance for personalized cancer diagnosis and targeted therapy. We present a tool, BreakTrans, that systematically maps predicted gene fusions to structural rearrangements. Thus, BreakTrans not only validates both types of predictions, but also provides mechanistic interpretations. BreakTrans effectively validates known fusions and discovers novel events in a breast cancer cell line. Applying BreakTrans to 43 breast cancer samples in The Cancer Genome Atlas identifies 90 genomically validated gene fusions. BreakTrans is available at http://bioinformatics.mdanderson.org/main/BreakTrans.
|
41
|
Shyr D, Liu Q. Next generation sequencing in cancer research and clinical application. Biol Proced Online 2013; 15:4. [PMID: 23406336 PMCID: PMC3599179 DOI: 10.1186/1480-9222-15-4] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 02/06/2013] [Accepted: 02/09/2013] [Indexed: 01/29/2023]
Abstract
The wide application of next-generation sequencing (NGS), mainly through whole genome, exome and transcriptome sequencing, provides a high-resolution and global view of the cancer genome. Coupled with powerful bioinformatics tools, NGS promises to revolutionize cancer research, diagnosis and therapy. In this paper, we review the recent advances in NGS-based cancer genomic research as well as clinical application, summarize the current integrative oncogenomic projects, resources and computational algorithms, and discuss the challenge and future directions in the research and clinical application of cancer genomic sequencing.
Affiliation(s)
- Derek Shyr
  - Washington University, 63130, St. Louis, MO, USA
- Qi Liu
  - Center for Quantitative Sciences, Vanderbilt University School of Medicine, 37232, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, 37232, Nashville, TN, USA
|
42
|
Abstract
Advances in sequencing technologies and increased access to sequencing services have led to renewed interest in sequence and genome assembly. Concurrently, new applications for sequencing have emerged, including gene expression analysis, discovery of genomic variants and metagenomics, and each of these has different needs and challenges in terms of assembly. We survey the theoretical foundations that underlie modern assembly and highlight the options and practical trade-offs that need to be considered, focusing on how individual features address the needs of specific applications. We also review key software and the interplay between experimental design and efficacy of assembly.
Affiliation(s)
- Niranjan Nagarajan
  - Computational and Systems Biology, Genome Institute of Singapore, 138672 Singapore
|
43
|
Gutmann DH, McLellan MD, Hussain I, Wallis JW, Fulton LL, Fulton RS, Magrini V, Demeter R, Wylie T, Kandoth C, Leonard JR, Guha A, Miller CA, Ding L, Mardis ER. Somatic neurofibromatosis type 1 (NF1) inactivation characterizes NF1-associated pilocytic astrocytoma. Genome Res 2012; 23:431-9. [PMID: 23222849 PMCID: PMC3589532 DOI: 10.1101/gr.142604.112] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Indexed: 01/12/2023]
Abstract
Low-grade brain tumors (pilocytic astrocytomas) arising in the neurofibromatosis type 1 (NF1) inherited cancer predisposition syndrome are hypothesized to result from a combination of germline and acquired somatic NF1 tumor suppressor gene mutations. However, genetically engineered mice (GEM) in which mono-allelic germline Nf1 gene loss is coupled with bi-allelic somatic (glial progenitor cell) Nf1 gene inactivation develop brain tumors that do not fully recapitulate the neuropathological features of the human condition. These observations raise the intriguing possibility that, while loss of neurofibromin function is necessary for NF1-associated low-grade astrocytoma development, additional genetic changes may be required for full penetrance of the human brain tumor phenotype. To identify these potential cooperating genetic mutations, we performed whole-genome sequencing (WGS) analysis of three NF1-associated pilocytic astrocytoma (PA) tumors. We found that the mechanism of somatic NF1 loss was different in each tumor (frameshift mutation, loss of heterozygosity, and methylation). In addition, tumor purity analysis revealed that these tumors had a high proportion of stromal cells, such that only 50%–60% of cells in the tumor mass exhibited somatic NF1 loss. Importantly, we identified no additional recurrent pathogenic somatic mutations, supporting a model in which neuroglial progenitor cell NF1 loss is likely sufficient for PA formation in cooperation with a proper stromal environment.
Affiliation(s)
- David H Gutmann
  - Department of Neurology, Washington University School of Medicine, St. Louis, MO 63110, USA
|
44
|
Govindan R, Ding L, Griffith M, Subramanian J, Dees ND, Kanchi KL, Maher CA, Fulton R, Fulton L, Wallis J, Chen K, Walker J, McDonald S, Bose R, Ornitz D, Xiong D, You M, Dooling DJ, Watson M, Mardis ER, Wilson RK. Genomic landscape of non-small cell lung cancer in smokers and never-smokers. Cell 2012; 150:1121-34. [PMID: 22980976 DOI: 10.1016/j.cell.2012.08.024] [Citation(s) in RCA: 921] [Impact Index Per Article: 70.8] [Received: 05/17/2012] [Revised: 07/17/2012] [Accepted: 08/23/2012] [Indexed: 12/18/2022]
Abstract
We report the results of whole-genome and transcriptome sequencing of tumor and adjacent normal tissue samples from 17 patients with non-small cell lung carcinoma (NSCLC). We identified 3,726 point mutations and more than 90 indels in the coding sequence, with an average mutation frequency more than 10-fold higher in smokers than in never-smokers. Novel alterations in genes involved in chromatin modification and DNA repair pathways were identified, along with DACH1, CFTR, RELN, ABCB5, and HGF. Deep digital sequencing revealed diverse clonality patterns in both never-smokers and smokers. All validated EGFR and KRAS mutations were present in the founder clones, suggesting possible roles in cancer initiation. Analysis revealed 14 fusions, including ROS1 and ALK, as well as novel metabolic enzymes. Cell-cycle and JAK-STAT pathways are significantly altered in lung cancer, along with perturbations in 54 genes that are potentially targetable with currently available drugs.
Affiliation(s)
- Ramaswamy Govindan
  - Department of Internal Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
|
45
|
Wang Q, Xia J, Jia P, Pao W, Zhao Z. Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives. Brief Bioinform 2012; 14:506-19. [PMID: 22877769 DOI: 10.1093/bib/bbs044] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.5] [Indexed: 01/22/2023]
Abstract
Gene fusions are important genomic events in human cancer because their fusion gene products can drive the development of cancer and thus are potential prognostic tools or therapeutic targets in anti-cancer treatment. Major advancements have been made in computational approaches for fusion gene discovery over the past 3 years due to improvements and widespread applications of high-throughput next generation sequencing (NGS) technologies. To identify fusions from NGS data, existing methods typically leverage the strengths of both sequencing technologies and computational strategies. In this article, we review the NGS and computational features of existing methods for fusion gene detection and suggest directions for future development.
|