1
Gao G, McClellan J, Barbeira AN, Fiorica PN, Li JL, Mu Z, Olopade OI, Huo D, Im HK. A multi-tissue, splicing-based joint transcriptome-wide association study identifies susceptibility genes for breast cancer. Am J Hum Genet 2024; 111:1100-1113. [PMID: 38733992 PMCID: PMC11179262 DOI: 10.1016/j.ajhg.2024.04.010] [Received: 10/09/2023] [Revised: 04/13/2024] [Accepted: 04/15/2024] [Indexed: 05/13/2024]
Abstract
Splicing-based transcriptome-wide association studies (splicing-TWASs) of breast cancer have the potential to identify susceptibility genes. However, existing splicing-TWASs test the association of individual excised introns in breast tissue only and thus have limited power to detect susceptibility genes. In this study, we performed a multi-tissue joint splicing-TWAS that integrated splicing-TWAS signals of multiple excised introns in each gene across 11 tissues that are potentially relevant to breast cancer risk. We used summary statistics from a meta-analysis that combined genome-wide association study (GWAS) results of 424,650 women of European ancestry. Splicing-level prediction models were trained on GTEx (v.8) data. We identified 240 genes by the multi-tissue joint splicing-TWAS at the Bonferroni-corrected significance level; in the tissue-specific splicing-TWAS, which combined TWAS signals of excised introns in each gene in breast tissue only, we identified nine additional significant genes. Of these 249 genes, 88 genes in 62 loci have not been reported by previous TWASs, and 17 genes in seven loci are at least 1 Mb away from published GWAS index variants. By comparing the results of our splicing-TWASs with previous gene-expression-based TWASs that used the same summary statistics and expression prediction models trained in the same reference panel, we found 110 genes in 70 loci that were identified only by the splicing-TWASs. Our results showed that for many genes, expression quantitative trait loci (eQTL) did not show a significant impact on breast cancer risk, whereas splicing quantitative trait loci (sQTL) showed a strong impact through intron excision events.
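The abstract describes combining per-intron, per-tissue TWAS signals into one gene-level test and then applying a Bonferroni threshold, but does not name the combination statistic. The sketch below uses the equal-weight Cauchy combination test (ACAT) purely as an illustrative stand-in; it is an assumption, not necessarily the authors' method. The p-values and the number of tested genes are hypothetical.

```python
import math
from typing import Sequence


def cauchy_combine(pvalues: Sequence[float]) -> float:
    """Equal-weight Cauchy combination (ACAT) of p-values into one p-value.

    Illustrative stand-in for a gene-level joint test; not necessarily the
    statistic used in the study.
    """
    if len(pvalues) == 0:
        raise ValueError("need at least one p-value")
    stat = 0.0
    for p in pvalues:
        if not 0.0 < p < 1.0:
            raise ValueError(f"p-value must lie strictly in (0, 1): {p}")
        # tan((0.5 - p) * pi) equals 1 / tan(p * pi); this form keeps precision for tiny p.
        stat += 1.0 / math.tan(p * math.pi)
    stat /= len(pvalues)
    # Upper tail of a standard Cauchy variable: P(X > t) = 0.5 - atan(t) / pi.
    # For t > 0 this equals atan(1 / t) / pi, which avoids cancellation for large t.
    if stat > 0:
        return math.atan(1.0 / stat) / math.pi
    return 0.5 - math.atan(stat) / math.pi


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical p-values for one gene: several introns tested in several tissues.
gene_p = cauchy_combine([1e-4, 0.03, 0.2, 0.6, 2e-7])
# Hypothetical number of genes tested genome-wide.
threshold = bonferroni_threshold(0.05, 15_000)
print(gene_p, gene_p < threshold)
```

A Cauchy-based combination is a common choice when the component tests are correlated (as introns of one gene across related tissues would be), because its tail approximation remains reasonable under dependence; an omnibus regression-based test would be an equally plausible alternative.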
Affiliation(s)
- Guimin Gao
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Julian McClellan
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Alvaro N Barbeira
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Peter N Fiorica
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- James L Li
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Zepeng Mu
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Olufunmilayo I Olopade
- Section of Hematology and Oncology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Dezheng Huo
- Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA; Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA

2

Crabtree JS, Miele L. Precision diagnostics in cancer: Predict, prevent, and personalize. Prog Mol Biol Transl Sci 2022; 190:39-56. [DOI: 10.1016/bs.pmbts.2022.03.005] [Indexed: 11/26/2022]

3

Lucero R, Zappulli V, Sammarco A, Murillo OD, Cheah PS, Srinivasan S, Tai E, Ting DT, Wei Z, Roth ME, Laurent LC, Krichevsky AM, Breakefield XO, Milosavljevic A. Glioma-Derived miRNA-Containing Extracellular Vesicles Induce Angiogenesis by Reprogramming Brain Endothelial Cells. Cell Rep 2020; 30:2065-2074.e4. [PMID: 32075753 DOI: 10.1016/j.celrep.2020.01.073] [Received: 05/06/2019] [Revised: 09/29/2019] [Accepted: 01/22/2020] [Indexed: 12/13/2022]
Abstract
Glioblastoma (GBM) is characterized by aberrant vascularization and a complex tumor microenvironment. The failure of anti-angiogenic therapies suggests alternative pathways of GBM neovascularization, possibly attributable to glioblastoma stem cells (GSCs) and their interplay with the tumor microenvironment. It has been established that GSC-derived extracellular vesicles (GSC-EVs) and their cargoes are proangiogenic in vitro. To further elucidate EV-mediated mechanisms of neovascularization in vitro, we performed RNA-seq and DNA methylation profiling of human brain endothelial cells exposed to GSC-EVs. To correlate these results with tumors in vivo, we performed histoepigenetic analysis of GBM molecular profiles in the TCGA collection. Remarkably, GSC-EVs and normal vascular growth factors stimulate highly distinct gene regulatory responses that converge on angiogenesis. The response to GSC-EVs shows a footprint of post-transcriptional gene silencing by EV-derived miRNAs. Our results provide insights into targetable angiogenesis pathways in GBM and identify miRNA candidates for liquid biopsy biomarkers.
Affiliation(s)
- Rocco Lucero
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Valentina Zappulli
- Department of Comparative Biomedicine and Food Science, University of Padua, Padua, Italy; Departments of Neurology and Radiology, Massachusetts General Hospital, Boston, MA 02114, USA; Neuroscience Program, Harvard Medical School, Boston, MA 02115, USA
- Alessandro Sammarco
- Department of Comparative Biomedicine and Food Science, University of Padua, Padua, Italy; Departments of Neurology and Radiology, Massachusetts General Hospital, Boston, MA 02114, USA; Neuroscience Program, Harvard Medical School, Boston, MA 02115, USA
- Oscar D Murillo
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Pike See Cheah
- Departments of Neurology and Radiology, Massachusetts General Hospital, Boston, MA 02114, USA; Neuroscience Program, Harvard Medical School, Boston, MA 02115, USA; Department of Human Anatomy, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, Seri Kembangan, Selangor, Malaysia
- Srimeenakshi Srinivasan
- Department of Obstetrics, Gynecology, and Reproductive Sciences and Sanford Consortium for Regenerative Medicine, University of California, San Diego, La Jolla, CA 92037, USA
- Eric Tai
- Massachusetts General Hospital Cancer Center, Boston, MA 02114, USA; Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- David T Ting
- Massachusetts General Hospital Cancer Center, Boston, MA 02114, USA; Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Zhiyun Wei
- Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Matthew E Roth
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Louise C Laurent
- Department of Obstetrics, Gynecology, and Reproductive Sciences and Sanford Consortium for Regenerative Medicine, University of California, San Diego, La Jolla, CA 92037, USA
- Anna M Krichevsky
- Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Xandra O Breakefield
- Departments of Neurology and Radiology, Massachusetts General Hospital, Boston, MA 02114, USA; Neuroscience Program, Harvard Medical School, Boston, MA 02115, USA

4

Liu Q, Hu Y, Stucky A, Fang L, Zhong JF, Wang K. LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing. BMC Genomics 2020; 21:793. [PMID: 33372596 PMCID: PMC7771079 DOI: 10.1186/s12864-020-07207-4] [Received: 10/21/2020] [Accepted: 10/29/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND Long-read RNA-Seq techniques can generate reads that span a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-Seq techniques, which typically generate reads shorter than 150 bp. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors. RESULTS In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF outperformed existing computational tools in detecting known gene fusions. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset from acute myeloid leukemia and pinpointed, at base-pair resolution, the exact location of a translocation previously known at cytogenetic resolution; this breakpoint was further validated by Sanger sequencing. CONCLUSIONS In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.
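To make the idea of detecting fusions from long reads concrete, the sketch below flags a read as a fusion candidate when its aligned segments map to two or more genes, cover most of the read, and abut rather than overlap heavily on the read; candidate gene pairs are then kept only if enough reads support them. This is a simplified illustration, not LongGF's actual algorithm. The `Segment` record, the upstream gene-annotation step, the thresholds and the gene names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    gene: str        # gene overlapped by this aligned segment (hypothetical annotation step)
    read_start: int  # 0-based start of the segment on the read
    read_end: int    # exclusive end of the segment on the read


def candidate_fusions(
    reads: Dict[str, List[Segment]],
    read_lengths: Dict[str, int],
    min_coverage: float = 0.8,
    max_overlap: int = 20,
    min_support: int = 2,
) -> Dict[Tuple[str, str], int]:
    """Count reads supporting each gene pair; keep pairs with enough support."""
    support: Counter = Counter()
    for read_id, segments in reads.items():
        genes = {s.gene for s in segments}
        if len(genes) < 2:
            continue  # all segments fall in one gene: not chimeric
        # Approximate fraction of the read explained by alignments; the remainder
        # tolerates noisy long-read basecalls and soft-clipped ends.
        covered = sum(s.read_end - s.read_start for s in segments)
        if covered < min_coverage * read_lengths[read_id]:
            continue
        ordered = sorted(segments, key=lambda s: s.read_start)
        # Consecutive segments should abut on the read, not overlap by much.
        if any(b.read_start - a.read_end < -max_overlap for a, b in zip(ordered, ordered[1:])):
            continue
        for pair in combinations(sorted(genes), 2):
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


reads = {
    "r1": [Segment("GENE_A", 0, 500), Segment("GENE_B", 495, 1000)],
    "r2": [Segment("GENE_A", 10, 520), Segment("GENE_B", 520, 990)],
    "r3": [Segment("GENE_A", 0, 1000)],                               # single gene
    "r4": [Segment("GENE_A", 0, 100), Segment("GENE_B", 100, 200)],   # low coverage
    "r5": [Segment("GENE_A", 0, 600), Segment("GENE_B", 400, 1000)],  # heavy overlap
}
lengths = {read_id: 1000 for read_id in reads}
print(candidate_fusions(reads, lengths))
```

Requiring multi-read support and tolerant overlap thresholds is one simple way to absorb the alignment noise the abstract mentions; a real tool would also check breakpoint consistency across reads.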
Affiliation(s)
- Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Yu Hu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Andres Stucky
- Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
- Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Jiang F Zhong
- Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA

5

Brown NA, Elenitoba-Johnson KSJ. Enabling Precision Oncology Through Precision Diagnostics. Annu Rev Pathol 2020; 15:97-121. [PMID: 31977297 DOI: 10.1146/annurev-pathmechdis-012418-012735] [Indexed: 11/09/2022]
Abstract
Genomic testing enables clinical management to be tailored to individual cancer patients based on the molecular alterations present within cancer cells. Genomic sequencing results can be applied to detect and classify cancer, predict prognosis, and target therapies. Next-generation sequencing has revolutionized the field of cancer genomics by enabling rapid and cost-effective sequencing of large portions of the genome. With this technology, precision oncology is quickly becoming a realized paradigm for managing the treatment of cancer patients. However, many challenges must be overcome to efficiently implement the transition of next-generation sequencing from research applications to routine clinical practice, including using specimens commonly available in the clinical setting; determining how to process, store, and manage large amounts of sequencing data; determining how to interpret and prioritize molecular findings; and coordinating health professionals from multiple disciplines.
Affiliation(s)
- Noah A Brown
- Department of Pathology, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA
- Kojo S J Elenitoba-Johnson
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA

6

Detection of novel fusion-transcripts by RNA-Seq in T-cell lymphoblastic lymphoma. Sci Rep 2019; 9:5179. [PMID: 30914738 PMCID: PMC6435891 DOI: 10.1038/s41598-019-41675-3] [Received: 07/03/2018] [Accepted: 03/14/2019] [Indexed: 02/08/2023]
Abstract
Fusion transcripts have been proven to be strong drivers among neoplasia-associated mutations, although their incidence in T-cell lymphoblastic lymphoma (T-LBL) has yet to be determined. Using RNA-Seq, we selected 55 fusion transcripts identified by at least two of three detection methods in the same tumour. We confirmed the existence of 24 predicted novel fusions that had not previously been described in cancer or normal tissues, indicating the accuracy of the prediction. Of note, one of them involves the proto-oncogene TAL1. Other confirmed fusions could explain the overexpression of driver genes such as COMMD3-BMI1, LMO1 or JAK3. Five fusions found exclusively in tumour samples could be considered pathogenic (NFYG-TAL1, RIC3-TCRBC2, SLC35A3-HIAT1, PICALM-MLLT10 and MLLT10-PICALM). However, other fusions detected in both normal and tumour samples (JAK3-INSL3, KANSL1-ARL17A/B and TFG-ADGRG7) could be germline fusion genes involved in tumour-maintaining tasks. Notably, some fusions were confirmed in more tumour samples than predicted, indicating that the detection methods underestimated the real number of existing fusions. Our results highlight the potential of RNA-Seq to identify new cryptic fusions, which could be drivers or tumour-maintaining passenger genes. Such novel findings shed light on the search for new T-LBL biomarkers in these haematological disorders.
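The selection rule in this abstract, keeping fusions reported by at least two of three detection methods, is a simple voting consensus. A minimal sketch, assuming each method's calls have already been normalized to one "GENE1-GENE2" string per fusion; the method labels and gene names below are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def consensus_fusions(calls: Dict[str, Set[str]], min_methods: int = 2) -> Set[str]:
    """Return fusions reported by at least `min_methods` detection methods."""
    # Each method votes once per fusion, even if it listed the fusion twice.
    votes = Counter(fusion for fusions in calls.values() for fusion in set(fusions))
    return {fusion for fusion, n in votes.items() if n >= min_methods}


calls = {
    "method_a": {"GENE1-GENE2", "GENE3-GENE4"},
    "method_b": {"GENE1-GENE2"},
    "method_c": {"GENE3-GENE4", "GENE5-GENE6"},
}
print(sorted(consensus_fusions(calls)))
```

In practice the normalization step matters as much as the vote: tools may report partners in different orders or with different gene symbols, so names should be harmonized before counting.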

7

Vu TN, Deng W, Trac QT, Calza S, Hwang W, Pawitan Y. A fast detection of fusion genes from paired-end RNA-seq data. BMC Genomics 2018; 19:786. [PMID: 30382840 PMCID: PMC6211471 DOI: 10.1186/s12864-018-5156-1] [Received: 11/22/2017] [Accepted: 10/10/2018] [Indexed: 01/03/2023]
Abstract
Background Fusion genes are known to be drivers of many common cancers, so they are potential markers for diagnosis, prognosis or therapy response. The advent of paired-end RNA sequencing enhances our ability to discover fusion genes. While methods are available, routine analyses of large numbers of samples are still limited by high computational demands. Results We developed FuSeq, a fast and accurate method to discover fusion genes that uses quasi-mapping to quickly map the reads, extracts initial candidates from split reads and fusion equivalence classes of mapped reads, and finally applies multiple filters and statistical tests to obtain the final candidates. We applied FuSeq to four validated datasets: breast cancer, melanoma and glioma datasets, and one spike-in dataset. The results show high sensitivity and specificity in all datasets and compare well against other methods such as FusionMap, TRUP, TopHat-Fusion, SOAPfuse and JAFFA. In terms of computational time, FuSeq is twice as fast as FusionMap and orders of magnitude faster than the other methods. Conclusions With its lower computational demands, FuSeq makes it practical to investigate fusion genes in large numbers of samples. FuSeq is implemented in C++ and R and is available at https://github.com/nghiavtr/FuSeq for non-commercial use.
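The notion of grouping mapped reads into "fusion equivalence classes" can be illustrated as follows: each read pair quasi-maps to a set of transcripts per mate, transcripts are collapsed to genes, and discordant pairs (mates in disjoint gene sets) are counted by their gene-set signature before being expanded into candidate gene pairs. This is a simplified sketch of the general idea under those assumptions, not FuSeq's exact definition or filters; the transcript-to-gene table and the read data are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple

GeneSet = FrozenSet[str]


def fusion_equivalence_classes(
    pairs: Iterable[Tuple[Iterable[str], Iterable[str]]],
    tx_to_gene: Dict[str, str],
) -> Counter:
    """Count discordant read pairs by their (gene set, gene set) signature."""
    classes: Counter = Counter()
    for tx1, tx2 in pairs:
        g1: GeneSet = frozenset(tx_to_gene[t] for t in tx1)
        g2: GeneSet = frozenset(tx_to_gene[t] for t in tx2)
        # Keep only discordant pairs: both mates mapped, to non-overlapping gene sets.
        if g1 and g2 and g1.isdisjoint(g2):
            # Canonical order so that A|B and B|A fall into the same class.
            key = tuple(sorted((g1, g2), key=lambda s: sorted(s)))
            classes[key] += 1
    return classes


def candidate_gene_pairs(classes: Counter, min_reads: int = 2) -> Dict[Tuple[str, str], int]:
    """Expand each class into gene pairs and keep pairs with enough supporting reads."""
    support: Counter = Counter()
    for (g1, g2), n in classes.items():
        for a in g1:
            for b in g2:
                support[tuple(sorted((a, b)))] += n
    return {pair: n for pair, n in support.items() if n >= min_reads}


tx_to_gene = {"t1": "GENE_A", "t2": "GENE_A", "t3": "GENE_B", "t4": "GENE_C"}
pairs = [
    (["t1"], ["t3"]),
    (["t3"], ["t2"]),        # same gene pair, mates in the other order
    (["t1", "t2"], ["t3"]),  # multi-mapping within GENE_A collapses to one gene
    (["t1"], ["t2"]),        # both mates in GENE_A: not discordant
    ([], ["t3"]),            # one mate unmapped
]
classes = fusion_equivalence_classes(pairs, tx_to_gene)
print(candidate_gene_pairs(classes))
```

Counting classes first and expanding to gene pairs afterwards keeps the per-read work small, which is in the spirit of the speed argument the abstract makes.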
Affiliation(s)
- Trung Nghia Vu
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels väg 12A, Stockholm, 17177, Sweden
- Wenjiang Deng
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels väg 12A, Stockholm, 17177, Sweden
- Quang Thinh Trac
- Department of Computational Sciences and Engineering, VNU University of Engineering and Technology, Xuan Thuy, 144, Hanoi, 84024, Vietnam
- Stefano Calza
- Department of Molecular and Translational Medicine, University of Brescia, Viale Europa, 11, Brescia, 25125, Italy
- Woochang Hwang
- Data Science for Knowledge Creation Research Center, Seoul National University, Seoul, 151-747, South Korea
- Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels väg 12A, Stockholm, 17177, Sweden

8

Rhee SJ, Kwon T, Seo M, Jang YJ, Sim TY, Cho S, Han SW, Lee GP. De novo-based transcriptome profiling of male-sterile and fertile watermelon lines. PLoS One 2017; 12:e0187147. [PMID: 29095876 PMCID: PMC5667795 DOI: 10.1371/journal.pone.0187147] [Received: 04/13/2017] [Accepted: 10/14/2017] [Indexed: 12/23/2022]
Abstract
The whole-genome sequence of watermelon (Citrullus lanatus (Thunb.) Matsum. & Nakai), a valuable horticultural crop worldwide, was released in 2013. Here, we compared a de novo-based approach (DBA) with a reference-based approach (RBA) using RNA-seq data, to aid efforts to improve the annotation of the watermelon reference genome and to obtain biological insight into male sterility in watermelon. We applied these techniques to available data from two watermelon lines: the male-sterile line DAH3615-MS and the male-fertile line DAH3615. Using DBA, we newly annotated 855 watermelon transcripts and found gene functional clusters predicted to be related to stimulus responses, nucleic acid binding, transmembrane transport, homeostasis, and Golgi/vesicles. Among the DBA-annotated transcripts, 138 de novo-exclusive differentially expressed genes (DEDEGs) related to male sterility were detected. Of 33 randomly selected newly annotated transcripts and DEDEGs, 32 were validated by RT-qPCR. This study demonstrates the usefulness and reliability of de novo transcriptome assembly in watermelon and provides new insights for researchers exploring transcriptional blueprints of male sterility.
Affiliation(s)
- Sun-Ju Rhee
- Department of Integrative Plant Science, Chung-Ang University, Ansung, Republic of Korea
- Taehyung Kwon
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Minseok Seo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Kwan-ak Gu, Seoul, Republic of Korea
- CHO&KIM Genomics, C-1008, H Business Park, 26, Beobwon-ro 9-gil, Songpa-gu, Seoul, Republic of Korea
- Yoon Jeong Jang
- Department of Integrative Plant Science, Chung-Ang University, Ansung, Republic of Korea
- Tae Yong Sim
- Department of Integrative Plant Science, Chung-Ang University, Ansung, Republic of Korea
- Seoae Cho
- CHO&KIM Genomics, C-1008, H Business Park, 26, Beobwon-ro 9-gil, Songpa-gu, Seoul, Republic of Korea
- Sang-Wook Han
- Department of Integrative Plant Science, Chung-Ang University, Ansung, Republic of Korea
- Gung Pyo Lee
- Department of Integrative Plant Science, Chung-Ang University, Ansung, Republic of Korea

9

Hsieh G, Bierman R, Szabo L, Lee AG, Freeman DE, Watson N, Sweet-Cordero EA, Salzman J. Statistical algorithms improve accuracy of gene fusion detection. Nucleic Acids Res 2017; 45:e126. [PMID: 28541529 PMCID: PMC5737606 DOI: 10.1093/nar/gkx453] [Received: 02/19/2017] [Accepted: 05/22/2017] [Indexed: 11/14/2022]
Abstract
Gene fusions are known to play critical roles in tumor pathogenesis. Yet sensitive and specific algorithms to detect gene fusions in cancer do not currently exist. In this paper, we present a new statistical algorithm, MACHETE (Mismatched Alignment CHimEra Tracking Engine), which achieves highly sensitive and specific detection of gene fusions from RNA-Seq data, including the highest positive predictive value (PPV) among current state-of-the-art methods, as assessed in simulated data. We show that the best-performing published algorithms either find large numbers of fusions in negative control data or suffer from low sensitivity in detecting known driver fusions in gold-standard settings, such as EWSR1-FLI1. As proof of principle that MACHETE discovers novel gene fusions with high accuracy in vivo, we mined public data to discover, and subsequently validate by PCR, novel gene fusions missed by other algorithms in the ovarian cancer cell line OVCAR3. These results highlight the gains in accuracy achieved by introducing statistical models into fusion detection and pave the way for unbiased discovery of potentially driving and druggable gene fusions in primary tumors.
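The comparison in this abstract rests on two quantities: positive predictive value (the fraction of calls that are real) and sensitivity (the fraction of real fusions that are called). A minimal sketch of that evaluation arithmetic, assuming calls and the validated truth set are normalized "GENE1-GENE2" strings; this is not MACHETE's statistical model, and the fusion names are hypothetical.

```python
from typing import Dict, Set


def evaluate_calls(called: Set[str], truth: Set[str]) -> Dict[str, float]:
    """Compare fusion calls with a validated truth set."""
    tp = len(called & truth)  # calls that are real fusions
    fp = len(called - truth)  # calls absent from the truth set
    fn = len(truth - called)  # real fusions the caller missed
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "ppv": ppv, "sensitivity": sensitivity}


called = {"GENE1-GENE2", "GENE3-GENE4", "GENE5-GENE6", "GENE7-GENE8"}
truth = {"GENE1-GENE2", "GENE3-GENE4", "GENE9-GENE10"}
print(evaluate_calls(called, truth))

# On negative control data the truth set is empty, so every call is a false positive.
print(evaluate_calls({"GENE1-GENE2"}, set()))
```

The negative-control case mirrors the failure mode described above: a caller can look sensitive on known fusions while still producing many calls where none should exist.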
Affiliation(s)
- Gillian Hsieh
- Stanford University, Department of Biochemistry, 279 Campus Drive, Stanford, CA 94305, USA
- Rob Bierman
- Stanford University, Department of Biochemistry, 279 Campus Drive, Stanford, CA 94305, USA
- Linda Szabo
- Stanford University, Biomedical Informatics, 1265 Welch Road, MSOB, X-215, MC 5479, Stanford, CA 94305-5479, USA
- Alex Gia Lee
- Stanford University, Cancer Biology, 265 Campus Drive, Suite G2103, Stanford, CA 94305-5456, USA
- Donald E Freeman
- Stanford University, Department of Biochemistry, 279 Campus Drive, Stanford, CA 94305, USA
- Nathaniel Watson
- Stanford University, Department of Biochemistry, 279 Campus Drive, Stanford, CA 94305, USA
- Julia Salzman
- Stanford University, Department of Biochemistry, 279 Campus Drive, Stanford, CA 94305, USA; Stanford University, Department of Biomedical Data Science, Stanford, CA 94305-5456, USA

10

Zhou JX, Yang X, Ning S, Wang L, Wang K, Zhang Y, Yuan F, Li F, Zhuo DD, Tang L, Zhuo D. Identification of KANSARL as the first cancer predisposition fusion gene specific to the population of European ancestry origin. Oncotarget 2017; 8:50594-50607. [PMID: 28881586 PMCID: PMC5584173 DOI: 10.18632/oncotarget.16385] [Received: 08/15/2016] [Accepted: 02/20/2017] [Indexed: 12/30/2022]
Abstract
Gene fusion is one of the hallmarks of cancer. Recent advances in RNA-seq of cancer transcriptomes have facilitated the discovery of fusion transcripts. In this study, we report the identification of a surprisingly large number of fusion transcripts, including six KANSARL (KANSL1-ARL17A) transcripts resulting from fusion between the KANSL1 and ARL17A genes, using an RNA splicing-code model. Five of these six KANSARL fusion transcripts are novel. By systematic analysis of RNA-seq data of glioblastoma, prostate cancer, lung cancer, breast cancer, and lymphoma from different regions of the world, we found that KANSARL fusion transcripts were rarely detected in the tumors of individuals from Asia or Africa. In contrast, they are present in 30-52% of the tumors from North American cancer patients. Analysis of CEPH/Utah Pedigree 1463 revealed that KANSARL is a familially inherited fusion gene. Further analysis of RNA-seq datasets from the 1000 Genomes Project indicated that the KANSARL fusion gene is specific to the population of European ancestry, in which it is found in 28.9% of individuals. In summary, we demonstrated that KANSARL is the first cancer predisposition fusion gene associated with a European-ancestry genetic background.
Affiliation(s)
- Jeff Xiwu Zhou
- Department of Medicine, School of Medicine, Ningbo University, Ningbo, China
- Xiaoyan Yang
- SplicingCodes.com, Biotailor Inc., Palmetto Bay, FL, USA
- Shunbin Ning
- Department of Internal Medicine, Quillen College of Medicine, East Tennessee State University, Johnson City, TN, USA
- Ling Wang
- Department of Internal Medicine, Quillen College of Medicine, East Tennessee State University, Johnson City, TN, USA
- Kesheng Wang
- Department of Biostatistics and Epidemiology, East Tennessee State University, Johnson City, TN, USA
- Yanbin Zhang
- Department of Biochemistry and Molecular Biology, University of Miami, Miami, FL, USA
- Fenghua Yuan
- Department of Biochemistry and Molecular Biology, University of Miami, Miami, FL, USA
- Fengli Li
- Department of Medicine, School of Medicine, Ningbo University, Ningbo, China
- David D Zhuo
- SplicingCodes.com, Biotailor Inc., Palmetto Bay, FL, USA
- Liren Tang
- SplicingCodes.com, Biotailor Inc., Palmetto Bay, FL, USA
- Degen Zhuo
- SplicingCodes.com, Biotailor Inc., Palmetto Bay, FL, USA

11

Kumar S, Razzaq SK, Vo AD, Gautam M, Li H. Identifying fusion transcripts using next generation sequencing. Wiley Interdiscip Rev RNA 2016; 7:811-823. [PMID: 27485475 PMCID: PMC5065767 DOI: 10.1002/wrna.1382] [Received: 05/20/2016] [Revised: 07/05/2016] [Accepted: 07/07/2016] [Indexed: 01/14/2023]
Abstract
Fusion transcripts (i.e., chimeric RNAs) resulting from gene fusions have been used successfully for cancer diagnosis, prognosis, and therapeutic applications. In addition, many fusion transcripts are found in normal human cell lines and tissues, with some data supporting their role in normal physiology. Besides chromosomal rearrangement, intergenic splicing can generate them. Global identification of fusion transcripts has become possible with next-generation sequencing technologies such as RNA-Seq. In the past decade, major advances have been made in chimeric RNA discovery owing to the development of advanced sequencing platforms and software packages. However, current software tools differ in specificity, sensitivity, run time, and memory usage. Recent benchmarking studies showed that no single tool detects all fusions. The development of a high-performance (accurate and fast) and user-friendly fusion detection tool or pipeline remains an open challenge. In this article, we review the existing software packages for fusion detection. We explain the methods of the tools and discuss various factors that affect fusion detection. We summarize conclusions drawn from several comparative studies and then discuss some of the pitfalls of these studies. We also describe the limitations of current tools and suggest directions for future development.
Affiliation(s)
- Shailesh Kumar
- Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Sundus Khalid Razzaq
- Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Angie Duy Vo
- Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Mamta Gautam
- Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Hui Li
- Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA, USA

12

Latysheva NS, Babu MM. Discovering and understanding oncogenic gene fusions through data intensive computational approaches. Nucleic Acids Res 2016; 44:4487-503. [PMID: 27105842 PMCID: PMC4889949 DOI: 10.1093/nar/gkw282] [Received: 01/12/2016] [Accepted: 03/24/2016] [Indexed: 12/21/2022]
Abstract
Although gene fusions have been recognized as important drivers of cancer for decades, our understanding of the prevalence and function of gene fusions has been revolutionized by the rise of next-generation sequencing, advances in bioinformatics theory and an increasing capacity for large-scale computational biology. Computational work on gene fusions has been highly diverse, and the present state of the literature is fragmented. It will be fruitful to merge three camps of gene fusion bioinformatics that rarely appear to cross over: (i) data-intensive computational work characterizing the molecular biology of gene fusions; (ii) development research on fusion detection tools, candidate fusion prioritization algorithms and dedicated fusion databases; and (iii) clinical research that either seeks to therapeutically target fusion transcripts and proteins or leverages advances in detection tools to perform large-scale surveys of gene fusion landscapes in specific cancer types. In this review, we unify these different, yet highly complementary and symbiotic, approaches with the view that increased synergy will catalyze advances in gene fusion identification, characterization and significance evaluation.
Affiliation(s)
- Natasha S Latysheva
- MRC Laboratory of Molecular Biology, Francis Crick Ave, Cambridge CB2 0QH, United Kingdom
- M Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Ave, Cambridge CB2 0QH, United Kingdom

13

Arsenijevic V, Davis-Dusenbery BN. Reproducible, Scalable Fusion Gene Detection from RNA-Seq. Methods Mol Biol 2016; 1381:223-37. [PMID: 26667464 DOI: 10.1007/978-1-4939-3204-7_13] [Indexed: 01/07/2023]
Abstract
Chromosomal rearrangements resulting in the creation of novel gene products, termed fusion genes, have been identified as driving events in the development of multiple types of cancer. As these gene products typically do not exist in normal cells, they represent valuable prognostic and therapeutic targets. Advances in next-generation sequencing and computational approaches have greatly improved our ability to detect and identify fusion genes. Nevertheless, these approaches require significant computational resources. Here we describe an approach which leverages cloud computing technologies to perform fusion gene detection from RNA sequencing data at any scale. We additionally highlight methods to enhance reproducibility of bioinformatics analyses which may be applied to any next-generation sequencing experiment.
Affiliation(s)
- Vladan Arsenijevic
- Department of Bioinformatics, Seven Bridges Genomics, One Broadway, 14th Floor, Cambridge, MA, 02142, USA
- Brandi N Davis-Dusenbery
- Department of Bioinformatics, Seven Bridges Genomics, One Broadway, 14th Floor, Cambridge, MA, 02142, USA
14
Liu S, Tsai WH, Ding Y, Chen R, Fang Z, Huo Z, Kim S, Ma T, Chang TY, Priedigkeit NM, Lee AV, Luo J, Wang HW, Chung IF, Tseng GC. Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data. Nucleic Acids Res 2015; 44:e47. [PMID: 26582927 PMCID: PMC4797269 DOI: 10.1093/nar/gkv1234] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Received: 04/27/2015] [Accepted: 10/24/2015] [Indexed: 12/31/2022]
Abstract
Background: Fusion transcripts are formed by either fusion genes (DNA level) or trans-splicing events (RNA level). They have been recognized as a promising tool for diagnosing, subtyping and treating cancers. RNA-seq has become a precise and efficient standard for genome-wide screening of such aberration events. Many fusion transcript detection algorithms have been developed for paired-end RNA-seq data, but their performance has not been comprehensively evaluated to guide practitioners. In this paper, we evaluated 15 popular algorithms by their precision and recall trade-off, accuracy of supporting reads and computational cost. We further combined top-performing methods for improved ensemble detection. Results: Fifteen fusion transcript detection tools were compared using three synthetic data sets under different coverage, read length, insert size and background noise, and three real data sets with selected experimental validations. No single method performed best overall, but SOAPfuse generally performed well, followed by FusionCatcher and JAFFA. We further demonstrated the potential of a meta-caller algorithm that combines top-performing methods to re-prioritize candidate fusion transcripts with high confidence for follow-up experimental validation. Conclusion: Our results provide recommendations for applying individual tools or combining top performers to identify fusion transcript candidates.
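The two ideas in this abstract, scoring tools by precision and recall against a known truth set and combining top performers, can be illustrated with a minimal Python sketch. It assumes each tool's output has already been reduced to a set of (5′ gene, 3′ gene) pairs and that a truth set is available, as with the synthetic data sets. The function names, the `min_votes` threshold, the tool names and the GENE1 to GENE4 placeholders are hypothetical, and the ensemble shown is a plain vote count, not the authors' published meta-caller.

```python
from collections import Counter


def precision_recall(called, truth):
    """Return (precision, recall) of a set of fusion calls against a truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def vote_meta_caller(calls_by_tool, min_votes=2):
    """Keep fusions reported by at least `min_votes` tools, ranked by vote count.

    Hypothetical helper: a simple voting ensemble, not the published meta-caller.
    """
    votes = Counter(fusion for calls in calls_by_tool.values() for fusion in set(calls))
    kept = [(fusion, n) for fusion, n in votes.items() if n >= min_votes]
    # Highest vote count first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy truth set and toy calls from three hypothetical tools.
    truth = {("BCR", "ABL1"), ("TMPRSS2", "ERG"), ("EML4", "ALK")}
    calls = {
        "tool_a": {("BCR", "ABL1"), ("TMPRSS2", "ERG"), ("GENE1", "GENE2")},
        "tool_b": {("BCR", "ABL1"), ("EML4", "ALK")},
        "tool_c": {("BCR", "ABL1"), ("TMPRSS2", "ERG"), ("GENE3", "GENE4")},
    }
    for tool, called in calls.items():
        print(tool, precision_recall(called, truth))
    ranked = vote_meta_caller(calls)
    print(ranked)
    print(precision_recall({fusion for fusion, _ in ranked}, truth))
```

With these toy calls, the two-vote ensemble keeps only BCR-ABL1 and TMPRSS2-ERG, so its precision is 1.0 (versus 2/3 for tool_a and tool_c) while recall stays at 2/3, which is the kind of trade-off the benchmark measures.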
Affiliation(s)
- Silvia Liu
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA; Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Biomedical Science Tower 3, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
- Wei-Hsiang Tsai
- Institute of Biomedical Informatics, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
- Ying Ding
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA; Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Biomedical Science Tower 3, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
- Rui Chen
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
- Zhou Fang
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
- Zhiguang Huo
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
- SungHwan Kim
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
- Tianzhou Ma
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
- Ting-Yu Chang
- Institute of Microbiology and Immunology, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
- Nolan Michael Priedigkeit
- Molecular Pharmacology, School of Medicine, University of Pittsburgh, 3550 Terrace Street, Pittsburgh, PA 15261, USA
- Adrian V Lee
- Magee-Women's Research Institute, 204 Craft Avenue, Pittsburgh, PA 15213, USA
- Jianhua Luo
- Department of Pathology, School of Medicine, University of Pittsburgh, 3550 Terrace Street, Pittsburgh, PA 15261, USA
- Hsei-Wei Wang
- Institute of Biomedical Informatics, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan; Institute of Microbiology and Immunology, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan; Center for Systems and Synthetic Biology, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
- I-Fang Chung
- Institute of Biomedical Informatics, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan; Center for Systems and Synthetic Biology, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
- George C Tseng
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA; Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Biomedical Science Tower 3, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
15
Davare MA, Tognon CE. Detecting and targetting oncogenic fusion proteins in the genomic era. Biol Cell 2015; 107:111-29. [PMID: 25631473 DOI: 10.1111/boc.201400096] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 12/22/2014] [Accepted: 01/23/2015] [Indexed: 12/15/2022]
Abstract
The advent of widespread cancer genome sequencing has accelerated our understanding of the molecular aberrations underlying malignant disease at an unprecedented rate. Coupling the large number of bioinformatic methods developed to locate genomic breakpoints with increased sequence read length and a deeper understanding of coding region function has enabled rapid identification of novel actionable oncogenic fusion genes. Using examples of kinase fusions found in liquid and solid tumours, this review highlights major concepts that have arisen in our understanding of cancer pathogenesis through the study of fusion proteins. We provide an overview of recently developed methods to identify potential fusion proteins from next-generation sequencing data, describe the validation of their oncogenic potential and discuss the role of targetted therapies in treating cancers driven by fusion oncoproteins.
Affiliation(s)
- Monika A Davare
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, 97239, U.S.A; Department of Pediatrics, Oregon Health & Science University, Portland, OR, 97239, U.S.A
16
Nariai N, Kojima K, Mimori T, Sato Y, Kawai Y, Yamaguchi-Kabata Y, Nagasaki M. TIGAR2: sensitive and accurate estimation of transcript isoform expression with longer RNA-Seq reads. BMC Genomics 2014; 15 Suppl 10:S5. [PMID: 25560536 PMCID: PMC4304212 DOI: 10.1186/1471-2164-15-s10-s5] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 11/24/2022]
Abstract
BACKGROUND High-throughput RNA sequencing (RNA-Seq) enables quantification and identification of transcripts at single-base resolution. Recently, longer sequence reads have become available thanks to new types of sequencing technologies as well as improvements in the chemical reagents for next-generation sequencers. Although several computational methods have been proposed for quantifying gene expression levels from RNA-Seq data, they are not sufficiently optimized for longer reads (e.g. >250 bp). RESULTS We propose TIGAR2, a statistical method for quantifying transcript isoforms from fixed- and variable-length RNA-Seq data. Our method models substitution, deletion, and insertion errors of sequencers based on gapped alignments of reads to the reference cDNA sequences, so that sensitive read aligners such as Bowtie2 and BWA-MEM can be incorporated effectively into our pipeline. In addition, a heuristic algorithm is implemented in the variational Bayesian inference for faster computation. We apply TIGAR2 to both simulated data and real data from human samples and evaluate the performance of transcript quantification with TIGAR2 in comparison to existing methods. CONCLUSIONS TIGAR2 is a sensitive and accurate tool for quantifying transcript isoform abundances from RNA-Seq data. Our method performs better than existing methods for fixed-length reads (100 bp, 250 bp, 500 bp, and 1000 bp, both single-end and paired-end) and variable-length reads, especially for reads longer than 250 bp.
Affiliation(s)
- Naoki Nariai
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Kaname Kojima
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Takahiro Mimori
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Yukuto Sato
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Yosuke Kawai
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Yumi Yamaguchi-Kabata
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Masao Nagasaki
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
17
Greger L, Su J, Rung J, Ferreira PG, Geuvadis consortium, Lappalainen T, Dermitzakis ET, Brazma A. Tandem RNA chimeras contribute to transcriptome diversity in human population and are associated with intronic genetic variants. PLoS One 2014; 9:e104567. [PMID: 25133550 PMCID: PMC4136775 DOI: 10.1371/journal.pone.0104567] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 06/03/2014] [Accepted: 07/14/2014] [Indexed: 01/18/2023]
Abstract
Chimeric RNAs originating from two or more different genes are known to exist not only in cancer but also in normal tissues, where they can play a role in human evolution. However, the exact mechanism of their formation is unknown. Here, we use RNA sequencing data from 462 healthy individuals representing 5 human populations to systematically identify and characterize in depth 81 RNA tandem chimeric transcripts, 13 of which are novel. We observe that 6 of these 81 chimeras have been regarded as cancer-specific. Moreover, we show that a prevalence of long introns at the fusion breakpoint is associated with chimeric transcript formation. We also find that tandem RNA chimeras have lower abundances than their partner genes. Finally, by combining our results with genomic data from the same individuals, we uncover intronic genetic variants associated with chimeric RNA formation. Taken together, our findings provide important insight into chimeric transcript formation and open new avenues of research into the role of intronic genetic variants in post-transcriptional processing events.
Affiliation(s)
- Liliana Greger
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
- Jing Su
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
- Johan Rung
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
- Pedro G. Ferreira
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Institute for Genetics and Genomics in Geneva (iG3), University of Geneva, Geneva, Switzerland
- Swiss Institute of Bioinformatics, Geneva, Switzerland
- Tuuli Lappalainen
- New York Genome Center, New York, New York, United States of America
- Emmanouil T. Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Institute for Genetics and Genomics in Geneva (iG3), University of Geneva, Geneva, Switzerland
- Swiss Institute of Bioinformatics, Geneva, Switzerland
- Alvis Brazma
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
18
Wijaya E, Shimizu K, Asai K, Hamada M. Reference-free prediction of rearrangement breakpoint reads. Bioinformatics 2014; 30:2559-67. [PMID: 24876376 DOI: 10.1093/bioinformatics/btu360] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/28/2022]
Abstract
MOTIVATION Chromosome rearrangement events are triggered by atypical breaking and rejoining of DNA molecules and are observed in many cancer-related diseases. Rearrangements are typically detected by using short reads generated by next-generation sequencing (NGS) and combining the reads with knowledge of a reference genome. Because structural variations and genomes differ from one person to another, intermediate comparison via a reference genome may lead to loss of information. RESULTS In this article, we propose a reference-free method for detecting clusters of breakpoints from chromosomal rearrangements. This is done by directly comparing a set of normal NGS reads with another set that may be rearranged. Our method, SlideSort-BPR (breakpoint reads), is based on a fast algorithm for all-against-all comparisons of short reads and theoretical analyses of the number of neighboring reads. When applied to a dataset with a sequencing depth of 100×, it finds ∼88% of the breakpoints correctly with no false-positive reads. Moreover, evaluation on a real prostate cancer dataset shows that the proposed method predicts more fusion transcripts correctly than previous approaches, and yet produces fewer false-positive reads. To our knowledge, this is the first method to detect breakpoint reads without using a reference genome. AVAILABILITY AND IMPLEMENTATION The source code of SlideSort-BPR can be freely downloaded from https://code.google.com/p/slidesort-bpr/.
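The central idea here, comparing a possibly rearranged read set directly against a normal read set without any reference genome, can be illustrated with a deliberately simple Python sketch. It swaps in a k-mer comparison in place of SlideSort-BPR's all-against-all read comparison and neighbour-count statistics: a test read is flagged when it contains a k-mer that never occurs in the normal reads and that is shared by at least `min_count` test reads, which screens out most single-read sequencing errors. The function names and the defaults `k=25` and `min_count=2` are hypothetical choices for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """All distinct k-mers of a sequence (empty if the read is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_breakpoint_reads(normal_reads, test_reads, k=25, min_count=2):
    """Test reads carrying k-mers never seen in the normal reads.

    Simplified k-mer stand-in for reference-free breakpoint detection, not the
    SlideSort-BPR algorithm. A novel k-mer must occur in at least `min_count`
    test reads to count as support. No reference genome is used.
    """
    normal_kmers = set()
    for read in normal_reads:
        normal_kmers |= kmers(read, k)

    # Count, for each novel k-mer, how many test reads contain it.
    novel_counts = Counter()
    for read in test_reads:
        novel_counts.update(kmers(read, k) - normal_kmers)
    supported = {km for km, n in novel_counts.items() if n >= min_count}

    return [read for read in test_reads if kmers(read, k) & supported]


if __name__ == "__main__":
    normal = ["AAAAAAAA", "CCCCCCCC"]
    # Two reads span an A|C junction; "AAAAGAAA" mimics a one-off sequencing error.
    tumour = ["AAAAAAAA", "AAAACCCC", "AAACCCCC", "CCCCCCCC", "AAAAGAAA"]
    print(candidate_breakpoint_reads(normal, tumour, k=5))  # → ['AAAACCCC', 'AAACCCCC']
```

Because k-mers match at any offset, junction-spanning reads support each other even when they start at different positions, which a whole-read comparison would miss; the cost is memory for the normal k-mer set.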
Affiliation(s)
- Edward Wijaya
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Kana Shimizu
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Kiyoshi Asai
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Michiaki Hamada
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
19
Xiao CL, Mai ZB, Lian XL, Zhong JY, Jin JJ, He QY, Zhang G. FANSe2: a robust and cost-efficient alignment tool for quantitative next-generation sequencing applications. PLoS One 2014; 9:e94250. [PMID: 24743329 PMCID: PMC3990525 DOI: 10.1371/journal.pone.0094250] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 11/29/2013] [Accepted: 03/12/2014] [Indexed: 11/26/2022]
Abstract
Correct and bias-free interpretation of deep sequencing data inevitably depends on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To combine both advantages, we developed FANSe2, an algorithm with an iterative mapping strategy based on the statistics of real-world sequencing error distributions that substantially accelerates mapping without compromising accuracy. Its sensitivity and accuracy are higher than those of BWT-based algorithms in tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 were experimentally validated, whereas the previous algorithms produced false positives and false negatives. FANSe2 showed remarkably better consistency with microarray data than most other algorithms in terms of gene expression quantification. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three ordinary office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours, with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical use of computational resources, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.
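To make the seed-based side of the comparison above concrete, here is a textbook seed-and-verify sketch in Python: index every k-mer of the reference, use the read's non-overlapping seeds to propose candidate start positions, then verify each candidate by counting mismatches. This is an illustration under stated assumptions (ungapped alignment only; the function names and the parameters `k` and `max_mismatches` are hypothetical), not FANSe2's iterative, error-distribution-driven algorithm, and unlike FANSe2 it does not handle indels.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=2):
    """Seed-and-verify mapping of one read, allowing mismatches but no indels.

    The read is cut into non-overlapping k-length seeds. If the read has at most
    `max_mismatches` mismatches and yields at least max_mismatches + 1 seeds,
    one seed must match exactly (pigeonhole), so the true locus is proposed.
    """
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    # Verify every candidate locus by direct comparison.
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    reference = "AAAACCCCGGGGTTTTACGT"
    index = build_index(reference, k=4)
    # One mismatch at read position 5 ('A' where the reference has 'G').
    print(map_read("CCCCGAGGTTTT", reference, index, k=4))  # → [(4, 1)]
```

The exact-match index lookups are what make seeding fast, while the verification step is what makes it robust to mismatches; BWT-based aligners instead search the read against a compressed index and typically bound the number of mismatches they explore.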
Affiliation(s)
- Chuan-Le Xiao
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
- Zhi-Biao Mai
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
- Xin-Lei Lian
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
- Jia-Yong Zhong
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
- Jing-jie Jin
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
- Qing-Yu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
- Gong Zhang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
20
|
Investigation of de novo unique differentially expressed genes related to evolution in exercise response during domestication in Thoroughbred race horses. PLoS One 2014; 9:e91418. [PMID: 24658125 PMCID: PMC3962374 DOI: 10.1371/journal.pone.0091418] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 09/08/2013] [Accepted: 02/11/2014] [Indexed: 12/17/2022]
Abstract
Previous studies of horse RNA-seq were performed by mapping sequence reads to the reference genome during transcriptome analysis. In this study, however, we focused on two main ideas. First, differentially expressed genes (DEGs) were identified by de novo-based analysis (DBA) in RNA-seq data from six Thoroughbreds before and after exercise, hereafter referred to as “de novo unique differentially expressed genes” (DUDEG). Second, by integrating both conventional DEGs and genes identified as being selected for during domestication of Thoroughbred and Jeju pony from whole genome re-sequencing (WGS) data, we propose a new concept for defining DEGs. We identified 1,034 and 567 DUDEGs in skeletal muscle and blood, respectively. DUDEGs in skeletal muscle were significantly related to exercise-induced stress biological process gene ontology (BP-GO) terms (‘immune system process’, ‘response to stimulus’ and ‘death’) and to KEGG pathways (‘JAK-STAT signaling pathway’, ‘MAPK signaling pathway’, ‘regulation of actin cytoskeleton’ and ‘p53 signaling pathway’). In addition, we found TIMELESS, EIF4A3 and ZNF592 in blood and CHMP4C and FOXO3 in skeletal muscle to be shared between DUDEGs and selected genes identified by evolutionary statistics such as FST and Cross Population Extended Haplotype Homozygosity (XP-EHH). Moreover, in Thoroughbreds, three of the five genes (CHMP4C, EIF4A3 and FOXO3) related to exercise response showed relatively low nucleotide diversity compared to the Jeju pony. DUDEGs are not only conceptually new DEGs that cannot be obtained from reference-based analysis (RBA) but also support previous RBA results related to exercise in Thoroughbreds. In summary, three exercise-related genes that were selected for during domestication in the evolutionary history of Thoroughbreds were identified as conceptually new DEGs in this study.
21
Annala MJ, Parker BC, Zhang W, Nykter M. Fusion genes and their discovery using high throughput sequencing. Cancer Lett 2013; 340:192-200. [PMID: 23376639 PMCID: PMC3675181 DOI: 10.1016/j.canlet.2013.01.011] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Received: 09/17/2012] [Revised: 12/28/2012] [Accepted: 01/04/2013] [Indexed: 01/25/2023]
Abstract
Fusion genes are hybrid genes that combine parts of two or more original genes. They can form as a result of chromosomal rearrangements or abnormal transcription, and have been shown to act as drivers of malignant transformation and progression in many human cancers. The biological significance of fusion genes together with their specificity to cancer cells has made them into excellent targets for molecular therapy. Fusion genes are also used as diagnostic and prognostic markers to confirm cancer diagnosis and monitor response to molecular therapies. High-throughput sequencing has enabled the systematic discovery of fusion genes in a wide variety of cancer types. In this review, we describe the history of fusion genes in cancer and the ways in which fusion genes form and affect cellular function. We also describe computational methodologies for detecting fusion genes from high-throughput sequencing experiments, and the most common sources of error that lead to false discovery of fusion genes.
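One common kind of evidence used by computational fusion detectors on paired-end data is the discordant pair, where the two mates of a fragment align to different genes. A minimal Python sketch of counting such pairs follows. It assumes mates have already been aligned and assigned to genes, and the `(read_id, gene_of_mate1, gene_of_mate2)` input format, the function name and the `min_pairs` threshold are hypothetical. Real detectors also use split reads that cross the junction and apply filters against the sources of false discovery that reviews like this one describe; none of that is modelled here.

```python
from collections import Counter


def discordant_pair_candidates(pairs, min_pairs=3):
    """Count read pairs whose two mates were assigned to different genes.

    `pairs` yields (read_id, gene_of_mate1, gene_of_mate2); a mate that was not
    assigned to any gene is given as None. Returns {gene pair: supporting pairs}
    for gene pairs with at least `min_pairs` discordant pairs.
    """
    support = Counter()
    for _read_id, gene1, gene2 in pairs:
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue  # unassigned or concordant pair: no fusion evidence
        # Sorting drops 5'/3' orientation, which a real detector would keep.
        support[tuple(sorted((gene1, gene2)))] += 1
    return {genes: n for genes, n in support.items() if n >= min_pairs}


if __name__ == "__main__":
    pairs = [
        ("r1", "BCR", "ABL1"),
        ("r2", "ABL1", "BCR"),
        ("r3", "BCR", "ABL1"),
        ("r4", "BCR", "BCR"),    # concordant pair
        ("r5", "GAPDH", None),   # one mate unassigned
        ("r6", "ACTB", "MYC"),   # single discordant pair, below threshold
    ]
    print(discordant_pair_candidates(pairs))  # → {('ABL1', 'BCR'): 3}
```

The support threshold is the simplest guard against chance discordant pairs; raising it trades sensitivity for specificity.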
Affiliation(s)
- M J Annala
- Tampere University of Technology, Tampere, Finland.
22
Wu J, Zhang W, Huang S, He Z, Cheng Y, Wang J, Lam TW, Peng Z, Yiu SM. SOAPfusion: a robust and effective computational fusion discovery tool for RNA-seq reads. Bioinformatics 2013; 29:2971-8. [DOI: 10.1093/bioinformatics/btt522] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 12/16/2022]
23
Nariai N, Hirose O, Kojima K, Nagasaki M. TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference. Bioinformatics 2013; 29:2292-9. [PMID: 23821651 DOI: 10.1093/bioinformatics/btt381] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 01/13/2023]
Abstract
MOTIVATION Many human genes express multiple transcript isoforms through alternative splicing, which greatly increases the diversity of protein function. Although RNA sequencing (RNA-Seq) technologies have been widely used to measure amounts of transcribed mRNA, accurate estimation of transcript isoform abundances from RNA-Seq data is challenging because reads often map to more than one transcript isoform or to paralogs with similar sequences. RESULTS We propose a statistical method to estimate transcript isoform abundances from RNA-Seq data. Our method can handle gapped alignments of reads against reference sequences, so it allows insertion or deletion errors within reads. The proposed method optimizes the number of transcript isoforms by variational Bayesian inference through an iterative procedure, and its convergence is guaranteed under a stopping criterion. On simulated datasets, our method outperformed comparable quantification methods in inferring transcript isoform abundances, and its rate of convergence was faster than that of the expectation maximization algorithm. We also applied our method to RNA-Seq data from human cell line samples and showed that our predictions were more consistent among technical replicates than those of other methods. AVAILABILITY An implementation of our method is available at http://github.com/nariai/tigar CONTACT nariai@megabank.tohoku.ac.jp SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
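The abstract compares TIGAR's convergence with that of the expectation-maximization (EM) algorithm, a standard baseline for splitting multi-mapping reads among isoforms. For orientation, a minimal Python sketch of that EM baseline is given below; it is not TIGAR's variational Bayesian method. It assumes each read has been reduced to the set of isoform indices it is compatible with, treats every compatible alignment as equally likely, and ignores isoform length and alignment quality (including the gapped-alignment error model TIGAR adds). The function name and the settings `n_iter` and `tol` are hypothetical.

```python
def em_isoform_abundance(read_compat, n_isoforms, n_iter=1000, tol=1e-10):
    """EM estimate of isoform read fractions from read-to-isoform compatibility.

    read_compat: one set of compatible isoform indices per read. Reads with an
    empty set are ignored. Isoform length and alignment quality are ignored.
    """
    reads = [compat for compat in read_compat if compat]
    if not reads:
        raise ValueError("no read is compatible with any isoform")
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        # E-step: split each read among its compatible isoforms in proportion to theta.
        expected = [0.0] * n_isoforms
        for compat in reads:
            total = sum(theta[k] for k in compat)
            for k in compat:
                expected[k] += theta[k] / total
        # M-step: the new abundances are the expected fractions of reads.
        new_theta = [count / len(reads) for count in expected]
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


if __name__ == "__main__":
    # Two reads unique to isoform 0, one unique to isoform 1, three shared.
    reads = [{0}, {0}, {1}, {0, 1}, {0, 1}, {0, 1}]
    print([round(x, 4) for x in em_isoform_abundance(reads, 2)])  # → [0.6667, 0.3333]
```

For this toy input the maximum-likelihood split is 2/3 to 1/3, and the distance of the EM iterate from it halves each round; this kind of linear convergence is the behaviour against which the abstract reports TIGAR converging faster.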
Affiliation(s)
- Naoki Nariai
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan.
24
Wang L. Identification of cancer gene fusions based on advanced analysis of the human genome or transcriptome. Front Med 2013; 7:280-9. [PMID: 23807217 DOI: 10.1007/s11684-013-0265-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/16/2012] [Accepted: 02/27/2013] [Indexed: 01/03/2023]
Abstract
Many gene fusions have been recognized as important diagnostic and/or prognostic markers in human malignancies. In recent years, novel gene fusions have been identified in cases without prior knowledge of the genetic background. Combined with powerful computational data analysis methods, new genome-wide screening approaches have been used to detect cryptic genomic aberrations. This review focuses on advanced genome-wide screening approaches for fusion gene identification, such as microarray-based approaches, next-generation sequencing, and the NanoString nCounter gene expression system. The fundamental rationale and strategy for fusion gene identification using each biotech platform are also discussed.
Affiliation(s)
- Lu Wang
- Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA.
25
Long-range transcriptome sequencing reveals cancer cell growth regulatory chimeric mRNA. Neoplasia 2013; 14:1087-96. [PMID: 23226102 DOI: 10.1593/neo.121342] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 08/16/2012] [Revised: 08/16/2012] [Accepted: 09/30/2012] [Indexed: 12/15/2022]
Abstract
mRNA chimeras from chromosomal translocations often act as transforming oncogenes. However, cancer transcriptomes also contain mRNA chimeras that arise from transcriptional or post-transcriptional events and may play a role in tumor development. To identify such chimeras, we developed a deterministic screening strategy for long-range sequence analysis. High-throughput, long-read sequencing was then performed on cDNA libraries from major tumor histotypes and corresponding normal tissues. These analyses led to the identification of 378 chimeras, with an unexpectedly high frequency of expression (≈2 × 10⁻⁵ of all mRNA). Functional assays in breast and ovarian cancer cell lines showed that a large fraction of mRNA chimeras regulate cell replication. Strikingly, the chimeras included both positive and negative regulators of cell growth, which functioned as such in a cell-type-specific manner. Replication-controlling chimeras were found to be expressed by most cancers from breast, ovary, colon, uterus, kidney, lung, and stomach, suggesting a widespread role in tumor development.
26
Carrara M, Beccuti M, Cavallo F, Donatelli S, Lazzarato F, Cordero F, Calogero RA. State of art fusion-finder algorithms are suitable to detect transcription-induced chimeras in normal tissues? BMC Bioinformatics 2013; 14 Suppl 7:S2. [PMID: 23815381 PMCID: PMC3633050 DOI: 10.1186/1471-2105-14-s7-s2] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 12/14/2022]
Abstract
Background: RNA-seq has the potential to discover genes created by chromosomal rearrangements. Fusion genes, also known as "chimeras", are formed by the breakage and re-joining of two different chromosomes. Chimeras are known to be implicated in the development of cancer. A few earlier publications also reported fusion events in normal tissue, but their results overlapped very little. More recently, two fusion genes in normal tissues were detected using both RNA-seq and protein data. Because results on identifying chimeras in normal tissue are heterogeneous, we decided to evaluate how well state-of-the-art fusion finders detect chimeras in RNA-seq data from normal tissues. Results: We compared the performance of six fusion-finder tools: FusionHunter, FusionMap, FusionFinder, MapSplice, deFuse and TopHat-fusion. To evaluate sensitivity, we used a synthetic dataset of fusion products, called the positive dataset; in these experiments, FusionMap, FusionFinder, MapSplice and TopHat-fusion detected more than 78% of fusion genes. All tools were error-prone, with high variability among them, and identified some fusion genes not present in the synthetic dataset. To better investigate the false discovery rate of chimera detection, synthetic datasets free of fusion products, called negative datasets, were used. The negative datasets have different read lengths and quality scores, which makes it possible to detect whether the tools depend on these features. FusionMap, FusionFinder, MapSplice, deFuse and TopHat-fusion were error-prone. Only FusionHunter's results were free of false positives. FusionMap gave the best compromise between specificity in the negative dataset and sensitivity in the positive dataset. Conclusions: We observed that the tools depend on read length, quality score and the number of reads supporting each chimera. Thus, it is important to carefully select the software on the basis of the structure of the RNA-seq data under analysis. Furthermore, the sensitivity of chimera detection tools does not seem to be sufficient to provide results consistent with the fusion events in normal tissues extracted from published data.
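The evaluation design described above (a positive synthetic dataset to estimate sensitivity, plus a fusion-free negative dataset on which every call is a false positive) can be sketched as follows. This is a minimal illustration, assuming each fusion is represented as a hypothetical (5' gene, 3' gene) name pair; it is not the authors' benchmarking code, and the gene names are placeholders.

```python
def evaluate_caller(truth, calls_positive, calls_negative):
    """Score a fusion caller on a positive and a negative synthetic dataset.

    truth: set of (five_prime_gene, three_prime_gene) pairs spiked into the
        positive dataset.
    calls_positive: pairs reported by the tool on the positive dataset.
    calls_negative: pairs reported on the fusion-free negative dataset;
        by construction, every one of these is a false positive.
    """
    calls_positive = set(calls_positive)
    true_positives = calls_positive & truth
    sensitivity = len(true_positives) / len(truth) if truth else 0.0
    # Calls on the positive dataset that were never spiked in are also errors.
    spurious_on_positive = len(calls_positive - truth)
    false_on_negative = len(set(calls_negative))
    return {
        "sensitivity": sensitivity,
        "spurious_on_positive": spurious_on_positive,
        "false_on_negative": false_on_negative,
    }


truth = {("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"),
         ("GENE_E", "GENE_F"), ("GENE_G", "GENE_H")}
calls_pos = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"),
             ("GENE_E", "GENE_F"), ("GENE_X", "GENE_Y")]
calls_neg = [("GENE_P", "GENE_Q")]
print(evaluate_caller(truth, calls_pos, calls_neg))
# {'sensitivity': 0.75, 'spurious_on_positive': 1, 'false_on_negative': 1}
```

Keeping the two datasets separate mirrors the design above: sensitivity can only be measured where true fusions exist, while the negative dataset isolates the false-positive behavior.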
Affiliation(s)
- Matteo Carrara
- University of Torino, Bioinformatics & Genomics unit, Molecular Biotechnology Center, Via Nizza 52, 10126 Torino, Italy
27
Abstract
Ongoing global genome characterization efforts are revolutionizing our knowledge of cancer genomics and tumor biology. In parallel, information gleaned from these studies on driver cancer gene alterations (mutations, copy number alterations, translocations, and/or chromosomal rearrangements) can be leveraged, in principle, to develop a cohesive framework for individualized cancer treatment. These possibilities have been enabled, to a large degree, by revolutionary advances in genomic technologies that facilitate systematic profiling of hallmark cancer genetic alterations at increasingly fine resolution. Ongoing innovations in existing genomic technologies, as well as many emerging technologies, will likely continue to advance translational cancer genomics and precision cancer medicine.
Affiliation(s)
- Laura E MacConaill
- Center for Cancer Genome Discovery, Dana-Farber Cancer Institute, 44 Binney St, Dana 1539, Boston, MA 02115, USA.
28
Xuan J, Yu Y, Qing T, Guo L, Shi L. Next-generation sequencing in the clinic: promises and challenges. Cancer Lett 2012; 340:284-95. [PMID: 23174106 DOI: 10.1016/j.canlet.2012.11.025] [Citation(s) in RCA: 199] [Impact Index Per Article: 15.3] [Received: 08/31/2012] [Revised: 11/13/2012] [Accepted: 11/13/2012] [Indexed: 02/06/2023]
Abstract
The advent of next-generation sequencing (NGS) technologies has revolutionized the field of genomics, enabling fast and cost-effective generation of genome-scale sequence data with exquisite resolution and accuracy. In recent years, rapid technological advances led by academic institutions and companies have continued to broaden NGS applications from research to the clinic. A recent crop of discoveries has highlighted the medical impact of NGS technologies on Mendelian and complex diseases, particularly cancer. However, the ever-increasing pace of NGS adoption presents enormous challenges in data processing, storage, management and interpretation, as well as in sequencing quality control, which hinder the translation of sequence data into clinical practice. In this review, we first summarize the technical characteristics and performance of current NGS platforms. We then highlight advances in applying NGS technologies to the development of clinical diagnostics and therapeutics. Common issues in NGS workflows are also discussed to guide the selection of NGS platforms and pipelines for specific research purposes.
Affiliation(s)
- Jiekun Xuan
- School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, China; National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
29
Benelli M, Pescucci C, Marseglia G, Severgnini M, Torricelli F, Magi A. Discovering chimeric transcripts in paired-end RNA-seq data by using EricScript. Bioinformatics 2012; 28:3232-9. [DOI: 10.1093/bioinformatics/bts617] [Citation(s) in RCA: 120] [Impact Index Per Article: 9.2] [Indexed: 11/13/2022]
30
Kalyana-Sundaram S, Shanmugam A, Chinnaiyan AM. Gene Fusion Markup Language: a prototype for exchanging gene fusion data. BMC Bioinformatics 2012; 13:269. [PMID: 23072312 PMCID: PMC3607969 DOI: 10.1186/1471-2105-13-269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/21/2011] [Accepted: 10/11/2012] [Indexed: 12/26/2022]
Abstract
Background: An avalanche of next-generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates at higher resolution than previously achieved. However, in the excitement and necessity of publishing observations from this recently developed technology, no community standard has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discovery, the current non-standard mode of data representation could impede data accessibility, critical analyses, and further discoveries. Results: Here we propose a prototype, Gene Fusion Markup Language (GFML), as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis and interpretation, and independent verification. As this database-independent exchange initiative evolves, it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at http://code.google.com/p/gfml-prototype/. Conclusion: The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an interoperable and queryable fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses.
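To illustrate what a machine-readable fusion record could look like, the sketch below serializes one fusion call to XML with Python's standard library and parses it back. The element and attribute names (`geneFusion`, `fivePrimeGene`, `threePrimeGene`, `supportingReads`, `sample`) are hypothetical placeholders chosen for this example; they are not taken from the GFML prototype schema.

```python
import xml.etree.ElementTree as ET


def fusion_to_xml(sample, five_prime, three_prime, supporting_reads):
    """Build an XML string for one fusion call (hypothetical element names)."""
    root = ET.Element("geneFusion", attrib={"sample": sample})
    ET.SubElement(root, "fivePrimeGene").text = five_prime
    ET.SubElement(root, "threePrimeGene").text = three_prime
    ET.SubElement(root, "supportingReads").text = str(supporting_reads)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


def xml_to_fusion(xml_text):
    """Parse the XML produced by fusion_to_xml back into a plain dict."""
    root = ET.fromstring(xml_text)
    return {
        "sample": root.get("sample"),
        "five_prime": root.findtext("fivePrimeGene"),
        "three_prime": root.findtext("threePrimeGene"),
        "supporting_reads": int(root.findtext("supportingReads")),
    }


xml_text = fusion_to_xml("SAMPLE_1", "GENE_A", "GENE_B", 12)
print(xml_text)
print(xml_to_fusion(xml_text))
```

The round trip is the point of a shared format: any tool that knows the element names can read a record written by another tool without custom parsing.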
Affiliation(s)
- Shanker Kalyana-Sundaram
- Michigan Center for Translational Pathology, Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
31
Wang Q, Xia J, Jia P, Pao W, Zhao Z. Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives. Brief Bioinform 2012; 14:506-19. [PMID: 22877769 DOI: 10.1093/bib/bbs044] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.5] [Indexed: 01/22/2023]
Abstract
Gene fusions are important genomic events in human cancer because their fusion gene products can drive the development of cancer and thus are potential prognostic tools or therapeutic targets in anti-cancer treatment. Major advancements have been made in computational approaches for fusion gene discovery over the past 3 years due to improvements and widespread applications of high-throughput next generation sequencing (NGS) technologies. To identify fusions from NGS data, existing methods typically leverage the strengths of both sequencing technologies and computational strategies. In this article, we review the NGS and computational features of existing methods for fusion gene detection and suggest directions for future development.
32
Wu S, Li C, Huang W, Li W, Li RW. Alternative splicing regulated by butyrate in bovine epithelial cells. PLoS One 2012; 7:e39182. [PMID: 22720068 PMCID: PMC3375255 DOI: 10.1371/journal.pone.0039182] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 03/30/2012] [Accepted: 05/21/2012] [Indexed: 12/02/2022]
Abstract
As a signaling molecule and an inhibitor of histone deacetylases (HDACs), butyrate affects a broad range of biological processes, such as apoptosis and cell proliferation, in addition to its critical role in energy metabolism in ruminants. This study examined the effect of butyrate on alternative splicing in bovine epithelial cells using RNA-seq technology. Junction reads accounted for 11.28% and 12.32% of total mapped reads in the butyrate-treated (BT) and control (CT) groups, respectively. A total of 201,326 potential splice junctions were supported by ≥3 junction reads. Approximately 94% of these junctions conformed to the consensus sequence (GT/AG), while ∼3% were GC/AG junctions. No AT/AC junctions were observed. A total of 2,834 exon-skipping events, each supported by a minimum of 3 junction reads, were detected. At least 7 genes whose mRNA expression was significantly affected by butyrate also had exon-skipping events differentially regulated by butyrate. Furthermore, COL5A3, which was induced 310-fold by butyrate (FDR <0.001) at the gene level, had a significantly higher number of junction reads mapped to Exon#8 (donor) and Exon#11 (acceptor) in BT. This event could result in the formation of a COL5A3 mRNA isoform with 2 of its 69 exons missing. In addition, 216 differentially expressed transcript isoforms regulated by butyrate were detected. For example, Isoform 1 of ORC1 was strongly repressed by butyrate while Isoform 2 remained unchanged. Butyrate physically binds to and inhibits all zinc-dependent HDACs except HDAC6 and HDAC10. Our results provide evidence that butyrate also regulates the deacetylase activities of classical HDACs via transcriptional control. Moreover, thirteen gene fusion events differentially affected by butyrate were identified. Our results provide a snapshot of the complex transcriptome dynamics regulated by butyrate, which will facilitate understanding of the biological effects of butyrate and other HDAC inhibitors.
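The junction filtering and motif tally described in this abstract (keep junctions supported by at least three reads, then classify them by their intronic donor/acceptor dinucleotides such as GT/AG, GC/AG or AT/AC) can be sketched as below. This is a minimal illustration under stated assumptions: each junction is given as a hypothetical (intron_sequence, read_count) tuple already oriented on the transcribed strand, and the sequences are made up. It is not the pipeline used in the study.

```python
from collections import Counter

MIN_READS = 3  # support threshold quoted in the abstract (>= 3 junction reads)


def classify_junction(intron_seq):
    """Return the donor/acceptor dinucleotide label of an intron, e.g. 'GT/AG'."""
    intron_seq = intron_seq.upper()
    return f"{intron_seq[:2]}/{intron_seq[-2:]}"


def motif_fractions(junctions, min_reads=MIN_READS):
    """Fraction of well-supported junctions carrying each splice-site motif."""
    kept = [seq for seq, reads in junctions if reads >= min_reads]
    counts = Counter(classify_junction(seq) for seq in kept)
    total = len(kept)
    return {motif: n / total for motif, n in counts.items()} if total else {}


junctions = [
    ("GTAAGTCTTTTCAG", 12),  # canonical GT/AG
    ("GTGAGTTTCCCTAG", 5),   # canonical GT/AG
    ("GCAAGTTTTTTCAG", 4),   # GC/AG
    ("ATATCCTTTTTTAC", 1),   # AT/AC, dropped: supported by only one read
]
print(motif_fractions(junctions))
```

Here two of the three retained junctions are GT/AG (about 0.67) and one is GC/AG (about 0.33); the AT/AC junction is filtered out before counting, so the read-support threshold directly shapes the reported motif proportions.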
Affiliation(s)
- Sitao Wu
- Center for Research in Biological Systems, University of California San Diego, San Diego, California, United States of America
- Congjun Li
- USDA-ARS, Bovine Functional Genomics Laboratory, Beltsville, Maryland, United States of America
- Wen Huang
- Department of Genetics, North Carolina State University, Raleigh, North Carolina, United States of America
- Weizhong Li
- Center for Research in Biological Systems, University of California San Diego, San Diego, California, United States of America
- Robert W. Li
- USDA-ARS, Bovine Functional Genomics Laboratory, Beltsville, Maryland, United States of America
33
RNA-Seq mapping and detection of gene fusions with a suffix array algorithm. PLoS Comput Biol 2012; 8:e1002464. [PMID: 22496636 PMCID: PMC3320572 DOI: 10.1371/journal.pcbi.1002464] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 09/07/2011] [Accepted: 02/21/2012] [Indexed: 12/20/2022]
Abstract
High-throughput RNA sequencing enables quantification of transcripts (both known and novel), exon/exon junctions and fusions of exons from different genes. Discovering gene fusions, particularly those expressed at low abundance, is challenging with short- and medium-length sequencing reads. To address this challenge, we implemented an RNA-Seq mapping pipeline within the LifeScope software. We introduced new features, including filter and junction mapping, annotation-aided pairing rescue and accurate mapping quality values. We combined this pipeline with a Suffix Array Spliced Read (SASR) aligner to detect chimeric transcripts. Performing paired-end RNA-Seq of the breast cancer cell line MCF-7 using the SOLiD system, we called 40 gene fusions among more than 120,000 splicing junctions. We validated 36 of these 40 fusions with TaqMan assays, of which 25 were expressed in MCF-7 but not in the Human Brain Reference. An intra-chromosomal gene fusion involving the estrogen receptor alpha gene ESR1, and another involving RPS6KB1 (ribosomal protein S6 kinase beta-1), were recurrently expressed in a number of breast tumor cell lines and a clinical tumor sample.
Advances in sequencing technology are enabling detailed characterization of RNA transcripts from biological samples. The fundamental challenge of accurately mapping reads to transcripts and gleaning biological meaning from the data remains. One class of transcripts, gene fusions, is particularly important in cancer. Some gene fusions are prominent markers in leukemia, prostate cancer and other cancers, and are putatively causative in certain tumor types. We present a set of new RNA-Seq analysis techniques to map reads and to count expression of genes, exons and splicing junctions, especially those that give evidence of gene fusions. These tools are available in a software package with a straightforward graphical user interface. Using this software, we called and validated several gene fusions in a breast cancer cell line. By testing for the presence of these fusions in a larger population of tumor cell lines and clinical samples, we found that two of them were expressed recurrently.
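The data structure named in this abstract, a suffix array, can be illustrated with a toy example: sort all suffixes of a reference, then binary-search for the block of suffixes that begin with a read segment, which gives every exact occurrence of that segment. This is a minimal sketch with naive O(n² log n) construction that is only suitable for tiny strings; it is not the SASR aligner or LifeScope, whose internals the abstract does not describe.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of `text`, in lexicographic order (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """All start positions of `pattern` in `text`, via two binary searches on `sa`."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


reference = "ACGTACGTTGCA"
sa = build_suffix_array(reference)
print(find_occurrences(reference, sa, "ACG"))  # [0, 4]
print(find_occurrences(reference, sa, "TTG"))  # [7]
```

Because suffixes sharing a prefix are contiguous in sorted order, each lookup costs two binary searches regardless of how many times the segment occurs; production tools build the array in linear or near-linear time instead of by naive sorting.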
34
Application of the whole-transcriptome shotgun sequencing approach to the study of Philadelphia-positive acute lymphoblastic leukemia. Blood Cancer J 2012; 2:e61. [PMID: 22829256 PMCID: PMC3317525 DOI: 10.1038/bcj.2012.6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/12/2012] [Accepted: 01/16/2012] [Indexed: 12/17/2022]
Abstract
Although the pathogenesis of BCR–ABL1-positive acute lymphoblastic leukemia (ALL) is mainly related to expression of the BCR–ABL1 fusion transcript, additional cooperating genetic lesions are thought to be involved in its development and progression. Therefore, to investigate the complex landscape of mutations, changes in expression profiles and alternative splicing (AS) events that can be observed in this disease, the leukemia transcriptome of a BCR–ABL1-positive ALL patient at diagnosis and at relapse was sequenced using a whole-transcriptome shotgun sequencing (RNA-Seq) approach. A total of 13.9 and 15.8 million sequence reads were generated from the de novo and relapsed samples, respectively, and aligned to the human reference genome sequence. This led to the identification of five validated missense mutations in genes involved in metabolic processes (DPEP1, TMEM46), transport (MVP), cell cycle regulation (ABL1) and catalytic activity (CTSZ), two of which were variants acquired at relapse. In all, 6,390 and 4,671 putative AS events were also detected, as well as expression levels for 18,315 and 18,795 genes, 28% of which were differentially expressed between the two disease phases. These data demonstrate that RNA-Seq is a suitable approach for identifying a wide spectrum of genetic alterations potentially involved in ALL.
35
Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet 2011; 13:36-46. [PMID: 22124482 DOI: 10.1038/nrg3117] [Citation(s) in RCA: 1122] [Impact Index Per Article: 80.1] [Indexed: 12/16/2022]
Abstract
Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.
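A toy illustration of the alignment ambiguity described above: a read drawn from a repeat has several equally good placements, so an aligner must report all of them, pick one, or discard the read. The sketch below uses exact substring matching against a made-up reference containing a short tandem repeat and simply labels each read as unique, multi-mapped or unmapped; real aligners tolerate mismatches and assign mapping-quality scores, which this example does not attempt.

```python
def exact_hits(reference, read):
    """All start positions where `read` occurs exactly in `reference` (overlaps allowed)."""
    hits, start = [], reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def classify_reads(reference, reads):
    """Label each read 'unique', 'multi' (ambiguous, e.g. inside a repeat) or 'unmapped'."""
    labels = {}
    for read in reads:
        n = len(exact_hits(reference, read))
        labels[read] = "unmapped" if n == 0 else "unique" if n == 1 else "multi"
    return labels


# Hypothetical reference: unique flank, tandem repeat "CAGCAGCAG", unique flank.
reference = "TTGAC" + "CAGCAGCAG" + "GATCC"
print(classify_reads(reference, ["TTGACC", "CAGCAG", "GGGGGG"]))
# {'TTGACC': 'unique', 'CAGCAG': 'multi', 'GGGGGG': 'unmapped'}
```

The read "CAGCAG" fits at two offsets inside the repeat, so any single placement reported for it is a guess; ignoring such reads avoids the guess but, as the abstract notes, can hide real biology in repetitive regions.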
36
37
Kim D, Salzberg SL. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol 2011. [PMID: 21835007 DOI: 10.1186/gb-2011-12-8-r72] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
38
Iyer MK, Chinnaiyan AM, Maher CA. ChimeraScan: a tool for identifying chimeric transcription in sequencing data. Bioinformatics 2011; 27:2903-4. [PMID: 21840877 DOI: 10.1093/bioinformatics/btr467] [Citation(s) in RCA: 211] [Impact Index Per Article: 15.1] [Indexed: 11/12/2022]
Abstract
Summary: Next-generation sequencing (NGS) technologies have enabled de novo gene fusion discovery that could reveal candidates with therapeutic significance in cancer. Here we present an open-source software package, ChimeraScan, for the discovery of chimeric transcription between two independent transcripts in high-throughput transcriptome sequencing data. Availability: http://chimerascan.googlecode.com Contact: cmaher@dom.wustl.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matthew K Iyer
- Michigan Center for Translational Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
39
Kim D, Salzberg SL. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol 2011; 12:R72. [PMID: 21835007 PMCID: PMC3245612 DOI: 10.1186/gb-2011-12-8-r72] [Citation(s) in RCA: 621] [Impact Index Per Article: 44.4] [Received: 05/19/2011] [Revised: 07/21/2011] [Accepted: 08/11/2011] [Indexed: 12/21/2022]
Abstract
TopHat-Fusion is an algorithm designed to discover transcripts representing fusion gene products, which result from the breakage and re-joining of two different chromosomes, or from rearrangements within a chromosome. TopHat-Fusion is an enhanced version of TopHat, an efficient program that aligns RNA-seq reads without relying on existing annotation. Because it is independent of gene annotation, TopHat-Fusion can discover fusion products deriving from known genes, unknown genes and unannotated splice variants of known genes. Using RNA-seq data from breast and prostate cancer cell lines, we detected both previously reported and novel fusions with solid supporting evidence. TopHat-Fusion is available at http://tophat-fusion.sourceforge.net/.
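The general split-read idea behind annotation-independent fusion discovery can be sketched as follows. A read that does not align end-to-end is cut into two end segments, each segment is placed separately, and unambiguous placements on two different reference sequences suggest a fusion junction. This is a deliberately simplified illustration using exact matching against a hypothetical two-sequence reference (`GENE_A`, `GENE_B` are placeholders); TopHat-Fusion's actual alignment, breakpoint search and filtering steps are not reproduced here.

```python
def place_segment(references, segment):
    """Return (name, offset) for every exact placement of `segment`."""
    hits = []
    for name, seq in references.items():
        start = seq.find(segment)
        while start != -1:
            hits.append((name, start))
            start = seq.find(segment, start + 1)
    return hits


def split_read_fusion(references, read, seg_len):
    """Flag a candidate fusion when the read's two end segments land on different sequences.

    Only the first and last `seg_len` bases are used; a real tool would also
    locate the exact breakpoint inside the read.
    """
    left = place_segment(references, read[:seg_len])
    right = place_segment(references, read[-seg_len:])
    # Keep only unambiguous placements so repeats do not drive calls.
    if len(left) != 1 or len(right) != 1:
        return None
    (left_ref, left_pos), (right_ref, right_pos) = left[0], right[0]
    if left_ref == right_ref:
        return None  # both segments on one sequence: ordinary (possibly spliced) read
    return (left_ref, left_pos, right_ref, right_pos)


refs = {"GENE_A": "ATGCCGTTAGCAAGT", "GENE_B": "GGATCCTTAACGGTA"}
# Read = 6 bases of GENE_A (offset 8) joined to the first 6 bases of GENE_B.
read = "AGCAAG" + "GGATCC"
print(split_read_fusion(refs, read, seg_len=6))  # ('GENE_A', 8, 'GENE_B', 0)
```

Because segments are placed against raw sequence rather than against a gene model, the same logic applies whether or not the partners are annotated genes, which is the property the abstract emphasizes.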
Affiliation(s)
- Daehwan Kim
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.
40
Li Y, Chien J, Smith DI, Ma J. FusionHunter: identifying fusion transcripts in cancer using paired-end RNA-seq. Bioinformatics 2011; 27:1708-10. [PMID: 21546395 DOI: 10.1093/bioinformatics/btr265] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Indexed: 11/12/2022]
Abstract
Motivation: Fusion transcripts can be created as a result of genome rearrangement in cancer. Some of them play important roles in carcinogenesis and can serve as diagnostic and therapeutic targets. With more and more cancer genomes being sequenced by next-generation sequencing technologies, we believe an efficient tool for reliably identifying fusion transcripts will be desirable for many groups. Results: We designed and implemented an open-source software tool, called FusionHunter, which reliably identifies fusion transcripts from transcriptional analysis of paired-end RNA-seq. We show that FusionHunter can accurately detect fusions that were previously confirmed by RT-PCR in a publicly available dataset. The purpose of FusionHunter is to identify potential fusions with high sensitivity and specificity and to guide further functional validation in the laboratory. Availability: http://bioen-compbio.bioen.illinois.edu/FusionHunter/.
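A common first step in paired-end fusion detection is to collect discordant pairs, whose two mates fall in different genes, and to keep gene pairs supported by enough independent pairs. The sketch below shows only that generic counting step, assuming mates have already been assigned to genes; the tuple layout and the `min_support` threshold are hypothetical choices for illustration, and this is not FusionHunter's actual algorithm.

```python
from collections import Counter


def candidate_fusions(pairs, min_support=2):
    """Count discordant read pairs per gene pair and keep well-supported ones.

    pairs: iterable of (read_id, gene_of_mate1, gene_of_mate2); None means the
        mate did not overlap any annotated gene.
    """
    support = Counter()
    for _read_id, gene1, gene2 in pairs:
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue  # concordant or unassigned pair: no fusion evidence
        # Order-independent key so A-B and B-A pairs pool their support.
        support[tuple(sorted((gene1, gene2)))] += 1
    return {genes: n for genes, n in support.items() if n >= min_support}


pairs = [
    ("r1", "GENE_A", "GENE_B"),
    ("r2", "GENE_B", "GENE_A"),
    ("r3", "GENE_A", "GENE_B"),
    ("r4", "GENE_C", "GENE_D"),  # single pair: below threshold
    ("r5", "GENE_A", "GENE_A"),  # concordant
    ("r6", "GENE_E", None),      # one mate outside annotated genes
]
print(candidate_fusions(pairs))  # {('GENE_A', 'GENE_B'): 3}
```

Sorting the gene names discards which partner is 5' and which is 3'; a real caller would keep strand and mate-orientation information and would then look for reads spanning the exact junction.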
Affiliation(s)
- Yang Li
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
41
McPherson A, Wu C, Hajirasouliha I, Hormozdiari F, Hach F, Lapuk A, Volik S, Shah S, Collins C, Sahinalp SC. Comrad: detection of expressed rearrangements by integrated analysis of RNA-Seq and low coverage genome sequence data. Bioinformatics 2011; 27:1481-8. [PMID: 21478487 DOI: 10.1093/bioinformatics/btr184] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 12/25/2022]
Abstract
Motivation: Comrad is a novel algorithmic framework for the integrated analysis of RNA-Seq and whole genome shotgun sequencing (WGSS) data for the purposes of discovering genomic rearrangements and aberrant transcripts. The Comrad framework leverages the advantages of both RNA-Seq and WGSS data, providing accurate classification of rearrangements as expressed or not expressed and accurate classification of the genomic or non-genomic origin of aberrant transcripts. A major benefit of Comrad is its ability to accurately identify aberrant transcripts and associated rearrangements using low coverage genome data. As a result, a Comrad analysis can be performed at a cost comparable to that of two RNA-Seq experiments, significantly lower than an analysis requiring high coverage genome data. Results: We have applied Comrad to the discovery of gene fusions and read-throughs in prostate cancer cell line C4-2, a derivative of the LNCaP cell line with androgen-independent characteristics. As a proof of concept, we have rediscovered in the C4-2 data 4 of the 6 fusions previously identified in LNCaP. We also identified six novel fusion transcripts and associated genomic breakpoints, and verified their existence in LNCaP, suggesting that Comrad may be more sensitive than previous methods that have been applied to fusion discovery in LNCaP. We show that many of the gene fusions discovered using Comrad would be difficult to identify using currently available techniques. Availability: A C++ and Perl implementation of the method demonstrated in this article is available at http://compbio.cs.sfu.ca/.
Affiliation(s)
- Andrew McPherson
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.