1
Cotto KC, Feng YY, Ramu A, Richters M, Freshour SL, Skidmore ZL, Xia H, McMichael JF, Kunisaki J, Campbell KM, Chen THP, Rozycki EB, Adkins D, Devarakonda S, Sankararaman S, Lin Y, Chapman WC, Maher CA, Arora V, Dunn GP, Uppaluri R, Govindan R, Griffith OL, Griffith M. Integrated analysis of genomic and transcriptomic data for the discovery of splice-associated variants in cancer. Nat Commun 2023; 14:1589. [PMID: 36949070 PMCID: PMC10033906 DOI: 10.1038/s41467-023-37266-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 12/30/2020] [Accepted: 03/08/2023] [Indexed: 03/24/2023] Open
Abstract
Somatic mutations within non-coding regions and even exons may have unidentified regulatory consequences that are often overlooked in analysis workflows. Here we present RegTools ( www.regtools.org ), a computationally efficient, free, and open-source software package designed to integrate somatic variants from genomic data with splice junctions from bulk or single cell transcriptomic data to identify variants that may cause aberrant splicing. We apply RegTools to over 9000 tumor samples with both tumor DNA and RNA sequence data. RegTools discovers 235,778 events where a splice-associated variant significantly increases the splicing of a particular junction, across 158,200 unique variants and 131,212 unique junctions. To characterize these somatic variants and their associated splice isoforms, we annotate them with the Variant Effect Predictor, SpliceAI, and Genotype-Tissue Expression junction counts and compare our results to other tools that integrate genomic and transcriptomic data. While many events are corroborated by the aforementioned tools, the flexibility of RegTools also allows us to identify splice-associated variants in known cancer drivers, such as TP53, CDKN2A, and B2M, and other genes.
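The central idea of this abstract, pairing each somatic variant with splice junctions observed nearby in RNA data, can be illustrated with a minimal Python sketch. This is not the RegTools implementation or its command-line interface. The `Junction` type, the `junctions_near_variant` helper, the window size, the coordinate convention, and all coordinates and read counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """A splice junction; the coordinate convention is an assumption of this sketch."""
    chrom: str
    start: int  # donor-side junction coordinate (1-based)
    end: int    # acceptor-side junction coordinate (1-based)
    reads: int  # RNA-seq reads supporting the junction


def junctions_near_variant(chrom, pos, junctions, window=5):
    """Return junctions with either end within `window` bases of the variant.

    `window` is a hypothetical parameter, not a RegTools default.
    """
    hits = []
    for j in junctions:
        if j.chrom != chrom:
            continue
        if abs(pos - j.start) <= window or abs(pos - j.end) <= window:
            hits.append(j)
    return hits


# Toy data: made-up junctions and one made-up variant position.
junctions = [
    Junction("chr17", 1000, 2000, 42),
    Junction("chr17", 1000, 2500, 3),
    Junction("chr17", 5000, 6000, 17),
    Junction("chr1", 1002, 1900, 8),
]
hits = junctions_near_variant("chr17", 1003, junctions)
for j in hits:
    print(j.chrom, j.start, j.end, j.reads)
```

In this toy case the variant at chr17:1003 is paired with the two junctions that share the donor-side coordinate 1000. A real analysis would then test whether junction usage differs between samples with and without the variant, which this sketch does not attempt.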
Affiliation(s)
- Kelsy C Cotto
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Yang-Yang Feng
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Avinash Ramu
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Megan Richters
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Sharon L Freshour
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Zachary L Skidmore
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Huiming Xia
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Joshua F McMichael
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Jason Kunisaki
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Katie M Campbell
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Timothy Hung-Po Chen
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Emily B Rozycki
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Douglas Adkins
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Siddhartha Devarakonda
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Sumithra Sankararaman
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Yiing Lin
  - Department of Surgery, Washington University School of Medicine, St. Louis, MO, USA
- William C Chapman
  - Department of Surgery, Washington University School of Medicine, St. Louis, MO, USA
- Christopher A Maher
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Vivek Arora
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Gavin P Dunn
  - Department of Neurosurgery, Mass General Hospital, Boston, MA, USA
  - Center for Brain Tumor Immunology and Immunotherapy, Mass General Hospital, Boston, MA, USA
- Ravindra Uppaluri
  - Department of Surgery, Brigham and Women's Hospital, Boston, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Ramaswamy Govindan
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
  - Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
- Obi L Griffith
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
- Malachi Griffith
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
2
Mucaki EJ, Shirley BC, Rogan PK. Expression Changes Confirm Genomic Variants Predicted to Result in Allele-Specific, Alternative mRNA Splicing. Front Genet 2020; 11:109. [PMID: 32211018 PMCID: PMC7066660 DOI: 10.3389/fgene.2020.00109] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/13/2019] [Accepted: 01/30/2020] [Indexed: 12/11/2022] Open
Abstract
Splice isoform structure and abundance can be affected by either noncoding or masquerading coding variants that alter the structure or abundance of transcripts. When these variants are common in the population, these nonconstitutive transcripts are sufficiently frequent so as to resemble naturally occurring, alternative mRNA splicing. Prediction of the effects of such variants has been shown to be accurate using information theory-based methods. Single nucleotide polymorphisms (SNPs) predicted to significantly alter natural and/or cryptic splice site strength were shown to affect gene expression. Splicing changes for known SNP genotypes were confirmed in HapMap lymphoblastoid cell lines with gene expression microarrays and custom designed q-RT-PCR or TaqMan assays. The majority of these SNPs (15 of 22) as well as an independent set of 24 variants were then subjected to RNAseq analysis using the ValidSpliceMut web beacon (http://validsplicemut.cytognomix.com), which is based on data from the Cancer Genome Atlas and International Cancer Genome Consortium. SNPs from different genes analyzed with gene expression microarray and q-RT-PCR exhibited significant changes in affected splice site use. Thirteen SNPs directly affected exon inclusion and 10 altered cryptic site use. Homozygous SNP genotypes resulting in stronger splice sites exhibited higher levels of processed mRNA than alleles associated with weaker sites. Four SNPs exhibited variable expression among individuals with the same genotypes, masking statistically significant expression differences between alleles. Genome-wide information theory and expression analyses (RNAseq) in tumor exomes and genomes confirmed splicing effects for 7 of the HapMap SNPs and 14 SNPs identified from tumor genomes. q-RT-PCR resolved rare splice isoforms with read abundance too low for statistical significance in ValidSpliceMut. Nevertheless, the web-beacon provides evidence of unanticipated splicing outcomes, for example, intron retention due to compromised recognition of constitutive splice sites. Thus, ValidSpliceMut and q-RT-PCR represent complementary resources for identification of allele-specific, alternative splicing.
Affiliation(s)
- Eliseos J Mucaki
  - Department of Biochemistry, University of Western Ontario, London, ON, Canada
- Peter K Rogan
  - Department of Biochemistry, University of Western Ontario, London, ON, Canada
  - CytoGnomix, London, ON, Canada
  - Department of Oncology, University of Western Ontario, London, ON, Canada
  - Department of Computer Science, University of Western Ontario, London, ON, Canada
3
Shirley BC, Mucaki EJ, Rogan PK. Pan-cancer repository of validated natural and cryptic mRNA splicing mutations. F1000Res 2019; 7:1908. [PMID: 31275557 DOI: 10.12688/f1000research.17204.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Accepted: 11/30/2018] [Indexed: 12/26/2022] Open
Abstract
We present a major public resource of mRNA splicing mutations validated according to multiple lines of evidence of abnormal gene expression. Likely mutations present in all tumor types reported in the Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) were identified based on the comparative strengths of splice sites in tumor versus normal genomes, and then validated by respectively comparing counts of splice junction spanning and abundance of transcript reads in RNA-Seq data from matched tissues and tumors lacking these mutations. The comprehensive resource features 341,486 of these validated mutations, the majority of which (69.9%) are not present in the Single Nucleotide Polymorphism Database (dbSNP 150). There are 131,347 unique mutations which weaken or abolish natural splice sites, and 222,071 mutations which strengthen cryptic splice sites (11,932 affect both simultaneously). 28,812 novel or rare flagged variants (with <1% population frequency in dbSNP) were observed in multiple tumor tissue types. An algorithm was developed to classify variants into splicing molecular phenotypes that integrates germline heterozygosity, degree of information change and impact on expression. The classification thresholds were calibrated against the ClinVar clinical database phenotypic assignments. Variants are partitioned into allele-specific alternative splicing, likely aberrant and aberrant splicing phenotypes. Single variants or chromosome ranges can be queried using a Global Alliance for Genomics and Health (GA4GH)-compliant, web-based Beacon "Validated Splicing Mutations" either separately or in aggregate alongside other Beacons through the public Beacon Network, as well as through our website. The website provides additional information, such as a visual representation of supporting RNAseq results, gene expression in the corresponding normal tissues, and splicing molecular phenotypes.
Affiliation(s)
- Eliseos J Mucaki
  - Biochemistry, University of Western Ontario, London, Ontario, N6A 2C1, Canada
- Peter K Rogan
  - CytoGnomix Inc., London, Ontario, N5X 3X5, Canada
  - Biochemistry, University of Western Ontario, London, Ontario, N6A 2C1, Canada
  - Computer Science, University of Western Ontario, London, Ontario, N6A 2C1, Canada
  - Oncology, University of Western Ontario, London, Ontario, N6A 2C1, Canada
4
Wan A, Place E, Pierce EA, Comander J. Characterizing variants of unknown significance in rhodopsin: A functional genomics approach. Hum Mutat 2019; 40:1127-1144. [PMID: 30977563 PMCID: PMC7027811 DOI: 10.1002/humu.23762] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/31/2018] [Revised: 03/31/2019] [Accepted: 04/08/2019] [Indexed: 01/19/2023]
Abstract
Characterizing the pathogenicity of DNA sequence variants of unknown significance (VUS) is a major bottleneck in human genetics, and is increasingly important in determining which patients with inherited retinal diseases could benefit from gene therapy. A library of 210 rhodopsin (RHO) variants from literature and in‐house genetic diagnostic testing was created to efficiently detect pathogenic RHO variants that fail to express on the cell surface. This study, while focused on RHO, demonstrates a streamlined, generalizable method for detecting pathogenic VUS. A relatively simple next‐generation sequencing‐based readout was developed so that a flow cytometry‐based assay could be performed simultaneously on all variants in a pooled format, without the need for barcodes or viral transduction. The resulting dataset characterized the surface expression of every RHO library variant with a high degree of reproducibility (r² = 0.92–0.95), recategorizing 37 variants. For example, three retinitis pigmentosa pedigrees were solved by identifying VUS which showed low expression levels (p.G18D, p.G101V, and p.P180T). Results were validated across multiple assays and correlated with clinical disease severity. This study presents a parallelized, higher‐throughput cell‐based assay for the functional characterization of VUS in RHO, and can be applied more broadly to other inherited retinal disease genes and other disorders.
Affiliation(s)
- Aliete Wan
  - Department of Ophthalmology, Ocular Genomics Institute, Berman-Gund Laboratory for the Study of Retinal Degenerations, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts
- Emily Place
  - Department of Ophthalmology, Ocular Genomics Institute, Berman-Gund Laboratory for the Study of Retinal Degenerations, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts
- Eric A Pierce
  - Department of Ophthalmology, Ocular Genomics Institute, Berman-Gund Laboratory for the Study of Retinal Degenerations, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts
- Jason Comander
  - Department of Ophthalmology, Ocular Genomics Institute, Berman-Gund Laboratory for the Study of Retinal Degenerations, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts
5
Shirley BC, Mucaki EJ, Rogan PK. Pan-cancer repository of validated natural and cryptic mRNA splicing mutations. F1000Res 2018; 7:1908. [PMID: 31275557 PMCID: PMC6544075 DOI: 10.12688/f1000research.17204.3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 08/27/2019] [Indexed: 11/20/2022] Open
Abstract
We present a major public resource of mRNA splicing mutations validated according to multiple lines of evidence of abnormal gene expression. Likely mutations present in all tumor types reported in the Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) were identified based on the comparative strengths of splice sites in tumor versus normal genomes, and then validated by respectively comparing counts of splice junction spanning and abundance of transcript reads in RNA-Seq data from matched tissues and tumors lacking these mutations. The comprehensive resource features 341,486 of these validated mutations, the majority of which (69.9%) are not present in the Single Nucleotide Polymorphism Database (dbSNP 150). There are 131,347 unique mutations which weaken or abolish natural splice sites, and 222,071 mutations which strengthen cryptic splice sites (11,932 affect both simultaneously). 28,812 novel or rare flagged variants (with <1% population frequency in dbSNP) were observed in multiple tumor tissue types. An algorithm was developed to classify variants into splicing molecular phenotypes that integrates germline heterozygosity, degree of information change and impact on expression. The classification thresholds were calibrated against the ClinVar clinical database phenotypic assignments. Variants are partitioned into allele-specific alternative splicing, likely aberrant and aberrant splicing phenotypes. Single variants or chromosome ranges can be queried using a Global Alliance for Genomics and Health (GA4GH)-compliant, web-based Beacon "Validated Splicing Mutations" either separately or in aggregate alongside other Beacons through the public Beacon Network, as well as through our website. The website provides additional information, such as a visual representation of supporting RNAseq results, gene expression in the corresponding normal tissues, and splicing molecular phenotypes.
Affiliation(s)
- Eliseos J Mucaki
  - Biochemistry, University of Western Ontario, London, Ontario, N6A 2C1, Canada
- Peter K Rogan
  - CytoGnomix Inc., London, Ontario, N5X 3X5, Canada
  - Biochemistry, University of Western Ontario, London, Ontario, N6A 2C1, Canada
  - Computer Science, University of Western Ontario, London, Ontario, N6A 2C1, Canada
  - Oncology, University of Western Ontario, London, Ontario, N6A 2C1, Canada
6
Mucaki EJ, Caminsky NG, Perri AM, Lu R, Laederach A, Halvorsen M, Knoll JHM, Rogan PK. A unified analytic framework for prioritization of non-coding variants of uncertain significance in heritable breast and ovarian cancer. BMC Med Genomics 2016; 9:19. [PMID: 27067391 PMCID: PMC4828881 DOI: 10.1186/s12920-016-0178-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 08/11/2015] [Accepted: 03/15/2016] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Sequencing of both healthy and disease singletons yields many novel and low frequency variants of uncertain significance (VUS). Complete gene and genome sequencing by next generation sequencing (NGS) significantly increases the number of VUS detected. While prior studies have emphasized protein coding variants, non-coding sequence variants have also been proven to significantly contribute to high penetrance disorders, such as hereditary breast and ovarian cancer (HBOC). We present a strategy for analyzing different functional classes of non-coding variants based on information theory (IT) and prioritizing patients with large intragenic deletions.
METHODS: We captured and enriched for coding and non-coding variants in genes known to harbor mutations that increase HBOC risk. Custom oligonucleotide baits spanning the complete coding, non-coding, and intergenic regions 10 kb up- and downstream of ATM, BRCA1, BRCA2, CDH1, CHEK2, PALB2, and TP53 were synthesized for solution hybridization enrichment. Unique and divergent repetitive sequences were sequenced in 102 high-risk, anonymized patients without identified mutations in BRCA1/2. Aside from protein coding and copy number changes, IT-based sequence analysis was used to identify and prioritize pathogenic non-coding variants that occurred within sequence elements predicted to be recognized by proteins or protein complexes involved in mRNA splicing, transcription, and untranslated region (UTR) binding and structure. This approach was supplemented by in silico and laboratory analysis of UTR structure.
RESULTS: 15,311 unique variants were identified, of which 245 occurred in coding regions. With the unified IT-framework, 132 variants were identified and 87 functionally significant VUS were further prioritized. An intragenic 32.1 kb interval in BRCA2 that was likely hemizygous was detected in one patient. We also identified 4 stop-gain variants and 3 reading-frame altering exonic insertions/deletions (indels).
CONCLUSIONS: We have presented a strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression. This approach distills large numbers of variants detected by NGS to a limited set of variants prioritized as potential deleterious changes.
Affiliation(s)
- Eliseos J Mucaki
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Natasha G Caminsky
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Ami M Perri
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Ruipeng Lu
  - Department of Computer Science, Faculty of Science, Western University, London, N6A 2C1, Canada
- Alain Laederach
  - Department of Biology, University of North Carolina, Chapel Hill, NC, 27599-3290, USA
- Matthew Halvorsen
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, NY, 10032, USA
- Joan H M Knoll
  - Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, N6A 2C1, Canada
  - Cytognomix Inc., London, Canada
- Peter K Rogan
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
  - Department of Computer Science, Faculty of Science, Western University, London, N6A 2C1, Canada
  - Cytognomix Inc., London, Canada
  - Department of Oncology, Schulich School of Medicine and Dentistry, Western University, London, N6A 2C1, Canada
7
Caminsky NG, Mucaki EJ, Perri AM, Lu R, Knoll JHM, Rogan PK. Prioritizing Variants in Complete Hereditary Breast and Ovarian Cancer Genes in Patients Lacking Known BRCA Mutations. Hum Mutat 2016; 37:640-52. [PMID: 26898890 DOI: 10.1002/humu.22972] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 09/08/2015] [Revised: 01/22/2016] [Accepted: 02/16/2016] [Indexed: 12/11/2022]
Abstract
BRCA1 and BRCA2 testing for hereditary breast and ovarian cancer (HBOC) does not identify all pathogenic variants. Sequencing of 20 complete genes in HBOC patients with uninformative test results (N = 287), including noncoding and flanking sequences of ATM, BARD1, BRCA1, BRCA2, CDH1, CHEK2, EPCAM, MLH1, MRE11A, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD51B, STK11, TP53, and XRCC2, identified 38,372 unique variants. We apply information theory (IT) to predict and prioritize noncoding variants of uncertain significance in regulatory, coding, and intronic regions based on changes in binding sites in these genes. Besides mRNA splicing, IT provides a common framework to evaluate potential affinity changes in transcription factor (TFBSs), splicing regulatory (SRBSs), and RNA-binding protein (RBBSs) binding sites following mutation. We prioritized variants affecting the strengths of 10 splice sites (four natural, six cryptic), 148 SRBS, 36 TFBS, and 31 RBBS. Three variants were also prioritized based on their predicted effects on mRNA secondary (2°) structure and 17 for pseudoexon activation. Additionally, four frameshift, two in-frame deletions, and five stop-gain mutations were identified. When combined with pedigree information, complete gene sequence analysis can focus attention on a limited set of variants in a wide spectrum of functional mutation types for downstream functional and co-segregation analysis.
Affiliation(s)
- Natasha G Caminsky
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Eliseos J Mucaki
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Ami M Perri
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Ruipeng Lu
  - Department of Computer Science, Faculty of Science, Western University, London, Ontario, Canada
- Joan H M Knoll
  - Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
  - Cytognomix Inc, London, Ontario, Canada
- Peter K Rogan
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
  - Department of Computer Science, Faculty of Science, Western University, London, Ontario, Canada
  - Cytognomix Inc, London, Ontario, Canada
  - Department of Oncology, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
8
Dorman SN, Baranova K, Knoll JHM, Urquhart BL, Mariani G, Carcangiu ML, Rogan PK. Genomic signatures for paclitaxel and gemcitabine resistance in breast cancer derived by machine learning. Mol Oncol 2015; 10:85-100. [PMID: 26372358 DOI: 10.1016/j.molonc.2015.07.006] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 07/20/2015] [Accepted: 07/31/2015] [Indexed: 12/21/2022] Open
Abstract
Increasingly, the effectiveness of adjuvant chemotherapy agents for breast cancer has been related to changes in the genomic profile of tumors. We investigated correspondence between growth inhibitory concentrations of paclitaxel and gemcitabine (GI50) and gene copy number, mutation, and expression first in breast cancer cell lines and then in patients. Genes encoding direct targets of these drugs, metabolizing enzymes, transporters, and those previously associated with chemoresistance to paclitaxel (n = 31 genes) or gemcitabine (n = 18) were analyzed. A multi-factorial, principal component analysis (MFA) indicated expression was the strongest indicator of sensitivity for paclitaxel, and copy number and expression were informative for gemcitabine. The factors were combined using support vector machines (SVM). Expression of 15 genes (ABCC10, BCL2, BCL2L1, BIRC5, BMF, FGF2, FN1, MAP4, MAPT, NFKB2, SLCO1B3, TLR6, TMEM243, TWIST1, and CSAG2) predicted cell line sensitivity to paclitaxel with 82% accuracy. Copy number profiles of 3 genes (ABCC10, NT5C, TYMS) together with expression of 7 genes (ABCB1, ABCC10, CMPK1, DCTD, NME1, RRM1, RRM2B), predicted gemcitabine response with 85% accuracy. Expression and copy number studies of two independent sets of patients with known responses were then analyzed with these models. These included tumor blocks from 21 patients that were treated with both paclitaxel and gemcitabine, and 319 patients on paclitaxel and anthracycline therapy. A new paclitaxel SVM was derived from an 11-gene subset since data for 4 of the original genes was unavailable. The accuracy of this SVM was similar in cell lines and tumor blocks (70-71%). The gemcitabine SVM exhibited 62% prediction accuracy for the tumor blocks due to the presence of samples with poor nucleic acid integrity. Nevertheless, the paclitaxel SVM predicted sensitivity in 84% of patients with no or minimal residual disease.
Affiliation(s)
- Stephanie N Dorman
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
- Katherina Baranova
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
- Joan H M Knoll
  - Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
  - Molecular Diagnostics Division, Laboratory Medicine Program, London Health Sciences Centre, ON, Canada
  - Cytognomix Inc., London, ON, Canada
- Brad L Urquhart
  - Department of Physiology and Pharmacology, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
- Gabriella Mariani
  - Department of Medical Oncology, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Maria Luisa Carcangiu
  - Department of Diagnostic and Laboratory Pathology, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Peter K Rogan
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
  - Cytognomix Inc., London, ON, Canada
  - Department of Computer Science, University of Western Ontario, London, ON, Canada
  - Department of Oncology, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
9
Caminsky NG, Mucaki EJ, Rogan PK. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis. F1000Res 2015. [DOI: 10.12688/f1000research.5654.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022] Open
Abstract
The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
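The "change in information content" that this abstract refers to can be illustrated with a minimal Python sketch. It scores a sequence against a table of per-position base frequencies as the sum over positions of 2 + log2 f(b, l) bits, then takes the difference between variant and reference. The frequency table `FREQS`, the sequences, and the function names are hypothetical toy values, not a real splice-site model and not the Splicing Mutation Calculator; the small-sample correction used in published information-theory models is omitted here.

```python
import math

# Hypothetical base frequencies at four aligned positions of a binding site.
# Toy numbers only (each column sums to 1); not derived from real splice sites.
FREQS = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.5, "T": 0.25},
    {"A": 0.25, "C": 0.125, "G": 0.125, "T": 0.5},
    {"A": 0.5, "C": 0.125, "G": 0.25, "T": 0.125},
]


def individual_information(seq, freqs=FREQS):
    """Information content (bits) of one sequence: sum of 2 + log2 f(b, l).

    Simplification: no small-sample correction, and zero frequencies
    are not handled.
    """
    if len(seq) != len(freqs):
        raise ValueError("sequence length must match the number of positions")
    return sum(2.0 + math.log2(col[base]) for base, col in zip(seq, freqs))


def delta_information(ref_seq, var_seq, freqs=FREQS):
    """Change in information content caused by a substitution (variant minus reference)."""
    return individual_information(var_seq, freqs) - individual_information(ref_seq, freqs)


ref_bits = individual_information("AGTA")
var_bits = individual_information("AGCA")
delta = delta_information("AGTA", "AGCA")
print(ref_bits, var_bits, delta)
```

With these toy frequencies the reference site scores 4 bits and the variant 2 bits, a change of -2 bits. A negative change means the variant sequence matches the model less well, which is the sense in which such analyses distinguish weakened from abolished sites.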
10
Jian X, Boerwinkle E, Liu X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res 2014; 42:13534-44. [PMID: 25416802 PMCID: PMC4267638 DOI: 10.1093/nar/gku1206] [Citation(s) in RCA: 338] [Impact Index Per Article: 33.8] [Received: 08/27/2014] [Revised: 10/12/2014] [Accepted: 11/04/2014] [Indexed: 01/17/2023] Open
Abstract
In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
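The receiver operating characteristic comparison mentioned in this abstract can be sketched in plain Python. The area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive variant receives the higher score, with ties counted as one half. The labels, scores, and tool names below are invented for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = splice-altering, 0 = not splice-altering (made-up labels).
labels = [1, 1, 1, 0, 0, 0]
# Made-up prediction scores from two hypothetical tools.
tool_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
tool_b = [0.6, 0.3, 0.7, 0.5, 0.4, 0.2]

auc_a = roc_auc(labels, tool_a)
auc_b = roc_auc(labels, tool_b)
print(f"tool_a AUC = {auc_a:.3f}, tool_b AUC = {auc_b:.3f}")
```

The pairwise form is quadratic in the number of variants, which is acceptable for a sketch; a rank-based computation would be the usual choice for thousands of variants.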
Affiliation(s)
- Xueqiu Jian
  - Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine for the Prevention of Human Diseases, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Eric Boerwinkle
  - Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine for the Prevention of Human Diseases, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Xiaoming Liu
  - Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
11
|
Mudvari P, Movassagh M, Kowsari K, Seyfi A, Kokkinaki M, Edwards NJ, Golestaneh N, Horvath A. SNPlice: variants that modulate Intron retention from RNA-sequencing data. Bioinformatics 2014; 31:1191-8. [PMID: 25481010 DOI: 10.1093/bioinformatics/btu804] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 05/13/2014] [Accepted: 11/30/2014] [Indexed: 12/22/2022]
Abstract
RATIONALE: The growing recognition of the importance of splicing, together with rapidly accumulating RNA-sequencing data, demands robust high-throughput approaches, which efficiently analyze experimentally derived whole-transcriptome splice profiles. RESULTS: We have developed a computational approach, called SNPlice, for identifying cis-acting, splice-modulating variants from RNA-seq datasets. SNPlice mines RNA-seq datasets to find reads that span single-nucleotide variant (SNV) loci and nearby splice junctions, assessing the co-occurrence of variants and molecules that remain unspliced at nearby exon-intron boundaries. Hence, SNPlice highlights variants preferentially occurring on intron-containing molecules, possibly resulting from altered splicing. To illustrate co-occurrence of variant nucleotide and exon-intron boundary, allele-specific sequencing was used. SNPlice results are generally consistent with splice-prediction tools, but also indicate splice-modulating elements missed by other algorithms. SNPlice can be applied to identify variants that correlate with unexpected splicing events, and to measure the splice-modulating potential of canonical splice-site SNVs. AVAILABILITY AND IMPLEMENTATION: SNPlice is freely available for download from https://code.google.com/p/snplice/ as a self-contained binary package for 64-bit Linux computers and as Python source code. CONTACT: pmudvari@gwu.edu or horvatha@gwu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Prakriti Mudvari
  - McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine and Department of Pharmacology and Physiology, The George Washington University, Washington, DC 20037, USA and Department of Ophthalmology, Department of Neurology and Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, School of Medicine, Washington, DC 20057, USA
- Mercedeh Movassagh
  - McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine and Department of Pharmacology and Physiology, The George Washington University, Washington, DC 20037, USA and Department of Ophthalmology, Department of Neurology and Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, School of Medicine, Washington, DC 20057, USA
- Kamran Kowsari
  - McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine and Department of Pharmacology and Physiology, The George Washington University, Washington, DC 20037, USA and Department of Ophthalmology, Department of Neurology and Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, School of Medicine, Washington, DC 20057, USA
- Ali Seyfi
  - McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine and Department of Pharmacology and Physiology, The George Washington University, Washington, DC 20037, USA and Department of Ophthalmology, Department of Neurology and Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, School of Medicine, Washington, DC 20057, USA
- Maria Kokkinaki
  - McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine and Department of Pharmacology and Physiology, The George Washington University, Washington, DC 20037, USA and Department of Ophthalmology, Department of Neurology and Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, School of Medicine, Washington, DC 20057, USA
- Nathan J Edwards
  - McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine and Department of Pharmacology and Physiology, The George Washington University, Washington, DC 20037, USA and Department of Ophthalmology, Department of Neurology and Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, School of Medicine, Washington, DC 20057, USA
- Nady Golestaneh
  - McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine and Department of Pharmacology and Physiology, The George Washington University, Washington, DC 20037, USA and Department of Ophthalmology, Department of Neurology and Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, School of Medicine, Washington, DC 20057, USA
- Anelia Horvath
  - McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine and Department of Pharmacology and Physiology, The George Washington University, Washington, DC 20037, USA and Department of Ophthalmology, Department of Neurology and Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, School of Medicine, Washington, DC 20057, USA
|
12
|
Caminsky N, Mucaki EJ, Rogan PK. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis. F1000Res 2014; 3:282. [PMID: 25717368 PMCID: PMC4329672 DOI: 10.12688/f1000research.5654.1] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Accepted: 11/10/2014] [Indexed: 12/14/2022] Open
Abstract
The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
Affiliation(s)
- Natasha Caminsky
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Eliseos J Mucaki
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON, N6A 2C1, Canada
- Peter K Rogan
  - Departments of Biochemistry and Computer Science, Western University, London, ON, N6A 2C1, Canada
|
13
|
Splicing mutation analysis reveals previously unrecognized pathways in lymph node-invasive breast cancer. Sci Rep 2014; 4:7063. [PMID: 25394353 PMCID: PMC4231324 DOI: 10.1038/srep07063] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 07/17/2014] [Accepted: 10/29/2014] [Indexed: 12/22/2022] Open
Abstract
Somatic mutations reported in large-scale breast cancer (BC) sequencing studies primarily consist of protein coding mutations. mRNA splicing mutation analyses have been limited in scope, despite their prevalence in Mendelian genetic disorders. We predicted splicing mutations in 442 BC tumour and matched normal exomes from The Cancer Genome Atlas Consortium (TCGA). These splicing defects were validated by abnormal expression changes in these tumours. Of the 5,206 putative mutations identified, exon skipping, leaky or cryptic splicing was confirmed for 988 variants. Pathway enrichment analysis of the mutated genes revealed mutations in 9 NCAM1-related pathways, which were significantly increased in samples with evidence of lymph node metastasis, but not in lymph node-negative tumours. We suggest that comprehensive reporting of DNA sequencing data should include non-trivial splicing analyses to avoid missing clinically-significant deleterious splicing mutations, which may reveal novel mutated pathways present in genetic disorders.
|