1
Vasquez YA, Pfeil J, Mueller L, Beale H, Lyle AG, Sanders L, Learned K, Kephart E, van den Bout A, Cheney A, Hosseinzadeh S, Bjork I, Salama SR, Vaske O. Abstract 3035: Identifying potential druggable targets for synovial sarcoma using comparative RNA-seq analysis. Cancer Res 2021. [DOI: 10.1158/1538-7445.am2021-3035]
Abstract
Synovial sarcoma (SS) is an aggressive soft-tissue malignancy that accounts for 10% of all soft-tissue sarcomas. These tumors can occur at any age but most often affect adolescents and young adults, developing deep in the distal extremities. The prognosis of SS is poor, with a 5-year survival rate of 36-76%, a high rate of metastasis, and few treatment options. The purpose of this study was to identify novel overexpressed oncogenes that could serve as druggable targets for treating patients with synovial sarcoma. We compared the RNA-Seq expression profiles of a cohort of 36 synovial sarcomas with our compendium of RNA-Seq expression data from 12,236 tumor samples (treehousegenomics.ucsc.edu) from pediatric and adult cancer patients. When comparing gene expression in the synovial sarcoma samples against the compendium, we defined overexpression outliers as genes whose expression exceeded a gene-specific threshold set by Tukey's outlier method. We then applied pathway enrichment analysis to the overexpression outliers to identify common and druggable pathways with implications for potential SS therapeutics. Our analysis identified overexpression of members of the Sonic Hedgehog pathway in the majority of synovial sarcoma samples; for example, GLI1 expression exceeded the outlier threshold in 35 of 36 samples. This pathway can be targeted by available small-molecule inhibitors. Ongoing work focuses on evaluating the role of Sonic Hedgehog signaling in SS pathogenesis using pharmacological inhibition, CRISPRi studies in cell line models of the disease, and nanopore sequencing. We currently have 4 patient-derived synovial sarcoma cell lines (HSSY-II, SYO-1, YAMATO, and ASKA) that we can grow both in adherent conditions and in 3D culture as sarcospheres.
We detected expression of the SYT-SSX fusion transcript in each cell line by RT-PCR, confirming that the cell lines maintain expression of the pathogenic fusion. This work has implications for using comparative tumor RNA-seq-derived gene expression data to nominate novel druggable targets specific to synovial sarcoma.
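Tukey's method flags a value as an outlier when it lies above the upper fence Q3 + 1.5 × IQR of a reference distribution. The sketch below applies that rule gene by gene against a compendium; it is a minimal illustration, not the authors' pipeline. It assumes expression is supplied as TPM and compared on a log2(TPM + 1) scale, and the function names and dictionary input layout are hypothetical.

```python
import math
import statistics


def tukey_upper_fence(values, k=1.5):
    """Upper Tukey fence Q3 + k * IQR of one gene's compendium distribution."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def find_overexpression_outliers(sample, compendium):
    """Return genes whose expression in `sample` exceeds the gene-specific fence.

    sample: dict gene -> TPM for one tumor (hypothetical layout).
    compendium: dict gene -> list of TPM values across reference tumors.
    """
    outliers = []
    for gene, tpm in sample.items():
        reference = compendium.get(gene)
        if not reference or len(reference) < 2:
            continue  # no usable reference distribution for this gene
        # Assumption: outliers are called on a log2(TPM + 1) scale.
        log_reference = [math.log2(v + 1) for v in reference]
        if math.log2(tpm + 1) > tukey_upper_fence(log_reference):
            outliers.append(gene)
    return sorted(outliers)
```

Because each fence is computed from that gene's own compendium distribution, a gene that is highly expressed in every tumor is not flagged, while a gene is flagged only when the sample stands out from the reference tumors.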
Citation Format: Yvonne A. Vasquez, Jacob Pfeil, Letitia Mueller, Holly Beale, Alfred G. Lyle, Lauren Sanders, Katrina Learned, Ellen Kephart, Anouk van den Bout, Allison Cheney, Sahar Hosseinzadeh, Isabel Bjork, Sofie R. Salama, Olena Vaske. Identifying potential druggable targets for synovial sarcoma using comparative RNA-seq analysis [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 3035.
Affiliation(s)
- Jacob Pfeil
- University of California Santa Cruz, Santa Cruz, CA
- Holly Beale
- University of California Santa Cruz, Santa Cruz, CA
- Isabel Bjork
- University of California Santa Cruz, Santa Cruz, CA
- Olena Vaske
- University of California Santa Cruz, Santa Cruz, CA
2
Anastopoulos I, Beale H, Lyle G, Cheney A, Vaske OM, Stuart JM. Abstract 2287: Detection of RNA-Seq library preparation type via random forest. Cancer Res 2021. [DOI: 10.1158/1538-7445.am2021-2287]
Abstract
Library preparation protocols for RNA sequencing (RNA-Seq) vary and affect the downstream analysis of RNA-Seq data. Available RNA-Seq datasets often lack library construction protocol information in their metadata, yet this information is needed to compare datasets appropriately. For example, non-polyadenylated transcripts are measured when a ribosomal RNA depletion method is used (riboD library preparation), but not when transcripts are selected by their poly(A) tails (polyA library preparation). Without knowing the library preparation method, non-polyadenylated transcripts would appear to be expressed at much higher levels in samples prepared via riboD. To address this issue, we developed a Random Forest classifier that distinguishes riboD from polyA RNA-Seq datasets. A grid search over the number of trees, the maximum depth, and the minimum number of samples in each leaf node identified 100, 8, and 1, respectively, as the best parameters. However, applying the model to the Pediatric Brain Tumor Atlas (PBTA) revealed strong overfitting for polyA samples. We therefore evaluated the model at each maximum depth from 1 to 8 and found that the best performance on our validation set was achieved with a maximum depth of 1. We subsequently trained the classifier on our own curated compendia of pediatric cancer polyA and riboD samples (188 and 264 samples, respectively), after balancing the two datasets for disease prevalence and selecting the 5,000 most variable genes as the default input dimensionality of the model. For samples lacking some of these 5,000 genes, we substitute each missing gene's expression with its mean expression in the training data.
The classifier achieves 100% accuracy in assigning samples to their library preparation protocols in GTEx (all polyA), CCLE (all polyA), and 7 of 9 SRA projects. Five of the SRA projects contained only riboD datasets, 3 contained only polyA datasets, and one was 50% riboD and 50% polyA. Notably, not all of the SRA datasets are cancer-related, which demonstrates the model's ability to distinguish library preparation protocols in very different settings. The model will be made available as a Docker container so that it can be readily applied to new samples. Our model is an important step toward robust identification of library preparation type, and including samples from more diverse contexts in training would make it more widely applicable.
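The input handling described above (a fixed set of the 5,000 most variable training genes, with missing genes filled in by their training-set means) can be sketched as follows. This is a minimal illustration under stated assumptions: variability is measured here as population variance, the function names and dict-based data layout are hypothetical, and the Random Forest itself is not shown. If scikit-learn were used, the three grid-searched hyperparameters would correspond to `RandomForestClassifier`'s `n_estimators`, `max_depth`, and `min_samples_leaf`, although the abstract does not say which library was used.

```python
import statistics


def select_top_variable_genes(training, n_top=5000):
    """Rank genes by variance across training samples and keep the top n_top.

    training: dict gene -> list of expression values (one per training sample).
    Assumption: "most variable" is measured as population variance.
    """
    variances = {gene: statistics.pvariance(values)
                 for gene, values in training.items() if values}
    # Descending variance; ties broken by gene name so the order is reproducible.
    ranked = sorted(variances, key=lambda gene: (-variances[gene], gene))
    return ranked[:n_top]


def training_means(training, genes):
    """Mean training expression of each selected gene, used for imputation."""
    return {gene: statistics.fmean(training[gene]) for gene in genes}


def build_feature_vector(sample, genes, means):
    """Order a new sample's features like the training genes.

    Genes absent from the sample are replaced by their training-set mean.
    """
    return [sample[gene] if gene in sample else means[gene] for gene in genes]
```

Only training-set statistics are used at prediction time, so every new sample yields a vector with the same length and gene order the model was trained on.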
Citation Format: Ioannis Anastopoulos, Holly Beale, Geoff Lyle, Allison Cheney, Olena M. Vaske, Joshua M. Stuart. Detection of RNA-Seq library preparation type via random forest [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 2287.
Affiliation(s)
- Holly Beale
- University of California Santa Cruz, Santa Cruz, CA
- Geoff Lyle
- University of California Santa Cruz, Santa Cruz, CA
3
Hundley K, Vaske O, Lyle G, Kephart E, Learned K, Beale H, Wardell C, De Loose A, Day JD, Rodriguez A. Comparative Transcriptomics to Identify Targeted Therapy Candidates in High Grade Glioma. Neurosurgery 2020. [DOI: 10.1093/neuros/nyaa447_871]
4
Hundley K, Vaske O, Lyle G, Learned K, Beale H, Kephart E, De Loose A, Lee M, Day JD, Rodriguez A. EPCO-23. COMPARATIVE TRANSCRIPTOMICS TO IDENTIFY TARGETED THERAPY CANDIDATES IN HIGH GRADE GLIOMA. Neuro Oncol 2020. [DOI: 10.1093/neuonc/noaa215.302]
Abstract
Genomic characterization is often used to identify therapeutic targets in tumors, and comparative transcriptomics has recently begun to be used for this purpose. In this pilot, we compare the transcriptome of a patient with recurrent high grade glioma (HGG) to our cohort to identify potential therapies. We reviewed transcriptomic profiles from patients who underwent HGG resection at our institution over the past year, as well as profiles in the UCSC cancer compendium. Briefly, tumor RNA was extracted from embedded tumor tissue sections with tumor cellularity higher than 20%. RNA libraries were sequenced to approximately 65 million reads on an Illumina HiSeq 4000 System, which uses patterned flow cell technology. The RNA profile of a 24-year-old man with Li-Fraumeni syndrome and recurrent HGG with leptomeningeal spread was analyzed by comparative transcriptomics to identify targets, using a Bayesian statistical framework for gene expression outlier detection. These comparisons allowed us to identify genes and pathways that are significantly overexpressed. Our internal HGG cohort consisted of 44 adult patients evenly distributed among the 4 Verhaak HGG subtypes. Compared with our internal cohort, the patient of interest had druggable outlier expression of HDAC1, STAT1, and STAT2, indicating vorinostat (for HDAC1) and ruxolitinib (for STAT1 and STAT2) as potential therapies. Compared with the 12,747 patients in the cancer compendium, the patient's STAT2 expression was high but not an outlier. Compared with 738 glioma samples, STAT1 and STAT2, but not HDAC1, were outliers, again indicating ruxolitinib as a potential targeted therapy. The patient did not have outlier expression of Notch transcriptional targets or immune checkpoint biomarkers in any of the comparisons. In conclusion, comparative transcriptomics can identify therapeutic targets in a patient with recurrent HGG, even with small cohorts.
In our pilot, we identified ruxolitinib as a potential candidate to treat leptomeningeal recurrence.
Affiliation(s)
- Kelsey Hundley
- University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Olena Vaske
- University of California Santa Cruz, Santa Cruz, CA, USA
- Geoff Lyle
- University of California Santa Cruz, Santa Cruz, CA, USA
- Holly Beale
- University of California Santa Cruz, Santa Cruz, CA, USA
- Ellen Kephart
- University of California Santa Cruz, Santa Cruz, CA, USA
- Annick De Loose
- University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Madison Lee
- University of Arkansas for Medical Sciences, Little Rock, AR, USA
- J D Day
- University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Analiz Rodriguez
- University of Arkansas for Medical Sciences, Little Rock, AR, USA
5
Vaske OM, Bjork I, Salama SR, Beale H, Tayi Shah A, Sanders L, Pfeil J, Lam DL, Learned K, Durbin A, Kephart ET, Currie R, Newton Y, Swatloski T, McColl D, Vivian J, Zhu J, Lee AG, Leung SG, Spillinger A, Liu HY, Liang WS, Byron SA, Berens ME, Resnick AC, Lacayo N, Spunt SL, Rangaswami A, Huynh V, Torno L, Plant A, Kirov I, Zabokrtsky KB, Rassekh SR, Deyell RJ, Laskin J, Marra MA, Sender LS, Mueller S, Sweet-Cordero EA, Goldstein TC, Haussler D. Comparative Tumor RNA Sequencing Analysis for Difficult-to-Treat Pediatric and Young Adult Patients With Cancer. JAMA Netw Open 2019; 2:e1913968. [PMID: 31651965 PMCID: PMC6822083 DOI: 10.1001/jamanetworkopen.2019.13968]
Abstract
IMPORTANCE: Pediatric cancers are epigenetic diseases; therefore, considering tumor gene expression information is necessary for a complete understanding of the tumorigenic processes.
OBJECTIVE: To evaluate the feasibility and utility of incorporating comparative gene expression information into the precision medicine framework for difficult-to-treat pediatric and young adult patients with cancer.
DESIGN, SETTING, AND PARTICIPANTS: This cohort study was conducted as a consortium between the University of California, Santa Cruz (UCSC) Treehouse Childhood Cancer Initiative and clinical genomic trials. RNA sequencing (RNA-Seq) data were obtained from the following 4 clinical sites and analyzed at UCSC: British Columbia Children's Hospital (n = 31), Lucile Packard Children's Hospital at Stanford University (n = 80), CHOC Children's Hospital and Hyundai Cancer Institute (n = 46), and the Pacific Pediatric Neuro-Oncology Consortium (n = 24). The study dates were January 1, 2016, to March 22, 2017.
EXPOSURES: Participants underwent tumor RNA-Seq profiling as part of 4 separate clinical trials at partner hospitals. The UCSC either downloaded RNA-Seq data from a partner institution for analysis in the cloud or provided a Docker pipeline that performed the same analysis at a partner institution. The UCSC then compared each participant's tumor RNA-Seq profile with more than 11 000 uniformly analyzed tumor profiles from pediatric and young adult patients with cancer, downloaded from public data repositories. These comparisons were used to identify genes and pathways that are significantly overexpressed in each patient's tumor. Results of the UCSC analysis were presented to clinical partners.
MAIN OUTCOMES AND MEASURES: Feasibility of a third-party institution (UCSC Treehouse Childhood Cancer Initiative) to obtain tumor RNA-Seq data from patients, conduct comparative analysis, and present analysis results to clinicians; and proportion of patients for whom comparative tumor gene expression analysis provided useful clinical and biological information.
RESULTS: Among 144 samples from children and young adults (median age at diagnosis, 9 years; range, 0-26 years; 72 of 118 [61.0%] male [26 patients sex unknown]) with a relapsed, refractory, or rare cancer treated on precision medicine protocols, RNA-Seq-derived gene expression was potentially useful for 99 of 144 samples (68.8%) compared with DNA mutation information that was potentially useful for only 34 of 74 samples (45.9%).
CONCLUSIONS AND RELEVANCE: This study's findings suggest that tumor RNA-Seq comparisons may be feasible and highlight the potential clinical utility of incorporating such comparisons into the clinical genomic interpretation framework for difficult-to-treat pediatric and young adult patients with cancer. The study also highlights for the first time to date the potential clinical utility of harmonized publicly available genomic data sets.
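The headline proportions in the results can be recomputed from the reported counts. The short check below is only an arithmetic illustration: the counts are taken from the abstract, the helper name `percent` is hypothetical, and it rounds half up to one decimal place using exact decimal arithmetic.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(part, whole):
    """Return part/whole as a percentage rounded half up to one decimal place."""
    value = Decimal(part) * 100 / Decimal(whole)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


reported = {
    "RNA-Seq potentially useful": (99, 144),
    "DNA mutations potentially useful": (34, 74),
    "male (sex known)": (72, 118),
}
for label, (part, whole) in reported.items():
    print(f"{label}: {part}/{whole} = {percent(part, whole)}%")

# The 118 sex-known and 26 sex-unknown counts together account for all 144.
```

The printed values match the 68.8%, 45.9%, and 61.0% given in the abstract.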
Affiliation(s)
- Olena M. Vaske
- Department of Molecular, Cell, and Developmental Biology, University of California, Santa Cruz
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Isabel Bjork
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Sofie R. Salama
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Howard Hughes Medical Institute, University of California, Santa Cruz
- Holly Beale
- Department of Molecular, Cell, and Developmental Biology, University of California, Santa Cruz
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Avanthi Tayi Shah
- Division of Hematology and Oncology, Department of Pediatrics, University of California, San Francisco
- Lauren Sanders
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Jacob Pfeil
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Du L. Lam
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Katrina Learned
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Ann Durbin
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Ellen T. Kephart
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Rob Currie
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Yulia Newton
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Teresa Swatloski
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Duncan McColl
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- John Vivian
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Jingchun Zhu
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Alex G. Lee
- Division of Hematology and Oncology, Department of Pediatrics, University of California, San Francisco
- Stanley G. Leung
- Division of Hematology and Oncology, Department of Pediatrics, University of California, San Francisco
- Aviv Spillinger
- Division of Hematology and Oncology, Department of Pediatrics, University of California, San Francisco
- Heng-Yi Liu
- Division of Hematology and Oncology, Department of Pediatrics, University of California, San Francisco
- Winnie S. Liang
- Integrated Cancer Genomics Division, Translational Genomics Research Institute (TGen), Phoenix, Arizona
- Sara A. Byron
- Integrated Cancer Genomics Division, Translational Genomics Research Institute (TGen), Phoenix, Arizona
- Adam C. Resnick
- Center for Data Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania
- Norman Lacayo
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, California
- Sheri L. Spunt
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, California
- Arun Rangaswami
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, California
- Van Huynh
- CHOC Children’s Hospital, Hyundai Cancer Institute, Orange, California
- Lilibeth Torno
- CHOC Children’s Hospital, Hyundai Cancer Institute, Orange, California
- Ashley Plant
- CHOC Children’s Hospital, Hyundai Cancer Institute, Orange, California
- Ivan Kirov
- CHOC Children’s Hospital, Hyundai Cancer Institute, Orange, California
- S. Rod Rassekh
- British Columbia Children’s Hospital Research Institute, British Columbia Children’s Hospital, Vancouver, British Columbia, Canada
- Rebecca J. Deyell
- British Columbia Children’s Hospital Research Institute, British Columbia Children’s Hospital, Vancouver, British Columbia, Canada
- Marco A. Marra
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Leonard S. Sender
- CHOC Children’s Hospital, Hyundai Cancer Institute, Orange, California
- Sabine Mueller
- Department of Neurology, University of California, San Francisco
- Department of Neurosurgery, University of California, San Francisco
- Department of Pediatrics, University of California, San Francisco
- Theodore C. Goldstein
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Now with Anthem, Inc, Palo Alto, California
- David Haussler
- University of California, Santa Cruz Genomics Institute, Santa Cruz
- Howard Hughes Medical Institute, University of California, Santa Cruz
6
Sanders L, Cheney A, Beale H, Kephart E, Bjork I, Pfeil JJ, Salama SR, Haussler D, Morozova O. Shared dysregulation of long non-coding RNA and developmental gene networks in histone H3 K27M gliomas and PF-A ependymomas. J Clin Oncol 2019. [DOI: 10.1200/jco.2019.37.15_suppl.e21523]
Abstract
e21523
Background: Diffuse pediatric gliomas harboring a histone H3 K27M mutation are more aggressive than H3 wild-type gliomas and demonstrate global hypomethylation at the K27 residue [1]. As a result, these tumors show globally aberrant gene expression, resulting in a stem-like proliferative cell population [2]. Posterior fossa (PF) ependymomas, on the other hand, harbor few significantly recurrent somatic mutations, but PF-A and PF-B subgroups have been defined on the basis of epigenetic differences [3]. Compared with PF-B, the PF-A subgroup demonstrates H3K27 hypomethylation, aberrant gene expression, and aggressive tumor growth [4,5].
Methods: We recently identified a set of long non-coding RNAs (lncRNAs) that are transiently expressed in early brain development [6], and hypothesized that H3K27M gliomas and PF-A ependymomas may share methylation-related dysregulation of lncRNA networks responsible for maintaining normal differentiation programs.
Results: Here we describe a network of regulatory lncRNAs with increased expression in both H3K27M gliomas and PF-A ependymomas, compared with H3 wild-type gliomas and PF-B ependymomas. We demonstrate that increased expression of this lncRNA network correlates with overexpression of signaling pathways involved in maintaining a non-differentiated, proliferative phenotype and driving tumorigenesis.
Conclusions: We hypothesize that in both H3K27M gliomas and PF-A ependymomas, aberrant global methylation may drive lncRNAs to activate and maintain stem-like states of early neural development, suggesting similar epigenetically driven developmental origins for both tumor types.
References:
1. Chan KM, Fang D, Gan H, et al. Genes Dev. 2013;27(9):985-90.
2. Filbin MG, Tirosh I, Hovestadt V, et al. Science. 2018;360(6386):331-5.
3. Witt H, Mack SC, Ryzhova M, et al. Cancer Cell. 2011;20(2):143-57.
4. Bayliss J, Mukherjee P, Lu C, et al. Sci Transl Med. 2016;8(366):366ra161.
5. Mack SC, Witt H, Piro RM, et al. Nature. 2014;506(7489):445.
6. Field AR, Jacobs FM, Fiddes IT, et al. bioRxiv. 2017:232553.
Affiliation(s)
- Isabel Bjork
- University of California, Santa Cruz, Santa Cruz, CA
7
Sanders L, Cheney A, Field A, Beale H, Kephart E, Learned K, Bjork I, Durbin A, Lyle G, Pfeil J, Salama S, Haussler D, Vaske O. GENE-11. SHARED LONG NON-CODING RNA DYSREGULATION IN HISTONE H3 K27M GLIOMAS AND PF-A EPENDYMOMAS. Neuro Oncol 2019. [DOI: 10.1093/neuonc/noz036.082]
8
Sanders L, Rose-Dey B, Beale H, Pfeil J, Kephart E, Learned K, Durbin A, Bjork I, Currie R, Morozova O, Agnihotri S, Salama S, Haussler D. DIPG-07. GENOMIC ANALYSIS METHODS FOR IDENTIFICATION OF CANCER DRIVER PATHWAYS IN CHILDHOOD BRAIN TUMORS. Neuro Oncol 2018. [DOI: 10.1093/neuonc/noy059.101]
9
Beale H, Kephart E, Sanders L, Pfeil J, Bjork I, Salama SR, Haussler D, Morozova O. Getting consistent results from comparative analysis of RNA-Seq data from single patients. J Clin Oncol 2018. [DOI: 10.1200/jco.2018.36.15_suppl.e24194]
Affiliation(s)
- Isabel Bjork
- University of California, Santa Cruz, Santa Cruz, CA
10
Sanders L, Rose-Dey B, Beale H, Kephart E, Pfeil J, Morozova O, Agnihotri S, Salama SR, Haussler D. Comparative gene expression analysis for identifying clinically relevant overexpressed genes in childhood brain tumors. J Clin Oncol 2018. [DOI: 10.1200/jco.2018.36.15_suppl.e14033]
11
Pfeil J, Thornton A, Durbin A, Kephart E, Beale H, Sanders L, Bjork I, Morozova O, Salama SR, Haussler D. Gene expression analysis for improved subtyping of high-risk neuroblastoma. J Clin Oncol 2018. [DOI: 10.1200/jco.2018.36.15_suppl.10559]
Affiliation(s)
- Isabel Bjork
- University of California, Santa Cruz, Santa Cruz, CA
12
Learned K, Durbin A, Currie R, Beale H, Lam DL, Goldstein T, Salama SR, Haussler D, Morozova O, Bjork I. Abstract LB-338: A critical evaluation of genomic data sharing: Barriers to accessing pediatric cancer genomic datasets: a Treehouse Childhood Cancer Initiative experience. Cancer Res 2017. [DOI: 10.1158/1538-7445.am2017-lb-338]
Abstract
Genomic data sharing is increasingly recognized as critical to genomic research. The need is acute in pediatric cancer research, given the rarity of pediatric tumor types and the paucity of pediatric cancer data, and in translational research, to assess the impact of genomic research on human health. However, genomic data sharing is hindered by an absence of standards for timing, patient privacy, data use agreements, and data characterization and quality. At the UC Santa Cruz Treehouse Childhood Cancer Initiative (treehousegenomics.soe.ucsc.edu), we compare individual pediatric cancer tumor RNA sequencing profiles against a database of over 11,000 tumor RNA sequencing profiles from public genomic datasets, such as The Cancer Genome Atlas, Therapeutically Applicable Research To Generate Effective Treatments, the International Cancer Genome Consortium, and Medulloblastoma Advanced Genomics International, and from the pediatric cancer clinical trials with which we partner, such as those at Stanford University, UC San Francisco, Children’s Hospital of Orange County, and British Columbia Children’s Hospital. For over 18 months, we have worked systematically to enhance the Treehouse dataset by adding pediatric cancer data and data from currently underrepresented tumor types. The NIH and other leading funding agencies now regularly require grantees to make the genomic data they generate available to the research community, either after publication or after an embargo period. We have combed websites and public repositories, searched PubMed, and contacted researchers directly. Finding data requires mining the literature, often with limited information, and initiating many different processes for requesting permission to use these datasets, each with different and often cumbersome data use obligations. The combination of cryptically named datasets, multiple data types, and the practice of grouping datasets from multiple papers under a single study accession makes it challenging to zero in on the correct dataset.
Downloading the genomic data is time-consuming: a dataset of fewer than 100 files can take up to a week to download under optimal conditions. Matching metadata is inconsistently available and often vague, sparse, or error-ridden. Only after months of identifying datasets, obtaining permission for use, committing to use- and sharing-restricting terms, and downloading the genomic data and metadata is it possible to assess data quality, which often turns out to be low. We evaluate the barriers to data sharing based on the Treehouse experience and offer guidelines for timing, use agreement standards, and data characterization and quality to enhance data sharing and outcomes for all pediatric cancer patients.
Citation Format: Katrina Learned, Ann Durbin, Robert Currie, Holly Beale, Du Linh Lam, Theodore Goldstein, Sofie R. Salama, David Haussler, Olena Morozova, Isabel Bjork. A critical evaluation of genomic data sharing: Barriers to accessing pediatric cancer genomic datasets: a Treehouse Childhood Cancer Initiative experience [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr LB-338. doi:10.1158/1538-7445.AM2017-LB-338
Affiliation(s)
- Ann Durbin
- University of California Santa Cruz, Santa Cruz, CA
- Holly Beale
- University of California Santa Cruz, Santa Cruz, CA
- Du Linh Lam
- University of California Santa Cruz, Santa Cruz, CA
- Isabel Bjork
- University of California Santa Cruz, Santa Cruz, CA
13
Beale H, Lam DL, Vivian J, Newton Y, Shah AT, Bjork I, Goldstein T, Brooks AN, Stuart J, Salama S, Sweet-Cordero EA, Haussler D, Morozova O. Abstract 2466: Identifying confidently measured genes in single pediatric cancer patient samples using RNA sequencing. Cancer Res 2017. [DOI: 10.1158/1538-7445.am2017-2466]
Abstract
In the UC Santa Cruz Treehouse Childhood Cancer Initiative (treehousegenomics.soe.ucsc.edu), we are exploring the utility of using RNA-Seq analysis of tumor samples from children to identify potential novel therapeutic options for each individual. Within a single RNA-Seq data set, the gene expression measurements are not equally accurate. The identification of activated, druggable pathways requires accurate gene-level expression measurements.
We receive samples from a variety of clinical and research settings, and the quantity and complexity of the available input material and the sequencing depth vary. These factors inspired us to develop a tool that identifies accurate measurements in most of the RNA-Seq samples we receive.
First, we characterized the relationship between depth of sequencing and the accuracy of the gene expression measurement. We analyzed subsets of reads in samples with more than 50 million Uniquely Mapped, Exonic, Non-duplicate (UMEND) reads. UMEND reads typically constitute over 80% of the reads in a high quality experiment with sufficient starting material. We compared gene expression across the subsets of reads to calculate how many UMEND reads are required to produce consistent measurements. We found that, on average, genes expressed at 1-5 TPM in our data require 30 million reads to be accurately measured. For this calculation, we define accuracy as the condition in which 75% of genes are measured to within 25% of the true value.
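The accuracy criterion above can be written down directly. In this minimal sketch, the full-depth measurement is assumed to play the role of the "true value", and genes are compared within an expression bin such as 1-5 TPM; the function names and dict inputs are hypothetical, and the read subsampling itself is not shown.

```python
def fraction_within_tolerance(subsampled, reference, tolerance=0.25):
    """Fraction of genes whose subsampled TPM lies within `tolerance` of the reference.

    reference: dict gene -> TPM treated as the "true" value (assumption: the
    measurement from the full-depth sample).
    subsampled: dict gene -> TPM measured from a subset of the same sample's reads.
    """
    genes = [g for g, tpm in reference.items() if tpm > 0 and g in subsampled]
    if not genes:
        return 0.0
    close = sum(
        1 for g in genes
        if abs(subsampled[g] - reference[g]) <= tolerance * reference[g]
    )
    return close / len(genes)


def is_accurate(subsampled, reference, tolerance=0.25, required_fraction=0.75):
    """True when at least 75% of genes are measured within 25% of the true value."""
    return fraction_within_tolerance(subsampled, reference, tolerance) >= required_fraction


def genes_in_expression_bin(reference, low_tpm, high_tpm):
    """Restrict the comparison to genes in one expression bin, e.g. 1-5 TPM."""
    return {g: tpm for g, tpm in reference.items() if low_tpm <= tpm <= high_tpm}
```

Repeating this check on read subsets of increasing size, bin by bin, would give the smallest depth at which each expression bin meets the criterion.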
Second, we use these known relationships to identify genes that have been accurately measured in our tumor RNA-Seq samples. For a sample with 15 million UMEND reads, we find that genes expressed above 5 TPM can be accurately measured, and these are retained. In the first twelve samples analyzed, samples with more than 10 million UMEND reads retained at least 46% of the genes expressed above zero. We exclude samples with fewer than 10 million UMEND reads from use as references because of the marked gene loss after thresholding in this group.
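The thresholding step can be sketched as follows. The sketch takes the depth-specific TPM cutoff as an input rather than deriving it, because the abstract gives only one calibration point (above 5 TPM at 15 million UMEND reads); the 10 million read floor comes from the abstract, and the function and constant names are hypothetical.

```python
MIN_UMEND_READS = 10_000_000  # from the abstract: shallower samples are not used as references


def retain_accurate_genes(expression, umend_reads, min_tpm):
    """Keep genes expressed above the depth-appropriate TPM threshold.

    expression: dict gene -> TPM for one sample.
    umend_reads: number of Uniquely Mapped, Exonic, Non-duplicate reads.
    min_tpm: threshold for this depth (e.g. 5 TPM at 15 million UMEND reads);
    how it is chosen for other depths is not shown here.
    Returns None for samples too shallow to serve as references.
    """
    if umend_reads < MIN_UMEND_READS:
        return None
    return {g: tpm for g, tpm in expression.items() if tpm > min_tpm}


def retained_fraction(expression, retained):
    """Share of genes expressed above zero that survive thresholding."""
    expressed = [g for g, tpm in expression.items() if tpm > 0]
    if not expressed:
        return 0.0
    return sum(1 for g in expressed if g in retained) / len(expressed)
```

`retained_fraction` corresponds to the "genes expressed above zero" statistic quoted above, so it can be used to flag samples that lose too many genes after thresholding.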
Using accurately measured genes allows us to assess similarity to other samples, identify enriched pathways, and confirm the expression of drug targets and related molecules under consideration with greater confidence. For example, we reconsidered the CDK4/6 inhibitor palbociclib for one patient because the expression of RB1, a downstream effector required for palbociclib-mediated tumor cell death, fell below our accuracy threshold. Accuracy thresholds can also inform experiment planning, for example when deciding how deeply to sequence a sample.
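For experiment planning, the same depth-accuracy relationship can be inverted to estimate how deeply to sequence. This is a minimal sketch under stated assumptions: it reuses two illustrative tiers read off the figures in this abstract (30 million reads for genes at 1 TPM and above, 15 million UMEND reads for genes above 5 TPM), and it assumes 80% of sequenced reads become UMEND reads, the lower bound quoted for high-quality experiments. The function names are hypothetical.

```python
# Illustrative tiers: (minimum UMEND reads, minimum TPM treated as accurate).
PLANNING_TIERS = [
    (30_000_000, 1.0),
    (15_000_000, 5.0),
]


def required_umend_reads(target_tpm, tiers=PLANNING_TIERS):
    """Smallest tier depth whose TPM threshold covers target_tpm, else None."""
    candidates = [depth for depth, min_tpm in tiers if target_tpm >= min_tpm]
    return min(candidates) if candidates else None


def required_total_reads(umend_reads, umend_percent=80):
    """Total reads to sequence if umend_percent of reads end up as UMEND reads."""
    # Integer ceiling division avoids floating-point rounding in read counts.
    return -(-umend_reads * 100 // umend_percent)
```

For a gene of interest expressed at about 3 TPM, this gives 30 million UMEND reads, or 37.5 million total reads at an 80% UMEND rate; samples with a lower UMEND fraction would need proportionally more sequencing.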
Accuracy thresholding allows us to better assess the value of an RNA-Seq data set and, if necessary, identify the subset of genes whose expression can be confidently considered in a clinical setting. Our experience points to the importance of careful quality control in this process.
Citation Format: Holly Beale, Du Linh Lam, John Vivian, Yulia Newton, Avanthi Tayi Shah, Isabel Bjork, Ted Goldstein, Angela N. Brooks, Josh Stuart, Sofie Salama, E. Alejandro Sweet-Cordero, David Haussler, Olena Morozova. Identifying confidently measured genes in single pediatric cancer patient samples using RNA sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 2466. doi:10.1158/1538-7445.AM2017-2466
14
Morozova O, Newton Y, Shah AT, Beale H, Lam DL, Vivian J, Bjork I, Goldstein T, Stuart J, Salama S, Sweet-Cordero EA, Haussler D. Abstract 4890: A pan-cancer analysis framework for incorporating gene expression information into clinical interpretation of pediatric cancer genomic data. Cancer Res 2017. [DOI: 10.1158/1538-7445.am2017-4890]
Abstract
Genomic characterization used in pediatric cancer clinical trials is limited to the detection of somatic mutations and gene fusions in well-characterized cancer genes. However, these approaches do not reveal actionable therapeutic targets for the majority of pediatric cancer patients. Incorporating gene expression information into clinical genomic analysis is hindered by the lack of appropriate computational methods designed for single patients rather than patient cohorts. The UC Santa Cruz Treehouse Childhood Cancer Initiative (treehousegenomics.soe.ucsc.edu) enables the incorporation of gene expression information into the genomic evaluation of pediatric cancer patients. We leverage large cancer RNA sequencing datasets, including The Cancer Genome Atlas, Therapeutically Applicable Research to Generate Effective Treatments, Medulloblastoma Advanced Genomics International Consortium, International Cancer Genome Consortium, and published research and clinical studies. Through our “pan-cancer analysis”, we compare each prospective tumor’s RNA sequencing and/or mutational profile to over 11,000 uniformly analyzed tumor profiles using our Tumor Map method. Tumor Map visualizes single tumors together with the reference compendium and identifies the samples most similar to the given tumor based on gene expression profiles. We also developed a gene expression outlier analysis to identify transcripts that are overexpressed in the given tumor. These pan-cancer gene expression analyses are used in conjunction with mutation data to nominate molecular pathways that may be driving the disease in each child, providing useful information to the medical teams. We aim to evaluate this approach in partnership with pediatric cancer clinical genomic trials at Stanford University, UC San Francisco, Children’s Hospital of Orange County, University of Michigan, Children’s Mercy Hospital, and British Columbia Children’s Hospital.
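A per-gene overexpression outlier call of this kind can be sketched with Tukey's rule (upper fence Q3 + 1.5 × IQR, computed for each gene across the compendium), the outlier method named in the Treehouse synovial sarcoma abstract elsewhere in this list. This abstract does not specify the rule, so treat it as an assumption; the sketch also assumes expression values are already on a comparable scale (for example log2(TPM + 1)) and uses hypothetical function names.

```python
from statistics import quantiles


def tukey_upper_fence(values, k=1.5):
    """Upper Tukey fence Q3 + k * IQR for one gene across the compendium."""
    # "inclusive" treats the minimum and maximum as the 0th and 100th percentiles.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def overexpression_outliers(tumor, compendium, k=1.5):
    """Genes whose expression in one tumor lies strictly above the gene's fence.

    tumor: dict gene -> expression in the single tumor of interest
    compendium: dict gene -> list of expression values across reference tumors
    """
    outliers = []
    for gene, value in tumor.items():
        reference = compendium.get(gene)
        if reference is None or len(reference) < 2:
            continue  # no usable reference distribution for this gene
        if value > tukey_upper_fence(reference, k):
            outliers.append(gene)
    return sorted(outliers)
```

Because the fence is computed gene by gene, a gene whose expression naturally varies widely across the compendium needs a correspondingly higher value in the tumor before it is flagged.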
The analysis of the first 27 patients at Stanford, most with refractory solid tumors, provided evidence of the potential clinical utility of incorporating gene expression information into the genomic evaluation of pediatric cancer patients. In all cases, we identified candidate driver molecular pathways that could be targeted by existing FDA-approved therapies or therapies available through a clinical trial. The most frequently identified molecular targets were receptor tyrosine kinases and cyclin-dependent kinases. For 3 patients with no treatment options prior to our work, the analysis contributed to treatment decisions. This study provides a framework for incorporating gene expression information into the clinical interpretation of pediatric cancer genomic data. We underscore the importance of releasing the data to the community immediately following generation, so that they may benefit new patients.
Citation Format: Olena Morozova, Yulia Newton, Avanthi Tayi Shah, Holly Beale, Du Linh Lam, John Vivian, Isabel Bjork, Theodore Goldstein, Josh Stuart, Sofie Salama, E. Alejandro Sweet-Cordero, David Haussler. A pan-cancer analysis framework for incorporating gene expression information into clinical interpretation of pediatric cancer genomic data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 4890. doi:10.1158/1538-7445.AM2017-4890
15
Lou M, Beale H. Abstract 5265: Cloud-based somatic analysis platform for populations with paired tumor normal samples. Cancer Res 2016. [DOI: 10.1158/1538-7445.am2016-5265]
Abstract
Maverix Biomics introduces its cloud-based somatic analysis platform, a scientifically sound, end-to-end analysis solution that begins with quality control of FASTQ reads and proceeds through somatic variant calling, annotation, and interactive analysis. The platform has been validated using paired tumor-normal cancer samples from the COSMIC Cell Lines Project (CLP). In addition to calling somatic variants for cancer populations and exploring characteristics within each population, the Maverix Analytic Platform can compare multiple cancer populations, a capability lacking in other existing platforms. To support this functionality, we optimized the platform using the biological expectations of COSMIC CLP data and equipped Maverix's Variant Explorer with interactive features. Specifically, we maximized sensitivity and minimized the percentage of unvalidated variants for two COSMIC cancer cell lines: small cell lung carcinoma and renal cell carcinoma. Interactive analysis of these data in Maverix's Variant Explorer then identified somatic variants and affected genes specific to each cancer population, as well as those shared by both. Our platform expedites data exploration and the identification of similarities and differences among cancer populations, which is useful for tumor reclassification and for repurposing targeted therapies based on similar genetic profiles across different cancer types.
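The two quantities optimized above, and the shared-versus-specific comparison between populations, reduce to set operations. This is a minimal sketch, not Maverix's implementation: it assumes variants are keyed as (chromosome, position, ref, alt) tuples, that the COSMIC CLP validated calls serve as the truth set, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def call_metrics(called: Iterable[Variant], validated: Iterable[Variant]) -> Tuple[float, float]:
    """Return (sensitivity, fraction of calls absent from the validated set)."""
    called_set, validated_set = set(called), set(validated)
    true_positives = called_set & validated_set
    sensitivity = len(true_positives) / len(validated_set) if validated_set else 0.0
    unvalidated = len(called_set - validated_set) / len(called_set) if called_set else 0.0
    return sensitivity, unvalidated


def compare_populations(genes_a: Iterable[str], genes_b: Iterable[str]) -> Dict[str, Set[str]]:
    """Split affected genes into those shared by two populations and those specific to each."""
    a, b = set(genes_a), set(genes_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}
```

Tuning a caller then amounts to choosing parameters that push the first value of `call_metrics` up and the second down on the validated cell lines.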
Citation Format: Melanie Lou, Holly Beale. Cloud-based somatic analysis platform for populations with paired tumor normal samples. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 5265.
16
Freedman AH, Gronau I, Schweizer RM, Ortega-Del Vecchyo D, Han E, Silva PM, Galaverni M, Fan Z, Marx P, Lorente-Galdos B, Beale H, Ramirez O, Hormozdiari F, Alkan C, Vilà C, Squire K, Geffen E, Kusak J, Boyko AR, Parker HG, Lee C, Tadigotla V, Siepel A, Bustamante CD, Harkins TT, Nelson SF, Ostrander EA, Marques-Bonet T, Wayne RK, Novembre J. Genome sequencing highlights the dynamic early history of dogs. PLoS Genet 2014; 10:e1004016. [PMID: 24453982 PMCID: PMC3894170 DOI: 10.1371/journal.pgen.1004016] [Received: 06/14/2013] [Accepted: 10/28/2013]
Abstract
To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves (one from each of the three putative centers of dog domestication), two basal dog lineages (Basenji and Dingo), and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than previously estimated. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than that represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11–16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work on the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation in amylase copy number in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support for archaeological finds suggesting that the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs; instead, the sampled wolves form a sister monophyletic clade.
This result, in combination with dog-wolf admixture during the process of domestication, suggests that past hypotheses regarding dog origins need to be re-evaluated.
The process of dog domestication is still poorly understood, largely because no studies thus far have leveraged deeply sequenced whole genomes from wolves and dogs to simultaneously evaluate support for the proposed source regions: East Asia, the Middle East, and Europe. To investigate dog origins, we sequence three wolf genomes from the putative centers of origin, two basal dog breeds (Basenji and Dingo), and a golden jackal as an outgroup. We find that none of the wolf lineages from the hypothesized domestication centers is supported as the source lineage for dogs, and that dogs and wolves diverged 11,000–16,000 years ago in a process that involved extensive admixture and was followed by a bottleneck in wolves. In addition, we investigate the amylase (AMY2B) gene family expansion in dogs, which has recently been suggested to be critical to domestication in response to increased dietary starch. We find standing variation in AMY2B copy number in wolves and show that some breeds, such as Dingo and Husky, lack the AMY2B expansion. This suggests that, at the beginning of the domestication process, dogs may have had a more carnivorous diet than their modern-day counterparts, a diet held in common with early hunter-gatherers.
Affiliation(s)
- Adam H. Freedman
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Rena M. Schweizer
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Diego Ortega-Del Vecchyo
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Eunjung Han
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Zhenxin Fan
- Key Laboratory of Bioresources and Ecoenvironment, Sichuan University, Chengdu, China
- Peter Marx
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary
- Holly Beale
- National Institutes of Health/NHGRI, Bethesda, Maryland, United States of America
- Oscar Ramirez
- Institut de Biologia Evolutiva (CSIC-Univ Pompeu Fabra), Barcelona, Spain
- Farhad Hormozdiari
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California, United States of America
- Carles Vilà
- Estación Biológica de Doñana EBD-CSIC, Sevilla, Spain
- Kevin Squire
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, United States of America
- Eli Geffen
- Department of Zoology, Tel Aviv University, Tel Aviv, Israel
- Adam R. Boyko
- Department of Veterinary Medicine, Cornell University, Ithaca, New York, United States of America
- Heidi G. Parker
- National Institutes of Health/NHGRI, Bethesda, Maryland, United States of America
- Clarence Lee
- Life Technologies, Foster City, California, United States of America
- Vasisht Tadigotla
- Life Technologies, Foster City, California, United States of America
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Stanley F. Nelson
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, United States of America
- Elaine A. Ostrander
- National Institutes of Health/NHGRI, Bethesda, Maryland, United States of America
- Tomas Marques-Bonet
- Institut de Biologia Evolutiva (CSIC-Univ Pompeu Fabra), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
- Robert K. Wayne
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- * E-mail: (RKW); (JN)
- John Novembre
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- * E-mail: (RKW); (JN)
17