1
|
Sayaman RW, Miyano M, Carlson EG, Senapati P, Zirbes A, Shalabi SF, Todhunter ME, Seewaldt VE, Neuhausen SL, Stampfer MR, Schones DE, LaBarge MA. Luminal epithelial cells integrate variable responses to aging into stereotypical changes that underlie breast cancer susceptibility. eLife 2024; 13:e95720. [PMID: 39545637 PMCID: PMC11723586 DOI: 10.7554/elife.95720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 11/08/2024] [Indexed: 11/17/2024] Open
Abstract
Effects from aging in single cells are heterogeneous, whereas at the organ- and tissue-levels aging phenotypes tend to appear as stereotypical changes. The mammary epithelium is a bilayer of two major phenotypically and functionally distinct cell lineages: luminal epithelial and myoepithelial cells. Mammary luminal epithelia exhibit substantial stereotypical changes with age that merit attention because these cells are the putative cells-of-origin for breast cancers. We hypothesize that effects from aging that impinge upon maintenance of lineage fidelity increase susceptibility to cancer initiation. We generated and analyzed transcriptomes from primary luminal epithelial and myoepithelial cells from younger (<30 years old) and older (>55 years old) women. In addition to age-dependent directional changes in gene expression, we observed increased transcriptional variance with age that contributed to genome-wide loss of lineage fidelity. Age-dependent variant responses were common to both lineages, whereas directional changes were almost exclusively detected in luminal epithelia and involved altered regulation of chromatin and genome organizers such as SATB1. Epithelial expression variance of gap junction protein GJB6 increased with age, and modulation of GJB6 expression in heterochronous co-cultures revealed that it provided a communication conduit from myoepithelial cells that drove directional change in luminal cells. Age-dependent luminal transcriptomes comprised a prominent signal that could be detected in bulk tissue during aging and transition into cancers. A machine learning classifier based on luminal-specific aging distinguished normal from cancer tissue and was highly predictive of breast cancer subtype.
We speculate that luminal epithelia are the ultimate site of integration of the variant responses to aging in their surrounding tissue, and that their emergent phenotype both endows cells with the ability to become cancer-cells-of-origin and represents a biosensor that presages cancer susceptibility.
Affiliation(s)
- Rosalyn W Sayaman
- City of Hope, Department of Population Sciences, Beckman Research Institute, Duarte, United States
- City of Hope, Center for Cancer and Aging, Beckman Research Institute, Duarte, United States
- City of Hope, Cancer Metabolism Training Program, Beckman Research Institute, Duarte, United States
- Lawrence Berkeley National Lab, Biological Sciences and Engineering, Berkeley, United States
| | - Masaru Miyano
- City of Hope, Department of Population Sciences, Beckman Research Institute, Duarte, United States
- City of Hope, Center for Cancer and Aging, Beckman Research Institute, Duarte, United States
| | - Eric G Carlson
- City of Hope, Department of Population Sciences, Beckman Research Institute, Duarte, United States
- City of Hope, Irell and Manella Graduate School of Biological Sciences, Duarte, United States
| | - Parijat Senapati
- City of Hope, Department of Diabetes Complications and Metabolism, Beckman Research Institute, Duarte, United States
| | - Arrianna Zirbes
- City of Hope, Department of Population Sciences, Beckman Research Institute, Duarte, United States
- City of Hope, Irell and Manella Graduate School of Biological Sciences, Duarte, United States
| | - Sundus F Shalabi
- City of Hope, Department of Population Sciences, Beckman Research Institute, Duarte, United States
- City of Hope, Irell and Manella Graduate School of Biological Sciences, Duarte, United States
| | - Michael E Todhunter
- City of Hope, Department of Population Sciences, Beckman Research Institute, Duarte, United States
- City of Hope, Center for Cancer and Aging, Beckman Research Institute, Duarte, United States
| | - Victoria E Seewaldt
- City of Hope, Department of Population Sciences, Beckman Research Institute, Duarte, United States
| | - Susan L Neuhausen
- City of Hope, Department of Population Sciences, Beckman Research Institute, Duarte, United States
| | - Martha R Stampfer
- Lawrence Berkeley National Lab, Biological Sciences and Engineering, Berkeley, United States
| | - Dustin E Schones
- City of Hope, Department of Diabetes Complications and Metabolism, Beckman Research Institute, Duarte, United States
| | - Mark A LaBarge
- City of Hope, Department of Population Sciences, Beckman Research Institute, Duarte, United States
- City of Hope, Center for Cancer and Aging, Beckman Research Institute, Duarte, United States
- Center for Cancer Biomarkers Research, University of Bergen, Bergen, Norway
| |
|
2
|
Razi A, Lo CC, Wang S, Leek JT, Hansen KD. Genotype prediction of 336,463 samples from public expression data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.10.21.562237. [PMID: 38559266 PMCID: PMC10979922 DOI: 10.1101/2023.10.21.562237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Tens of thousands of RNA-sequencing experiments comprising hundreds of thousands of individual samples have now been performed. These data represent a broad range of experimental conditions, sequencing technologies, and hypotheses under study. The Recount project has aggregated and uniformly processed hundreds of thousands of publicly available RNA-seq samples. Most of these samples only include RNA expression measurements; genotype data for these same samples would enable a wide range of analyses including variant prioritization, eQTL analysis, and studies of allele specific expression. Here, we developed a statistical model based on the existing reference and alternative read counts from the RNA-seq experiments available through Recount3 to predict genotypes at autosomal biallelic loci in coding regions. We demonstrate the accuracy of our model using large-scale studies that measured both gene expression and genotype genome-wide. We show that our predictive model is highly accurate with 99.5% overall accuracy, 99.6% major allele accuracy, and 90.4% minor allele accuracy. Our model is robust to tissue and study effects, provided the coverage is high enough. We applied this model to genotype all the samples in Recount 3 and provide the largest ready-to-use expression repository containing genotype information. We illustrate that the predicted genotype from RNA-seq data is sufficient to unravel the underlying population structure of samples in Recount3 using Principal Component Analysis.
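To illustrate the general idea of calling a genotype from reference and alternative read counts, here is a minimal sketch. It is not the authors' model: it assumes a plain binomial likelihood with a hypothetical sequencing error rate of 0.01, and the function name `call_genotype` is invented for this example.

```python
from math import comb, log


def call_genotype(ref_count, alt_count, error_rate=0.01):
    """Pick the most likely genotype at a biallelic site from read counts.

    Assumption (not from the abstract): reads are binomial draws, with the
    expected alternative-allele fraction set by the genotype.
    """
    depth = ref_count + alt_count
    if depth == 0:
        return None  # no coverage, so no call is made

    # Expected alternative-read fraction per genotype (illustrative values).
    alt_fraction = {
        "0/0": error_rate,        # homozygous reference
        "0/1": 0.5,               # heterozygous
        "1/1": 1.0 - error_rate,  # homozygous alternative
    }

    def log_likelihood(p):
        # Binomial log-likelihood of the observed counts given fraction p.
        return (
            log(comb(depth, alt_count))
            + alt_count * log(p)
            + ref_count * log(1.0 - p)
        )

    return max(alt_fraction, key=lambda g: log_likelihood(alt_fraction[g]))
```

With low coverage the three likelihoods are close together, which is consistent with the abstract's remark that robustness depends on sufficient coverage.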
Affiliation(s)
- Afrooz Razi
- Department of Genetic Medicine, Johns Hopkins University School of Medicine
| | - Christopher C. Lo
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
| | - Siruo Wang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
| | - Jeffrey T. Leek
- Biostatistics Program, Division of Public Health Sciences, Fred Hutchinson Cancer Center
| | - Kasper D. Hansen
- Department of Genetic Medicine, Johns Hopkins University School of Medicine
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine
| |
|
3
|
Zhai J, Nie C, Wang W, Liu C, Liu T, Sun L, Li W, Wang W, Ren X, Han X, Zhou H, Li X, Tian W. Comprehensive Analysis on Prognostic Signature Based on T Cell-Mediated Tumor Killing Related Genes in Gastric Cancer. Biochem Genet 2024; 62:504-529. [PMID: 37386336 DOI: 10.1007/s10528-023-10436-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 06/18/2023] [Indexed: 07/01/2023]
Abstract
Although immunotherapy is a valuable treatment for gastric cancer (GC), identifying the patients who would benefit most from this approach presents a challenge. In this study, GC patients were divided into two subtypes by consensus clustering according to T cell-mediated tumor killing related genes (TTKRGs), and there were significant differences in tumor-infiltrating immune cells, signaling pathways, and gene expression of immunomodulators and inhibitory immune checkpoints between the two subtypes. Then, we developed an individualized signature based on TTKRGs, and its clinical and predictive value in GC patients for chemotherapeutic and immunotherapeutic responses was assessed. We confirmed the expression levels of signature genes in GC tumor tissue using quantitative real-time polymerase chain reaction (qRT-PCR). Additionally, to improve the accuracy of GC prognosis predictions, we established a nomogram. We further identified some compounds as sensitive drugs targeting GC risk groups. The signature showed significant predictive ability across RNA-seq, microarray, and qRT-PCR cohorts, which could assist in predicting survival, immunotherapeutic and chemotherapeutic outcomes in GC patients.
Affiliation(s)
- Jiabao Zhai
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Chuang Nie
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Wanyu Wang
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Chang Liu
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Tianyu Liu
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Lishuang Sun
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Wei Li
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Wentong Wang
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Xiyun Ren
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Xu Han
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Haibo Zhou
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Xin Li
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China
| | - Wenjing Tian
- Department of Epidemiology, School of Public Health, Harbin Medical University, 157 Baojian Road, 150081, Harbin, China.
| |
|
4
|
Muley VY. Deep Learning for Predicting Gene Regulatory Networks: A Step-by-Step Protocol in R. Methods Mol Biol 2024; 2719:265-294. [PMID: 37803123 DOI: 10.1007/978-1-0716-3461-5_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2023]
Abstract
Deep learning has emerged as a powerful tool for solving complex problems, including reconstruction of gene regulatory networks within the realm of biology. These networks consist of transcription factors and their associations with genes they regulate. Despite the utility of deep learning methods in studying gene expression and regulation, their accessibility remains limited for biologists, mainly due to the prerequisites of programming skills and a nuanced grasp of the underlying algorithms. This chapter presents a deep learning protocol that utilizes TensorFlow and the Keras API in R/RStudio, with the aim of making deep learning accessible for individuals without specialized expertise. The protocol focuses on the genome-wide prediction of regulatory interactions between transcription factors and genes, leveraging publicly available gene expression data in conjunction with well-established benchmarks. The protocol encompasses pivotal phases including data preprocessing, conceptualization of neural network architectures, iterative processes of model training and validation, as well as forecasting of novel regulatory associations. Furthermore, it provides insights into parameter tuning for deep learning models. By adhering to this protocol, researchers are expected to gain a comprehensive understanding of applying deep learning techniques to predict regulatory interactions. This protocol can be readily modified to serve diverse research problems, thereby empowering scientists to effectively harness the capabilities of deep learning in their investigations.
Affiliation(s)
- Vijaykumar Yogesh Muley
- Independent Researcher, Hingoli, India.
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, México.
| |
|
5
|
Belova T, Biondi N, Hsieh PH, Lutsik P, Chudasama P, Kuijjer M. Heterogeneity in the gene regulatory landscape of leiomyosarcoma. NAR Cancer 2023; 5:zcad037. [PMID: 37492373 PMCID: PMC10365024 DOI: 10.1093/narcan/zcad037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/09/2023] [Revised: 07/06/2023] [Accepted: 07/18/2023] [Indexed: 07/27/2023] Open
Abstract
Characterizing inter-tumor heterogeneity is crucial for selecting suitable cancer therapy, as the presence of diverse molecular subgroups of patients can be associated with disease outcome or response to treatment. While cancer subtypes are often characterized by differences in gene expression, the mechanisms driving these differences are generally unknown. We set out to model the regulatory mechanisms driving sarcoma heterogeneity based on patient-specific, genome-wide gene regulatory networks. We developed a new computational framework, PORCUPINE, which combines knowledge on biological pathways with permutation-based network analysis to identify pathways that exhibit significant regulatory heterogeneity across a patient population. We applied PORCUPINE to patient-specific leiomyosarcoma networks modeled on data from The Cancer Genome Atlas and validated our results in an independent dataset from the German Cancer Research Center. PORCUPINE identified 37 heterogeneously regulated pathways, including pathways representing potential targets for treatment of subgroups of leiomyosarcoma patients, such as FGFR and CTLA4 inhibitory signaling. We validated the detected regulatory heterogeneity through analysis of networks and chromatin states in leiomyosarcoma cell lines. We showed that the heterogeneity identified with PORCUPINE is not associated with methylation profiles or clinical features, thereby suggesting an independent mechanism of patient heterogeneity driven by the complex landscape of gene regulatory interactions.
Affiliation(s)
- Tatiana Belova
- Computational Biology and Systems Medicine Group, Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
| | - Nicola Biondi
- Precision Sarcoma Research Group, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases, Heidelberg, Germany
| | - Ping-Han Hsieh
- Computational Biology and Systems Medicine Group, Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
| | - Pavlo Lutsik
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Oncology, Catholic University (KU) Leuven, Leuven, Belgium
| | - Priya Chudasama
- Precision Sarcoma Research Group, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases, Heidelberg, Germany
| | - Marieke L Kuijjer
- Computational Biology and Systems Medicine Group, Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
- Department of Pathology, Leiden University Medical Center, Leiden, the Netherlands
- Leiden Center for Computational Oncology, Leiden University Medical Center, Leiden, the Netherlands
| |
|
6
|
Gu J, Dai J, Lu H, Zhao H. Comprehensive analysis of ubiquitously expressed genes in human, from a data-driven perspective. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00042-0. [PMID: 35569803 PMCID: PMC10373092 DOI: 10.1016/j.gpb.2021.08.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/17/2020] [Revised: 07/18/2021] [Accepted: 09/27/2021] [Indexed: 01/08/2023]
Abstract
Comprehensive characterization of spatial and temporal gene expression patterns in humans is critical for uncovering the regulatory codes of the human genome and understanding the molecular mechanism of human disease. The ubiquitously expressed genes (UEGs) refer to those genes expressed across a majority, if not all, phenotypic and physiological conditions of an organism. It is known that many human genes are broadly expressed across tissues. However, most previous UEG studies have only focused on providing a list of UEGs without capturing their global expression patterns, thus limiting the potential use of UEG information. In this article, we proposed a novel data-driven framework to leverage the extensive collection of ∼40,000 human transcriptomes to derive a list of UEGs and their corresponding global expression patterns, which offers a valuable resource to further characterize human transcriptome. Our results suggest that about half (12,234; 49.01%) of the human genes are expressed in at least 80% of human transcriptomes, and the median size of the human transcriptome is 16,342 (65.44%). Through gene clustering, we identified a set of UEGs, named LoVarUEGs, that have stable expression across human transcriptomes and can be used as internal reference genes for expression measurement. To further demonstrate the usefulness of this resource, we evaluated the global expression patterns for 16 previously predicted disallowed genes in islets beta cells and found that seven of these genes showed relatively more varied expression patterns, suggesting that the repression of these genes may not be unique to islets beta cells.
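The "expressed in at least 80% of transcriptomes" criterion can be sketched as a simple per-gene fraction. This is an illustration only, not the authors' framework: the function name `ubiquitously_expressed`, the expression cutoff `min_level`, and the toy values are assumptions made for the example.

```python
def ubiquitously_expressed(expression, min_fraction=0.8, min_level=1.0):
    """Return genes expressed in at least `min_fraction` of samples.

    `expression` maps gene name -> list of expression values, one per sample.
    A gene counts as "expressed" in a sample when its value >= `min_level`
    (a hypothetical cutoff chosen for illustration).
    """
    selected = []
    for gene, values in expression.items():
        if not values:
            continue  # skip genes without any measurements
        fraction = sum(v >= min_level for v in values) / len(values)
        if fraction >= min_fraction:
            selected.append(gene)
    return sorted(selected)


# Toy values for five samples (made up for the example).
toy = {
    "ACTB": [50, 60, 40, 55, 70],  # expressed in 5 of 5 samples
    "INS": [0, 0, 900, 0, 0],      # expressed in 1 of 5 samples
    "GENE_X": [2, 3, 0, 5, 4],     # expressed in 4 of 5 samples
}
ueg = ubiquitously_expressed(toy)
```

A fraction-based list like this says nothing about how stable a gene's level is across samples, which is the gap the abstract's global expression patterns and LoVarUEGs are meant to address.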
Affiliation(s)
- Jianlei Gu
- SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China; Center for Biomedical Informatics, Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai Children's Hospital, Shanghai 200040, China; Department of Biostatistics, Yale University, New Haven, CT, 06511, United States
| | - Jiawei Dai
- SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Hui Lu
- SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China; Center for Biomedical Informatics, Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai Children's Hospital, Shanghai 200040, China.
| | - Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, CT, 06511, United States.
| |
|
7
|
Faustino D, Brinkmeier H, Logotheti S, Jonitz-Heincke A, Yilmaz H, Takan I, Peters K, Bader R, Lang H, Pavlopoulou A, Pützer BM, Spitschak A. Novel integrated workflow allows production and in-depth quality assessment of multifactorial reprogrammed skeletal muscle cells from human stem cells. Cell Mol Life Sci 2022; 79:229. [PMID: 35396689 PMCID: PMC8993739 DOI: 10.1007/s00018-022-04264-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2021] [Revised: 03/08/2022] [Accepted: 03/20/2022] [Indexed: 11/03/2022]
Abstract
Skeletal muscle tissue engineering aims at generating biological substitutes that restore, maintain or improve normal muscle function; however, the quality of cells produced by current protocols remains insufficient. Here, we developed a multifactor-based protocol that combines adenovector (AdV)-mediated MYOD expression, small molecule inhibitor and growth factor treatment, and electrical pulse stimulation (EPS) to efficiently reprogram different types of human-derived multipotent stem cells into physiologically functional skeletal muscle cells (SMCs). The protocol was complemented through a novel in silico workflow that allows for in-depth estimation and potentially optimization of the quality of generated muscle tissue, based on the transcriptomes of transdifferentiated cells. We additionally patch-clamped phenotypic SMCs to associate their bioelectrical characteristics with their transcriptome reprogramming. Overall, we set up a comprehensive and dynamic approach at the nexus of viral vector-based technology, bioinformatics, and electrophysiology that facilitates production of high-quality skeletal muscle cells and can guide iterative cycles to improve myo-differentiation protocols.
Affiliation(s)
- Dinis Faustino
- Institute of Experimental Gene Therapy and Cancer Research, Rostock University Medical Center, 18057, Rostock, Germany; Department Life, Light and Matter, University of Rostock, 18059, Rostock, Germany
| | - Heinrich Brinkmeier
- Institute of Pathophysiology, University Medicine Greifswald, 17489, Greifswald, Germany
| | - Stella Logotheti
- Institute of Experimental Gene Therapy and Cancer Research, Rostock University Medical Center, 18057, Rostock, Germany; Department Life, Light and Matter, University of Rostock, 18059, Rostock, Germany
| | - Anika Jonitz-Heincke
- Biomechanics and Implant Technology Research Laboratory, Department of Orthopedics, Rostock University Medical Centre, 18057, Rostock, Germany
| | - Hande Yilmaz
- Institute of Experimental Gene Therapy and Cancer Research, Rostock University Medical Center, 18057, Rostock, Germany; Department Life, Light and Matter, University of Rostock, 18059, Rostock, Germany
| | - Isil Takan
- Izmir Biomedicine and Genome Center (IBG), Balcova, 35340, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, 35340, Izmir, Turkey
| | - Kirsten Peters
- Department of Cell Biology, Rostock University Medical Center, 18057, Rostock, Germany
| | - Rainer Bader
- Biomechanics and Implant Technology Research Laboratory, Department of Orthopedics, Rostock University Medical Centre, 18057, Rostock, Germany
| | - Hermann Lang
- Department of Operative Dentistry and Periodontology, Rostock University Medical Centre, 18057, Rostock, Germany
| | - Athanasia Pavlopoulou
- Izmir Biomedicine and Genome Center (IBG), Balcova, 35340, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, 35340, Izmir, Turkey
| | - Brigitte M Pützer
- Institute of Experimental Gene Therapy and Cancer Research, Rostock University Medical Center, 18057, Rostock, Germany; Department Life, Light and Matter, University of Rostock, 18059, Rostock, Germany
| | - Alf Spitschak
- Institute of Experimental Gene Therapy and Cancer Research, Rostock University Medical Center, 18057, Rostock, Germany; Department Life, Light and Matter, University of Rostock, 18059, Rostock, Germany
| |
|
8
|
Nie C, Zhai J, Wang Q, Zhu X, Xiang G, Liu C, Liu T, Wang W, Wang Y, Zhao Y, Tian W, Xue Y, Zhou H. Comprehensive Analysis of an Individualized Immune-Related lncRNA Pair Signature in Gastric Cancer. Front Cell Dev Biol 2022; 10:805623. [PMID: 35273959 PMCID: PMC8902466 DOI: 10.3389/fcell.2022.805623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/30/2021] [Accepted: 02/02/2022] [Indexed: 12/26/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) have diverse functions, including immune regulation. A growing number of studies have reported immune-related lncRNAs in the prognosis of multiple cancers. In this study, we developed an individualized signature containing 13 immune-related lncRNA pairs (IRLPs) which could predict the overall survival, disease-free survival, progression-free survival, and disease-specific survival of gastric cancer (GC) patients in The Cancer Genome Atlas (TCGA) cohort. Internal and external validations, signature comparisons, and subgroup analyses further confirmed its superiority, stability, and generalizability. Notably, this signature also showed good applicability in discriminating the prognosis of pan-cancer patients. Then, we constructed and validated a nomogram for overall survival based on the signature and clinical factors, which allowed more accurate predictions of GC prognosis. In addition, we revealed that the low survival rate of patients with high-risk scores may be due to their aggressive clinical features, enriched cancer-related signaling pathways, the infiltration of specific immunosuppressive cells, and low tumor mutation burden. We further predicted markedly worse immunotherapeutic responses in the high-risk groups and identified some candidate compounds targeting GC risk group differentiation. This signature based on the IRLPs may be promising for predicting the survival outcomes and immunotherapeutic responses of GC patients in clinical practice.
Affiliation(s)
- Chuang Nie
- Department of Epidemiology, College of Public Health, Harbin Medical University, Harbin, China
| | - Jiabao Zhai
- Department of Epidemiology, College of Public Health, Harbin Medical University, Harbin, China
| | - Qi Wang
- Department of Epidemiology, College of Public Health, Harbin Medical University, Harbin, China
| | - Xiaojie Zhu
- Department of Epidemiology, College of Public Health, Harbin Medical University, Harbin, China
| | - Guanghui Xiang
- Department of Epidemiology, College of Public Health, Harbin Medical University, Harbin, China
| | - Chang Liu
- Department of Epidemiology, College of Public Health, Harbin Medical University, Harbin, China
| | - Tianyu Liu
- Department of Epidemiology, College of Public Health, Harbin Medical University, Harbin, China
| | - Wanyu Wang
- Department of Epidemiology, College of Public Health, Harbin Medical University, Harbin, China
| | - Yimin Wang
- Department of Gastroenterological Surgery, Harbin Medical University Cancer Hospital, Harbin, China
| | - Yashuang Zhao
- Department of Epidemiology, College of Public Health, Harbin Medical University, Harbin, China
| | - Wenjing Tian
- Department of Epidemiology, College of Public Health, Harbin Medical University, Harbin, China
| | - Yingwei Xue
- Department of Gastroenterological Surgery, Harbin Medical University Cancer Hospital, Harbin, China
| | - Haibo Zhou
- Department of Epidemiology, College of Public Health, Harbin Medical University, Harbin, China
| |
|
9
|
Shah AH, Suter R, Gudoor P, Doucet-O’Hare TT, Stathias V, Cajigas I, de la Fuente M, Govindarajan V, Morell AA, Eichberg DG, Luther E, Lu VM, Heiss J, Komotar RJ, Ivan ME, Schurer S, Gilbert MR, Ayad NG. A Multiparametric Pharmacogenomic Strategy for Drug Repositioning predicts Therapeutic Efficacy for Glioblastoma Cell Lines. Neurooncol Adv 2021; 4:vdab192. [DOI: 10.1093/noajnl/vdab192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Background
The poor prognosis of glioblastoma patients and the extensive heterogeneity of glioblastoma at both the molecular and cellular level necessitate developing novel individualized treatment modalities via genomics-driven approaches.
Methods
This study leverages numerous pharmacogenomic and tissue databases to examine drug repositioning for glioblastoma. RNAseq data from glioblastoma tumor samples from The Cancer Genome Atlas (TCGA, n=117) were compared to “normal” frontal lobe samples from Genotype-Tissue Expression Portal (GTEX, n=120) to find differentially expressed genes (DEGs). Using compound-gene expression data and drug activity data from the Library of Integrated Network-Based Cellular Signatures (LINCS, n=66,512 compounds), CCLE (71 glioma cell lines), and Chemical European Molecular Biology Laboratory (ChEMBL) platforms, we employed a summarized reversal gene expression metric (sRGES) to “reverse” the resultant disease signature for GBM and its subtypes. A multi-parametric strategy was employed to stratify compounds capable of blood-brain barrier penetrance with a favorable pharmacokinetic profile (CNS-MPO).
Results
Significant correlations were identified between sRGES and drug efficacy in GBM cell lines in both ChEMBL (r=0.37, p<0.001) and Cancer Therapeutic Response Portal (CTRP) databases (r=0.35, p<0.001). Our multiparametric algorithm identified two classes of drugs with highest sRGES and CNS-MPO: HDAC inhibitors (vorinostat and entinostat) and topoisomerase inhibitors suitable for drug repurposing.
Conclusions
Our studies suggest that reversal of glioblastoma disease signature correlates with drug potency for various GBM subtypes. This multiparametric approach may set the foundation for an early-phase personalized -omics clinical trial for glioblastoma by effectively identifying drugs that are capable of reversing the disease signature and have favorable pharmacokinetic and safety profiles.
Affiliation(s)
- Ashish H Shah
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| | - Robert Suter
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| | - Pavan Gudoor
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| | - Iahn Cajigas
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| | - Vaidya Govindarajan
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| | - Alexis A Morell
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| | - Daniel G Eichberg
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| | - Evan Luther
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| | - Victor M Lu
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| | - John Heiss
- Surgical Neurology Division, NINDS National Institute of Health
| | - Ricardo J Komotar
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| | - Michael E Ivan
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| | - Nagi G Ayad
- Department of Neurological Surgery, Sylvester Comprehensive Cancer Center, Miami
| |
|
10
|
Ansell BRE, Thomas SN, Bonelli R, Munro JE, Freytag S, Bahlo M. A survey of RNA editing at single-cell resolution links interneurons to schizophrenia and autism. RNA (NEW YORK, N.Y.) 2021; 27:1482-1496. [PMID: 34535545 PMCID: PMC8594476 DOI: 10.1261/rna.078804.121] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/15/2021] [Accepted: 09/12/2021] [Indexed: 06/13/2023]
Abstract
Conversion of adenosine to inosine in RNA by ADAR enzymes, termed "RNA editing," is essential for healthy brain development. Editing is dysregulated in neuropsychiatric diseases, but has not yet been investigated at scale at the level of individual neurons. We quantified RNA editing sites in nuclear transcriptomes of 3055 neurons from six cortical regions of a neurotypical female donor, and found 41,930 sites present in at least ten nuclei. Most sites were located within Alu repeats in introns or 3' UTRs, and approximately 80% were cataloged in public RNA editing databases. We identified 9285 putative novel editing sites, 29% of which were also detectable in unrelated donors. Intersection with results from bulk RNA-seq studies provided cell-type and spatial context for 1730 sites that are differentially edited in schizophrenic brain donors, and 910 such sites in autistic donors. Autism-related genes were also enriched with editing sites predicted to modify RNA structure. Inhibitory neurons showed higher overall transcriptome editing than excitatory neurons, and the highest editing rates were observed in the frontal cortex. We used generalized linear models to identify differentially edited sites and genes between cell types. Twenty-nine genes were preferentially edited in excitatory neurons, and 43 genes were edited more heavily in inhibitory neurons, including RBFOX1, its target genes, and genes in the autism-associated Prader-Willi locus (15q11). The abundance of SNORD115/116 genes from locus 15q11 was positively associated with editing activity across the transcriptome. We contend that insufficient editing of autism-related genes in inhibitory neurons may contribute to the specific perturbation of those cells in autism.
Affiliation(s)
- Brendan Robert E Ansell
  - Population Health and Immunity Division, Walter and Eliza Hall Institute of Medical Research, Parkville 3052, Victoria, Australia
  - Department of Medical Biology, University of Melbourne, Parkville 3052, Victoria, Australia
- Simon N Thomas
  - Population Health and Immunity Division, Walter and Eliza Hall Institute of Medical Research, Parkville 3052, Victoria, Australia
  - Department of Medical Biology, University of Melbourne, Parkville 3052, Victoria, Australia
- Roberto Bonelli
  - Population Health and Immunity Division, Walter and Eliza Hall Institute of Medical Research, Parkville 3052, Victoria, Australia
  - Department of Medical Biology, University of Melbourne, Parkville 3052, Victoria, Australia
- Jacob E Munro
  - Population Health and Immunity Division, Walter and Eliza Hall Institute of Medical Research, Parkville 3052, Victoria, Australia
  - Department of Medical Biology, University of Melbourne, Parkville 3052, Victoria, Australia
- Saskia Freytag
  - Molecular Medicine Division, Harry Perkins Institute of Medical Research, Nedlands 6009, Western Australia, Australia
- Melanie Bahlo
  - Population Health and Immunity Division, Walter and Eliza Hall Institute of Medical Research, Parkville 3052, Victoria, Australia
  - Department of Medical Biology, University of Melbourne, Parkville 3052, Victoria, Australia

11
Wilks C, Zheng SC, Chen FY, Charles R, Solomon B, Ling JP, Imada EL, Zhang D, Joseph L, Leek JT, Jaffe AE, Nellore A, Collado-Torres L, Hansen KD, Langmead B. recount3: summaries and queries for large-scale RNA-seq expression and splicing. Genome Biol 2021; 22:323. [PMID: 34844637 PMCID: PMC8628444 DOI: 10.1186/s13059-021-02533-6] [Citation(s) in RCA: 137] [Impact Index Per Article: 34.3] [Received: 07/12/2021] [Accepted: 10/29/2021] [Indexed: 12/12/2022] Open
Abstract
We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private data, allowing results to be directly compared to any study in recount3. Taken together, our tools help biologists maximize the utility of publicly available RNA-seq data, especially to improve their understanding of newly collected data. recount3 is available from http://rna.recount.bio.
Affiliation(s)
- Christopher Wilks
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Shijie C Zheng
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Rone Charles
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Brad Solomon
  - Thomas M. Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Jonathan P Ling
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, USA
- Eddie Luidy Imada
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- David Zhang
  - Institute of Child Health, University College London (UCL), London, UK
- Jeffrey T Leek
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Andrew E Jaffe
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
  - Lieber Institute for Brain Development, Baltimore, USA
  - Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Abhinav Nellore
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
  - Department of Surgery, Oregon Health & Science University, Portland, OR, USA
- Kasper D Hansen
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
  - Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA

12
Yılmaz H, Toy HI, Marquardt S, Karakülah G, Küçük C, Kontou PI, Logotheti S, Pavlopoulou A. In Silico Methods for the Identification of Diagnostic and Favorable Prognostic Markers in Acute Myeloid Leukemia. Int J Mol Sci 2021; 22:ijms22179601. [PMID: 34502522 PMCID: PMC8431757 DOI: 10.3390/ijms22179601] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/16/2021] [Revised: 08/13/2021] [Accepted: 08/20/2021] [Indexed: 12/13/2022] Open
Abstract
Acute myeloid leukemia (AML), the most common type of acute leukemia in adults, is mainly asymptomatic at early stages and progresses/recurs rapidly and frequently. These attributes necessitate the identification of biomarkers for timely diagnosis and accurate prognosis. In this study, differential gene expression analysis was performed on large-scale transcriptomics data of AML patients versus corresponding normal tissue. Weighted gene co-expression network analysis was conducted to construct networks of co-expressed genes, and detect gene modules. Finally, hub genes were identified from selected modules by applying network-based methods. This robust and integrative bioinformatics approach revealed a set of twenty-four genes, mainly related to cell cycle and immune response, the diagnostic significance of which was subsequently compared against two independent gene expression datasets. Furthermore, based on a recent notion suggesting that molecular characteristics of a few, unusual patients with exceptionally favorable survival can provide insights for improving the outcome of individuals with more typical disease trajectories, we defined groups of long-term survivors in AML patient cohorts and compared their transcriptomes versus the general population to infer favorable prognostic signatures. These findings could have potential applications in the clinical setting, in particular, in diagnosis and prognosis of AML.
Affiliation(s)
- Hande Yılmaz
  - Izmir Biomedicine and Genome Center, Balcova, 35340 Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, 35340 Izmir, Turkey
  - Institute of Experimental Gene Therapy and Cancer Research, Rostock University Medical Center, 18057 Rostock, Germany
- Halil Ibrahim Toy
  - Izmir Biomedicine and Genome Center, Balcova, 35340 Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, 35340 Izmir, Turkey
- Stephan Marquardt
  - Institute of Experimental Gene Therapy and Cancer Research, Rostock University Medical Center, 18057 Rostock, Germany
- Gökhan Karakülah
  - Izmir Biomedicine and Genome Center, Balcova, 35340 Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, 35340 Izmir, Turkey
- Can Küçük
  - Izmir Biomedicine and Genome Center, Balcova, 35340 Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, 35340 Izmir, Turkey
  - Department of Medical Biology, Faculty of Medicine, Dokuz Eylül University, Balcova, 35340 Izmir, Turkey
- Panagiota I. Kontou
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131 Lamia, Greece
- Stella Logotheti
  - Institute of Experimental Gene Therapy and Cancer Research, Rostock University Medical Center, 18057 Rostock, Germany
  - Correspondence: (S.L.); (A.P.)
- Athanasia Pavlopoulou
  - Izmir Biomedicine and Genome Center, Balcova, 35340 Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, 35340 Izmir, Turkey
  - Correspondence: (S.L.); (A.P.)

13
Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T, Zhou L, Tang W, Zhan L, Fu X, Liu S, Bo X, Yu G. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (N Y) 2021; 2:100141. [PMID: 34557778 PMCID: PMC8454663 DOI: 10.1016/j.xinn.2021.100141] [Citation(s) in RCA: 4407] [Impact Index Per Article: 1101.8] [Received: 05/08/2021] [Accepted: 06/29/2021] [Indexed: 12/15/2022] Open
Abstract
Functional enrichment analysis is pivotal for interpreting high-throughput omics data in life science. It is crucial for this type of tool to use the latest annotation databases for as many organisms as possible. To meet these requirements, we present here an updated version of our popular Bioconductor package, clusterProfiler 4.0. This package has been enhanced considerably compared with its original version published 9 years ago. The new version provides a universal interface for functional enrichment analysis in thousands of organisms based on internally supported ontologies and pathways as well as annotation data provided by users or derived from online databases. It also extends the dplyr and ggplot2 packages to offer tidy interfaces for data operation and visualization. Other new features include gene set enrichment analysis and comparison of enrichment results from multiple gene lists. We anticipate that clusterProfiler 4.0 will be applied to a wide range of scenarios across diverse organisms.
- clusterProfiler supports exploring functional characteristics of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation.
- It provides a universal interface for gene functional annotation from a variety of sources and thus can be applied in diverse scenarios.
- It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation.
- Datasets obtained from multiple treatments and time points can be analyzed and compared in a single run, easily revealing functional consensus and differences among distinct conditions.
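As a rough illustration of the statistics that commonly underlie functional enrichment (over-representation) analysis, the sketch below computes a one-sided hypergeometric p-value in plain Python. It is not clusterProfiler's API or implementation; the function name `enrichment_p_value` and all gene counts are hypothetical and chosen only for the example.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, list_size, overlap):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least `overlap` members of an annotated gene
    set (`set_size` genes) when `list_size` genes are drawn without
    replacement from a background of `universe_size` genes.
    """
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)  # all possible gene lists
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 background genes, a 5-gene pathway,
# and a 5-gene hit list.
p_all_hits = enrichment_p_value(20, 5, 5, 5)  # every hit is in the pathway
p_no_hits = enrichment_p_value(20, 5, 5, 0)   # "at least zero" is certain
print(p_all_hits, p_no_hits)
```

In practice such p-values are computed for many gene sets at once and then adjusted for multiple testing; that step is omitted here.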
Affiliation(s)
- Tianzhi Wu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Erqiang Hu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Shuangbin Xu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Meijun Chen
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Pingfan Guo
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Zehan Dai
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Tingze Feng
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Lang Zhou
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Wenli Tang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Li Zhan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Xiaocong Fu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Shanshan Liu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Xiaochen Bo
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Guangchuang Yu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
  - Guangdong Provincial Key Laboratory of Proteomics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510515, China

14
Harikumar H, Quinn TP, Rana S, Gupta S, Venkatesh S. Personalized single-cell networks: a framework to predict the response of any gene to any drug for any patient. BioData Min 2021; 14:37. [PMID: 34353329 PMCID: PMC8340371 DOI: 10.1186/s13040-021-00263-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2021] [Accepted: 05/10/2021] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND: The last decade has seen a major increase in the availability of genomic data. This includes expert-curated databases that describe the biological activity of genes, as well as high-throughput assays that measure gene expression in bulk tissue and single cells. Integrating these heterogeneous data sources can generate new hypotheses about biological systems. Our primary objective is to combine population-level drug-response data with patient-level single-cell expression data to predict how any gene will respond to any drug for any patient.
METHODS: We take 2 approaches to benchmarking a "dual-channel" random walk with restart (RWR) for data integration. First, we evaluate how well RWR can predict known gene functions from single-cell gene co-expression networks. Second, we evaluate how well RWR can predict known drug responses from individual cell networks. We then present two exploratory applications. In the first application, we combine the Gene Ontology database with glioblastoma single cells from 5 individual patients to identify genes whose functions differ between cancers. In the second application, we combine the LINCS drug-response database with the same glioblastoma data to identify genes that may exhibit patient-specific drug responses.
CONCLUSIONS: Our manuscript introduces two innovations to the integration of heterogeneous biological data. First, we use a "dual-channel" method to predict up-regulation and down-regulation separately. Second, we use individualized single-cell gene co-expression networks to make personalized predictions. These innovations let us predict gene function and drug response for individual patients. Taken together, our work shows promise that single-cell co-expression data could be combined in heterogeneous information networks to facilitate precision medicine.
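The abstract above builds on random walk with restart (RWR). The following is a minimal sketch of the standard, single-channel RWR iteration only, not the authors' "dual-channel" variant or their code. The update used is p_next = (1 - r) * W * p + r * p0, where W is the column-normalised adjacency matrix, p0 places the starting probability on the seed nodes, and r is the restart probability. The toy graph, the function name `random_walk_with_restart`, and the parameter defaults are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return approximate steady-state visiting probabilities per node."""
    n = len(adjacency)
    # Column-normalise: column j holds the transition probabilities out of node j.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Restart vector: uniform over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: a path graph 0 - 1 - 2 - 3, seeded at node 0.
toy_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_graph, seeds=[0])
print(scores)
```

Nodes closer to the seed receive higher scores, which is the property that lets RWR rank genes by proximity to known annotations or drug targets in a co-expression network.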
Affiliation(s)
- Haripriya Harikumar
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
  - Institute for Health Transformation, Deakin University, Geelong, Australia
- Thomas P Quinn
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
- Santu Rana
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
- Sunil Gupta
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
- Svetha Venkatesh
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia

15
Bogatyrova O, Mattsson JSM, Ross EM, Sanderson MP, Backman M, Botling J, Brunnström H, Kurppa P, La Fleur L, Strell C, Wilm C, Zimmermann A, Esdar C, Micke P. FGFR1 overexpression in non-small cell lung cancer is mediated by genetic and epigenetic mechanisms and is a determinant of FGFR1 inhibitor response. Eur J Cancer 2021; 151:136-149. [PMID: 33984662 DOI: 10.1016/j.ejca.2021.04.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 12/23/2020] [Revised: 03/11/2021] [Accepted: 04/06/2021] [Indexed: 02/06/2023]
Abstract
Amplification of fibroblast growth factor receptor 1 (FGFR1) in non-small cell lung cancer (NSCLC) has been considered as an actionable drug target. However, pan-FGFR tyrosine kinase inhibitors did not demonstrate convincing clinical efficacy in FGFR1-amplified NSCLC patients. This study aimed to characterise the molecular context of FGFR1 expression and to define biomarkers predictive of FGFR1 inhibitor response. In this study, 635 NSCLC samples were characterised for FGFR1 protein expression by immunohistochemistry and copy number gain (CNG) by in situ hybridisation (n = 298) or DNA microarray (n = 189). FGFR1 gene expression (n = 369) and immune cell profiles (n = 309) were also examined. Furthermore, gene expression, methylation and microRNA data from The Cancer Genome Atlas (TCGA) were compared. A panel of FGFR1-amplified NSCLC patient-derived xenograft (PDX) models were tested for response to the selective FGFR1 antagonist M6123. A minority of patients demonstrated FGFR1 CNG (10.5%) or increased FGFR1 mRNA (8.7%) and protein expression (4.4%). FGFR1 CNG correlated weakly with FGFR1 gene and protein expression. Tumours overexpressing FGFR1 protein were typically devoid of driver alterations (e.g. EGFR, KRAS) and showed reduced infiltration of T-lymphocytes and lower PD-L1 expression. Promoter methylation and microRNA were identified as regulators of FGFR1 expression in NSCLC and other cancers. Finally, NSCLC PDX models demonstrating FGFR1 amplification and FGFR1 protein overexpression were sensitive to M6123. The unique molecular and immune features of tumours with high FGFR1 expression provide a rationale to stratify patients in future clinical trials of FGFR1 pathway-targeting agents.
MESH Headings
- Animals
- Antineoplastic Agents/pharmacology
- B7-H1 Antigen/metabolism
- Carcinoma, Non-Small-Cell Lung/drug therapy
- Carcinoma, Non-Small-Cell Lung/genetics
- Carcinoma, Non-Small-Cell Lung/immunology
- Carcinoma, Non-Small-Cell Lung/metabolism
- DNA Methylation
- Epigenesis, Genetic
- Female
- Gene Amplification
- Gene Expression Regulation, Neoplastic
- Humans
- Lung Neoplasms/drug therapy
- Lung Neoplasms/genetics
- Lung Neoplasms/immunology
- Lung Neoplasms/metabolism
- Lymphocytes, Tumor-Infiltrating/immunology
- Lymphocytes, Tumor-Infiltrating/metabolism
- Mice, Inbred NOD
- Mice, SCID
- MicroRNAs/genetics
- MicroRNAs/metabolism
- Molecular Targeted Therapy
- Receptor, Fibroblast Growth Factor, Type 1/antagonists & inhibitors
- Receptor, Fibroblast Growth Factor, Type 1/genetics
- Receptor, Fibroblast Growth Factor, Type 1/metabolism
- T-Lymphocytes/immunology
- T-Lymphocytes/metabolism
- Tumor Microenvironment
- Xenograft Model Antitumor Assays
- Mice
Affiliation(s)
- Olga Bogatyrova
  - Translational Innovation Platform Oncology & Immuno-Oncology, Merck KGaA, Darmstadt, Germany
- Johanna S M Mattsson
  - Dept. of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Edith M Ross
  - Translational Medicine, Merck KGaA, Darmstadt, Germany
- Michael P Sanderson
  - Translational Innovation Platform Oncology & Immuno-Oncology, Merck KGaA, Darmstadt, Germany
- Max Backman
  - Dept. of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Johan Botling
  - Dept. of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Hans Brunnström
  - Division of Pathology, Lund University, Skåne University Hospital, Lund, Sweden
- Pinja Kurppa
  - Dept. of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Linnéa La Fleur
  - Dept. of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Carina Strell
  - Dept. of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Claudia Wilm
  - Translational Innovation Platform Oncology & Immuno-Oncology, Merck KGaA, Darmstadt, Germany
- Astrid Zimmermann
  - Translational Innovation Platform Oncology & Immuno-Oncology, Merck KGaA, Darmstadt, Germany
- Christina Esdar
  - Translational Innovation Platform Oncology & Immuno-Oncology, Merck KGaA, Darmstadt, Germany
- Patrick Micke
  - Dept. of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden

16
Bhattacharya S, Hu Z, Butte AJ. Opportunities and Challenges in Democratizing Immunology Datasets. Front Immunol 2021; 12:647536. [PMID: 33936065 PMCID: PMC8086961 DOI: 10.3389/fimmu.2021.647536] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/30/2020] [Accepted: 03/04/2021] [Indexed: 11/26/2022] Open
Abstract
The field of immunology is rapidly progressing toward a systems-level understanding of immunity to tackle complex infectious diseases, autoimmune conditions, cancer, and beyond. In the last couple of decades, advancements in data acquisition techniques have presented opportunities to explore untapped areas of immunological research. Broad initiatives are launched to disseminate the datasets siloed in the global, federated, or private repositories, facilitating interoperability across various research domains. Concurrently, the application of computational methods, such as network analysis, meta-analysis, and machine learning have propelled the field forward by providing insight into salient features that influence the immunological response, which was otherwise left unexplored. Here, we review the opportunities and challenges in democratizing datasets, repositories, and community-wide knowledge sharing tools. We present use cases for repurposing open-access immunology datasets with advanced machine learning applications and more.
Affiliation(s)
- Sanchita Bhattacharya
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
  - Department of Pediatrics, University of California, San Francisco, San Francisco, CA, United States
- Zicheng Hu
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
  - Department of Pediatrics, University of California, San Francisco, San Francisco, CA, United States
- Atul J. Butte
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
  - Department of Pediatrics, University of California, San Francisco, San Francisco, CA, United States

17
Groleau M, White F, Cardenas A, Perron P, Hivert MF, Bouchard L, Jacques PÉ. Comparative epigenome-wide analysis highlights placenta-specific differentially methylated regions. Epigenomics 2021; 13:357-368. [PMID: 33661023 DOI: 10.2217/epi-2020-0271] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023] Open
Abstract
Aim: The placenta undergoes DNA methylation (DNAm) programming that is unique compared with all other fetal tissues. We aim to decipher some of the physiologic roles of the placenta by comparing its DNAm profile with that of another fetal tissue. Materials & methods: We performed a comparative analysis of genome-wide DNAm of 444 placentas paired with cord blood samples collected at birth. Gene ontology term analyses were conducted on the resulting differentially methylated regions. Results: Genomic regions upstream of transcription start sites showing lower DNAm in the placenta were enriched with terms related to miRNA functions and genes encoding G-protein-coupled receptors. Conclusion: These results highlight genomic regions that are differentially methylated in the placenta in contrast to fetal blood.
Affiliation(s)
- Marika Groleau
  - Département de Biologie, Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada
- Frédérique White
  - Département de Biologie, Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada
- Andres Cardenas
  - Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, CA, 94720-7360, USA
- Patrice Perron
  - Département de Médecine, Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada
  - Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Québec, J1H 5N4, Canada
- Marie-France Hivert
  - Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Québec, J1H 5N4, Canada
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, MA, 02115, USA
  - Diabetes Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Luigi Bouchard
  - Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Québec, J1H 5N4, Canada
  - Department of Biochemistry & Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, J1H 5N4, Canada
  - Department of Medical Biology, CIUSSS Saguenay-Lac-Saint-Jean, Hôpital de Chicoutimi, Saguenay, Québec, G7H 7K9, Canada
- Pierre-Étienne Jacques
  - Département de Biologie, Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada
  - Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Québec, J1H 5N4, Canada

18
Russkikh N, Antonets D, Shtokalo D, Makarov A, Vyatkin Y, Zakharov A, Terentyev E. Style transfer with variational autoencoders is a promising approach to RNA-Seq data harmonization and analysis. Bioinformatics 2020; 36:5076-5085. [PMID: 33026062 PMCID: PMC7755413 DOI: 10.1093/bioinformatics/btaa624] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/09/2019] [Revised: 04/06/2020] [Accepted: 07/09/2020] [Indexed: 11/30/2022] Open
Abstract
Motivation: Transcriptomic data are frequently used to search for biomarker genes of different diseases and biological states. The most common tasks are data harmonization and treatment outcome prediction, both of which can be addressed via a style transfer approach. Either technical factors or any biological details about the samples that we would like to control (gender, biological state, treatment, etc.) can be used as style components.
Results: The proposed style transfer solution is based on Conditional Variational Autoencoders, Y-Autoencoders and adversarial feature decomposition. To quantitatively measure the quality of the style transfer, we used neural network classifiers that predict the style and semantics after training on real expression data. Comparison with several existing style-transfer-based approaches shows that the proposed model has the highest style prediction accuracy on all considered datasets while having comparable or the best semantics prediction accuracy.
Availability and implementation: https://github.com/NRshka/stvae-source.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nikolai Russkikh
  - AcademGene LLC, Novosibirsk 630090, Russia
  - Laboratory of Complex Systems Simulation, A.P.Ershov Institute of Informatics Systems SB RAS, Novosibirsk 630090, Russia
- Denis Antonets
  - AcademGene LLC, Novosibirsk 630090, Russia
  - Laboratory of Complex Systems Simulation, A.P.Ershov Institute of Informatics Systems SB RAS, Novosibirsk 630090, Russia
  - Theoretical Department, Research Center of Virology and Biotechnology "Vector" Rospotrebnadzor, Koltsovo 630559, Russia
- Dmitry Shtokalo
  - AcademGene LLC, Novosibirsk 630090, Russia
  - Laboratory of Complex Systems Simulation, A.P.Ershov Institute of Informatics Systems SB RAS, Novosibirsk 630090, Russia
  - Cancer Research Foundation, Moscow 109316, Russia

19
Lung PY, Zhong D, Pang X, Li Y, Zhang J. Maximizing the reusability of gene expression data by predicting missing metadata. PLoS Comput Biol 2020; 16:e1007450. [PMID: 33156882 PMCID: PMC7673503 DOI: 10.1371/journal.pcbi.1007450] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/28/2019] [Revised: 11/18/2020] [Accepted: 10/09/2020] [Indexed: 11/18/2022] Open
Abstract
Reusability is part of the FAIR data principle, which aims to make data Findable, Accessible, Interoperable, and Reusable. One of the current efforts to increase the reusability of public genomics data has been to focus on the inclusion of quality metadata associated with the data. When necessary metadata are missing, most researchers will consider the data useless. In this study, we developed a framework to predict the missing metadata of gene expression datasets to maximize their reusability. We found that when using predicted data to conduct other analyses, it is not optimal to use all the predicted data. Instead, one should only use the subset of data, which can be predicted accurately. We proposed a new metric called Proportion of Cases Accurately Predicted (PCAP), which is optimized in our specifically-designed machine learning pipeline. The new approach performed better than pipelines using commonly used metrics such as F1-score in terms of maximizing the reusability of data with missing values. We also found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols. Using differential gene expression analysis as an example, we showed that when missing variables are accurately predicted, the corresponding gene expression data can be reliably used in downstream analyses.
Affiliation(s)
- Pei-Yau Lung
  - Department of Statistics, Florida State University, Tallahassee, United States of America
- Dongrui Zhong
  - Department of Statistics, Florida State University, Tallahassee, United States of America
- Xiaodong Pang
  - Insilicom LLC, Tallahassee, United States of America
- Yan Li
  - Department of Breast Surgery, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Jinfeng Zhang
  - Department of Statistics, Florida State University, Tallahassee, United States of America
  - * E-mail:
|
20
|
Low Entropy Sub-Networks Prevent the Integration of Metabolomic and Transcriptomic Data. Entropy 2020; 22:e22111238. [PMID: 33287006 PMCID: PMC7712986 DOI: 10.3390/e22111238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/15/2020] [Revised: 10/23/2020] [Accepted: 10/27/2020] [Indexed: 02/08/2023]
Abstract
The rapidly increasing amount of biological data from many different high-throughput experiments opens up new possibilities for data- and model-driven inference. Alongside these possibilities, however, come risks related to data integration techniques, and these risks are not widely taken into account. In particular, approaches based on flux balance analysis (FBA) are sensitive to the structure of the metabolic network, in which low-entropy clusters can prevent inference of the activity of metabolic reactions. In this article, we set forth problems that may arise when integrating metabolomic data with gene expression datasets. We analyze common pitfalls, provide possible solutions, and illustrate them with a case study of renal cell carcinoma (RCC). Using the proposed approach, we provide a metabolic description of the known morphological RCC subtypes and suggest the possible existence of a poor-prognosis cluster of patients, commonly characterized by low activity of the drug-transporting enzymes crucial in chemotherapy. This finding is consistent with and extends the already known poor-prognosis characteristics of RCC. Finally, this work also aims to point out the problem that arises when high-throughput data are integrated with inherently nonuniform, manually curated low-throughput data. In such cases, over-represented information may overshadow non-trivial discoveries.
|
21
|
Fanfani V, Cassano F, Stracquadanio G. PyGNA: a unified framework for geneset network analysis. BMC Bioinformatics 2020; 21:476. [PMID: 33092528 PMCID: PMC7579948 DOI: 10.1186/s12859-020-03801-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2020] [Accepted: 10/06/2020] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Gene and protein interaction experiments provide unique opportunities to study the molecular wiring of a cell. Integrating high-throughput functional genomics data with this information can help identify networks associated with complex diseases and phenotypes. RESULTS Here we introduce an integrated statistical framework to test network properties of single and multiple genesets under different interaction models. We implemented this framework as open-source software called Python Geneset Network Analysis (PyGNA). Our software is designed for easy integration into existing analysis pipelines and to generate high quality figures and reports. We also developed PyGNA to take advantage of multi-core systems to generate calibrated null distributions on large datasets. We then present the results of extensive benchmarking of the tests implemented in PyGNA and a use case inspired by RNA sequencing data analysis, showing how PyGNA can be easily integrated to study biological networks. PyGNA is available at http://github.com/stracquadaniolab/pygna and can be easily installed using the PyPI or Anaconda package managers, and Docker. CONCLUSIONS We present a tool for network-aware geneset analysis. PyGNA can be used either as a tool that is easily integrated into existing high-performance data analysis pipelines or as a Python package to implement new tests and analyses. With the increasing availability of population-scale omic data, PyGNA provides a viable approach for large scale geneset network analysis.
Affiliation(s)
- Viola Fanfani
  - School of Biological Science, The University of Edinburgh, Edinburgh, EH9 3BF, UK
- Fabio Cassano
  - School of Biological Science, The University of Edinburgh, Edinburgh, EH9 3BF, UK
|
22
|
Abstract
RATIONALE There is growing evidence that common variants and rare sequence alterations in regulatory sequences can result in birth defects or predisposition to disease. Congenital heart defects are the most common birth defect and have a clear genetic component, yet only a third of cases can be attributed to structural variation in the genome or a mutation in a gene. The remaining unknown cases could be caused by alterations in regulatory sequences. OBJECTIVE Identify regulatory sequences and gene expression networks that are active during organogenesis of the human heart. Determine whether these sites and networks are enriched for disease-relevant genes and associated genetic variation. METHODS AND RESULTS We characterized ChromHMM (chromatin state) and gene expression dynamics during human heart organogenesis. We profiled 7 histone modifications in embryonic hearts from each of 9 distinct Carnegie stages (13-14, 16-21, and 23), annotated chromatin states, and compared these maps to over 100 human tissues and cell types. We also generated RNA-sequencing data, performed differential expression, and constructed weighted gene coexpression networks. We identified 177 412 heart enhancers; 12 395 had not been previously annotated as strong enhancers. We identified 92% of all functionally validated heart-positive enhancers (n=281; 7.5× enrichment; P<2.2×10⁻¹⁶). Integration of these data demonstrated novel heart enhancers are enriched near genes expressed more strongly in cardiac tissue and are enriched for variants associated with ECG measures and atrial fibrillation. Our gene expression network analysis identified gene modules strongly enriched for heart-related functions, regulatory control by heart-specific enhancers, and putative disease genes. CONCLUSIONS Well-connected hub genes with heart-specific expression targeted by embryonic heart-specific enhancers are likely disease candidates. Our functional annotations will allow for better interpretation of whole genome sequencing data in the large number of patients affected by congenital heart defects.
Affiliation(s)
- Jennifer VanOudenhove
  - Genetics and Genome Sciences, University of Connecticut School of Medicine, Farmington CT, USA
- Tara N. Yankee
  - Genetics and Genome Sciences, University of Connecticut School of Medicine, Farmington CT, USA
  - Graduate Program in Genetics and Developmental Biology, UConn Health, Farmington CT, USA
- Andrea Wilderman
  - Genetics and Genome Sciences, University of Connecticut School of Medicine, Farmington CT, USA
  - Graduate Program in Genetics and Developmental Biology, UConn Health, Farmington CT, USA
- Justin Cotney
  - Genetics and Genome Sciences, University of Connecticut School of Medicine, Farmington CT, USA
  - Institute for Systems Genomics, UConn, Storrs CT, USA
|
23
|
Kumar M, Papaleo E. A pan-cancer assessment of alterations of the kinase domain of ULK1, an upstream regulator of autophagy. Sci Rep 2020; 10:14874. [PMID: 32913252 PMCID: PMC7483646 DOI: 10.1038/s41598-020-71527-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 12/10/2019] [Accepted: 06/22/2020] [Indexed: 02/06/2023] Open
Abstract
Autophagy is a key clearance process to recycle damaged cellular components. One important upstream regulator of autophagy is ULK1 kinase. Several three-dimensional structures of the ULK1 catalytic domain are available, but a comprehensive study, including molecular dynamics, is missing. Also, an exhaustive description of ULK1 alterations found in cancer samples is presently lacking. We here applied a framework which links -omics data to structural protein ensembles to study ULK1 alterations from genomics data available for more than 30 cancer types. We predicted the effects of mutations on ULK1 function and structural stability, accounting for protein dynamics, and the different layers of changes that a mutation can induce in a protein at the functional and structural level. ULK1 is down-regulated in gynecological tumors. In other cancer types, ULK2 could compensate for ULK1 downregulation and, in the majority of the cases, no marked changes in expression have been found. 36 missense mutations of ULK1, not limited to the catalytic domain, are co-occurring with mutations in a large number of ULK1 interactors or substrates, suggesting a pronounced effect of the upstream steps of autophagy in many cancer types. Moreover, our results pinpoint that more than 50% of the mutations in the kinase domain of ULK1, here investigated, are predicted to affect protein stability. Three mutations (S184F, D102N, and A28V) are predicted with only impact on kinase activity, either modifying the functional dynamics or the capability to exert effects from distal sites to the functional and catalytic regions. The framework here applied could be extended to other protein targets to aid the classification of missense mutations from cancer genomics studies, as well as to prioritize variants for experimental validation, or to select the appropriate biological readouts for experiments.
Affiliation(s)
- Mukesh Kumar
  - Computational Biology Laboratory, Center for Autophagy, Recycling and Disease (CARD), Danish Cancer Society Research Center, Strandboulevarden 49, 2100, Copenhagen, Denmark
- Elena Papaleo
  - Computational Biology Laboratory, Center for Autophagy, Recycling and Disease (CARD), Danish Cancer Society Research Center, Strandboulevarden 49, 2100, Copenhagen, Denmark.
  - Translational Disease System Biology, Faculty of Health and Medical Sciences, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.
|
24
|
Vemuri S, Srivastava R, Mir Q, Hashemikhabir S, Dong XC, Janga SC. SliceIt: A genome-wide resource and visualization tool to design CRISPR/Cas9 screens for editing protein-RNA interaction sites in the human genome. Methods 2020; 178:104-113. [PMID: 31494246 PMCID: PMC7056568 DOI: 10.1016/j.ymeth.2019.09.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/17/2019] [Revised: 06/25/2019] [Accepted: 09/01/2019] [Indexed: 12/26/2022] Open
Abstract
Several protein-RNA cross linking protocols have been established in recent years to delineate the molecular interaction of an RNA Binding Protein (RBP) and its target RNAs. However, functional dissection of the role of the RBP binding sites in modulating the post-transcriptional fate of the target RNA remains challenging. The CRISPR/Cas9 genome editing system is commonly employed to perturb both coding and noncoding regions in the genome. With the advancements in genome-scale CRISPR/Cas9 screens, it is now possible to not only perturb specific binding sites but also probe the global impact of protein-RNA interaction sites across cell types. Here, we present SliceIt (http://sliceit.soic.iupui.edu/), a database of in silico sgRNA (single guide RNA) library to facilitate conducting such high throughput screens. SliceIt comprises ~4.8 million unique sgRNAs with an estimated range of 2-8 sgRNAs designed per RBP binding site, for eCLIP experiments of >100 RBPs in HepG2 and K562 cell lines from the ENCODE project. SliceIt provides a user-friendly environment developed using the Elasticsearch search engine framework. It is available in both table and genome browser views, facilitating easy navigation of RBP binding sites, designed sgRNAs, and exon expression levels across 53 human tissues, along with the prevalence of SNPs and GWAS hits on binding sites. Exon expression profiles enable examination of locus specific changes proximal to the binding sites. Users can also upload custom tracks of various file formats directly onto the genome browser, to navigate additional genomic features in the genome and compare with other types of omics profiles. All the binding site-centric information is dynamically accessible via "search by gene", "search by coordinates" and "search by RBP" options and readily available to download. Validation of the sgRNA library in SliceIt was performed by selecting RBP binding sites in the Lipt1 gene and designing sgRNAs. The effect of CRISPR/Cas9 perturbations on the selected binding sites in the HepG2 cell line was confirmed based on altered proximal exon expression levels using qPCR, further supporting the utility of the resource to design experiments for perturbing protein-RNA interaction networks. Thus, SliceIt provides a one-stop guide RNA library to perturb RBP binding sites, along with several layers of functional information to design both low and high throughput CRISPR/Cas9 screens, for studying the phenotypes and diseases associated with RBP binding sites.
Affiliation(s)
- Sasank Vemuri
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, IN 46202, United States
- Rajneesh Srivastava
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, IN 46202, United States
- Quoseena Mir
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, IN 46202, United States
- Seyedsasan Hashemikhabir
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, IN 46202, United States
- X Charlie Dong
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, Indianapolis, IN 46202, United States
- Sarath Chandra Janga
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, IN 46202, United States; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Medical Research and Library Building, 975 West Walnut Street, Indianapolis, IN 46202, United States; Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translational Sciences (HITS), 410 West 10th Street, Indianapolis, IN 46202, United States.
|
25
|
Lederer S, Heskes T, van Heeringen SJ, Albers CA. Investigating the effect of dependence between conditions with Bayesian Linear Mixed Models for motif activity analysis. PLoS One 2020; 15:e0231824. [PMID: 32357166 PMCID: PMC7194367 DOI: 10.1371/journal.pone.0231824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2019] [Accepted: 04/01/2020] [Indexed: 11/19/2022] Open
Abstract
MOTIVATION Cellular identity and behavior are controlled by complex gene regulatory networks. Transcription factors (TFs) bind to specific DNA sequences to regulate the transcription of their target genes. On the basis of these TF motifs in cis-regulatory elements we can model the influence of TFs on gene expression. In such models of TF motif activity the data is usually modeled assuming a linear relationship between the motif activity and the gene expression level. A commonly used method to model motif influence is based on Ridge Regression. One important assumption of linear regression is the independence between samples. However, if samples are generated from the same cell line, tissue, or other biological source, this assumption may be invalid. This same assumption of independence is also applied to different yet similar experimental conditions, which may also be inappropriate. In theory, the independence assumption between samples could lead to loss in signal detection. Here we investigate whether a Bayesian model that allows for correlations results in more accurate inference of motif activities. RESULTS We extend the Ridge Regression to a Bayesian Linear Mixed Model, which allows us to model dependence between different samples. In a simulation study, we investigate the differences between the two model assumptions. We show that our Bayesian Linear Mixed Model implementation outperforms Ridge Regression in a simulation scenario where the noise, which is the signal that cannot be explained by TF motifs, is uncorrelated. However, we demonstrate that there is no such gain in performance if the noise has a similar covariance structure over samples as the signal that can be explained by motifs. We give a mathematical explanation of why this is the case. Using four representative real datasets we show that at most ~40% of the signal is explained by motifs using the linear model. With these data there is no advantage to using the Bayesian Linear Mixed Model, due to the similarity of the covariance structure. AVAILABILITY & IMPLEMENTATION The project implementation is available at https://github.com/Sim19/SimGEXPwMotifs.
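The ridge-regression baseline mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration of the standard technique for a single sample, not the code from the linked repository and not the Bayesian Linear Mixed Model itself; the function name `ridge_motif_activities`, the argument names, and the toy numbers are assumptions made here. With motif scores M (genes by motifs) and expression e, the sketch solves the penalized normal equations (MᵀM + λI) a = Mᵀe for the activity vector a.

```python
def ridge_motif_activities(motif_scores, expression, penalty=1.0):
    """Ridge estimate of motif activities for one sample (illustrative sketch).

    motif_scores: list of rows, one per gene, each holding one score per motif.
    expression:   list of expression values, one per gene (assumed centred).
    penalty:      ridge penalty lambda; keep it > 0 so the system is solvable.
    """
    n_motifs = len(motif_scores[0])
    # Build the augmented matrix [M^T M + penalty * I | M^T e].
    aug = []
    for j in range(n_motifs):
        row = [sum(g[j] * g[k] for g in motif_scores) for k in range(n_motifs)]
        row[j] += penalty
        row.append(sum(g[j] * e for g, e in zip(motif_scores, expression)))
        aug.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_motifs):
        pivot = max(range(col, n_motifs), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n_motifs):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[j][n_motifs] for j in range(n_motifs)]


# Toy data: three genes, two motifs; expression is exactly 2*motif1 + 3*motif2.
toy_scores = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_expression = [2.0, 3.0, 5.0]
shrunk_activities = ridge_motif_activities(toy_scores, toy_expression, penalty=1.0)
```

In this sketch each sample would be fitted separately, which corresponds to the independence assumption that the abstract questions; modeling correlation between samples requires the mixed-model extension the authors describe.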
Affiliation(s)
- Simone Lederer
  - Data Science, Radboud University, Institute for Computing and Information Sciences, Nijmegen, The Netherlands
  - Molecular Developmental Biology, Radboud University, Research Institute for Molecular Life Sciences, Nijmegen, The Netherlands
  - * E-mail: (LS); (VHS)
- Tom Heskes
  - Data Science, Radboud University, Institute for Computing and Information Sciences, Nijmegen, The Netherlands
- Simon J. van Heeringen
  - Molecular Developmental Biology, Radboud University, Research Institute for Molecular Life Sciences, Nijmegen, The Netherlands
  - * E-mail: (LS); (VHS)
- Cornelis A. Albers
  - Molecular Developmental Biology, Radboud University, Research Institute for Molecular Life Sciences, Nijmegen, The Netherlands
|
26
|
Nieuwenhuis TO, Yang SY, Verma RX, Pillalamarri V, Arking DE, Rosenberg AZ, McCall MN, Halushka MK. Consistent RNA sequencing contamination in GTEx and other data sets. Nat Commun 2020; 11:1933. [PMID: 32321923 PMCID: PMC7176728 DOI: 10.1038/s41467-020-15821-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 09/19/2019] [Accepted: 03/23/2020] [Indexed: 01/15/2023] Open
Abstract
A challenge of next generation sequencing is read contamination. We use Genotype-Tissue Expression (GTEx) datasets and technical metadata along with RNA-seq datasets from other studies to understand factors that contribute to contamination. Here we report, of 48 analyzed tissues in GTEx, 26 have variant co-expression clusters of four highly expressed and pancreas-enriched genes (PRSS1, PNLIP, CLPS, and/or CELA3A). Fourteen additional highly expressed genes from other tissues also indicate contamination. Sample contamination is strongly associated with a sample being sequenced on the same day as a tissue that natively expresses those genes. Discrepant SNPs across four contaminating genes validate the contamination. Low-level contamination affects ~40% of samples and leads to numerous eQTL assignments in inappropriate tissues among these 18 genes. This type of contamination occurs widely, impacting bulk and single cell (scRNA-seq) data set analysis. In conclusion, highly expressed, tissue-enriched genes basally contaminate GTEx and other datasets impacting analyses.
Affiliation(s)
- Tim O Nieuwenhuis
  - Department of Pathology, Johns Hopkins University SOM, Baltimore, MD, 21205, USA
  - McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University SOM, Baltimore, MD, 21205, USA
- Stephanie Y Yang
  - McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University SOM, Baltimore, MD, 21205, USA
- Rohan X Verma
  - Department of Pathology, Johns Hopkins University SOM, Baltimore, MD, 21205, USA
- Vamsee Pillalamarri
  - McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University SOM, Baltimore, MD, 21205, USA
- Dan E Arking
  - McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University SOM, Baltimore, MD, 21205, USA
- Avi Z Rosenberg
  - Department of Pathology, Johns Hopkins University SOM, Baltimore, MD, 21205, USA
- Matthew N McCall
  - Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY, 14642, USA
- Marc K Halushka
  - Department of Pathology, Johns Hopkins University SOM, Baltimore, MD, 21205, USA.
|
27
|
David JK, Maden SK, Weeder BR, Thompson R, Nellore A. Putatively cancer-specific exon-exon junctions are shared across patients and present in developmental and other non-cancer cells. NAR Cancer 2020; 2:zcaa001. [PMID: 34316681 PMCID: PMC8209686 DOI: 10.1093/narcan/zcaa001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/02/2019] [Revised: 01/06/2020] [Accepted: 01/14/2020] [Indexed: 01/08/2023] Open
Abstract
This study probes the distribution of putatively cancer-specific junctions across a broad set of publicly available non-cancer human RNA sequencing (RNA-seq) datasets. We compared cancer and non-cancer RNA-seq data from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression (GTEx) Project and the Sequence Read Archive. We found that (i) averaging across cancer types, 80.6% of exon-exon junctions thought to be cancer-specific based on comparison with tissue-matched samples (σ = 13.0%) are in fact present in other adult non-cancer tissues throughout the body; (ii) 30.8% of junctions not present in any GTEx or TCGA normal tissues are shared by multiple samples within at least one cancer type cohort, and 87.4% of these distinguish between different cancer types; and (iii) many of these junctions not found in GTEx or TCGA normal tissues (15.4% on average, σ = 2.4%) are also found in embryological and other developmentally associated cells. These findings refine the meaning of RNA splicing event novelty, particularly with respect to the human neoepitope repertoire. Ultimately, cancer-specific exon-exon junctions may have a substantial causal relationship with the biology of disease.
Affiliation(s)
- Julianne K David
  - Computational Biology Program, Oregon Health & Science University, Portland, OR 97239, USA
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR 97239, USA
- Sean K Maden
  - Computational Biology Program, Oregon Health & Science University, Portland, OR 97239, USA
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR 97239, USA
- Benjamin R Weeder
  - Computational Biology Program, Oregon Health & Science University, Portland, OR 97239, USA
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR 97239, USA
- Reid F Thompson
  - Computational Biology Program, Oregon Health & Science University, Portland, OR 97239, USA
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR 97239, USA
  - Department of Radiation Medicine, Oregon Health & Science University, Portland, OR 97239, USA
  - Portland VA Research Foundation, Portland, OR 97239, USA
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
  - Division of Hospital and Specialty Medicine, VA Portland Healthcare System, Portland, OR 97239, USA
  - Cancer Early Detection Advanced Research Center, Oregon Health & Science University, Portland, OR 97239, USA
- Abhinav Nellore
  - Computational Biology Program, Oregon Health & Science University, Portland, OR 97239, USA
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR 97239, USA
  - Department of Surgery, Oregon Health & Science University, Portland, OR 97239, USA
|
28
|
Imada EL, Sanchez DF, Collado-Torres L, Wilks C, Matam T, Dinalankara W, Stupnikov A, Lobo-Pereira F, Yip CW, Yasuzawa K, Kondo N, Itoh M, Suzuki H, Kasukawa T, Hon CC, de Hoon MJL, Shin JW, Carninci P, Jaffe AE, Leek JT, Favorov A, Franco GR, Langmead B, Marchionni L. Recounting the FANTOM CAGE-Associated Transcriptome. Genome Res 2020; 30:1073-1081. [PMID: 32079618 PMCID: PMC7397872 DOI: 10.1101/gr.254656.119] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 07/12/2019] [Accepted: 02/11/2020] [Indexed: 02/02/2023]
Abstract
Long noncoding RNAs (lncRNAs) have emerged as key coordinators of biological and cellular processes. Characterizing lncRNA expression across cells and tissues is key to understanding their role in determining phenotypes, including human diseases. We present here FC-R2, a comprehensive expression atlas across a broadly defined human transcriptome, inclusive of over 109,000 coding and noncoding genes, as described in the FANTOM CAGE-Associated Transcriptome (FANTOM-CAT) study. This atlas greatly extends the gene annotation used in the original recount2 resource. We demonstrate the utility of the FC-R2 atlas by reproducing key findings from published large studies and by generating new results across normal and diseased human samples. In particular, we (a) identify tissue-specific transcription profiles for distinct classes of coding and noncoding genes, (b) perform differential expression analysis across thirteen cancer types, identifying novel noncoding genes potentially involved in tumor pathogenesis and progression, and (c) confirm the prognostic value for several enhancer lncRNAs expression in cancer. Our resource is instrumental for the systematic molecular characterization of lncRNA by the FANTOM6 Consortium. In conclusion, comprised of over 70,000 samples, the FC-R2 atlas will empower other researchers to investigate functions and biological roles of both known coding genes and novel lncRNAs.
Affiliation(s)
- Eddie Luidy Imada
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21827, USA; Departamento de Bioquímica e Imunologia, ICB, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-901, Brazil
- Diego Fernando Sanchez
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21827, USA
- Christopher Wilks
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Tejasvi Matam
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21827, USA
- Wikum Dinalankara
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21827, USA
- Aleksey Stupnikov
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21827, USA
- Francisco Lobo-Pereira
  - Departamento de Biologia General, ICB, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-901, Brazil
- Chi-Wai Yip
  - RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
- Kayoko Yasuzawa
  - RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
- Naoto Kondo
  - RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
- Masayoshi Itoh
  - RIKEN, Preventive Medicine and Diagnostic Innovation Program, Yokohama, 351-0198, Japan
- Harukazu Suzuki
  - RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
- Takeya Kasukawa
  - RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
- Chung-Chau Hon
  - RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
- Jay W Shin
  - RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
- Piero Carninci
  - RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
- Andrew E Jaffe
  - Lieber Institute for Brain Development, Baltimore, Maryland 21205, USA; Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA
- Jeffrey T Leek
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA
- Alexander Favorov
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21827, USA; Laboratory of Systems Biology and Computational Genetics, VIGG RAS, 117971 Moscow, Russia
- Gloria R Franco
  - Departamento de Bioquímica e Imunologia, ICB, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-901, Brazil
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA
- Luigi Marchionni
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21827, USA
|
29
|
Sepulveda JL. Using R and Bioconductor in Clinical Genomics and Transcriptomics. J Mol Diagn 2019; 22:3-20. [PMID: 31605800 DOI: 10.1016/j.jmoldx.2019.08.006] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 12/18/2017] [Revised: 05/02/2019] [Accepted: 08/08/2019] [Indexed: 02/08/2023] Open
Abstract
Bioinformatics pipelines are essential in the analysis of genomic and transcriptomic data generated by next-generation sequencing (NGS). Recent guidelines emphasize the need for rigorous validation and assessment of robustness, reproducibility, and quality of NGS analytic pipelines intended for clinical use. Software tools written in the R statistical language and, in particular, the set of tools available in the Bioconductor repository are widely used in research bioinformatics; and these frameworks offer several advantages for use in clinical bioinformatics, including the breadth of available tools, modular nature of software packages, ease of installation, enforcement of interoperability, version control, and short learning curve. This review provides an introduction to R and Bioconductor software, its advantages and limitations for clinical bioinformatics, and illustrative examples of tools that can be used in various steps of NGS analysis.
Affiliation(s)
- Jorge L Sepulveda
- Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, New York; Informatics Subdivision Leadership, Association for Molecular Pathology, Bethesda, Maryland.
| |
|
30
|
Deelen P, van Dam S, Herkert JC, Karjalainen JM, Brugge H, Abbott KM, van Diemen CC, van der Zwaag PA, Gerkes EH, Zonneveld-Huijssoon E, Boer-Bergsma JJ, Folkertsma P, Gillett T, van der Velde KJ, Kanninga R, van den Akker PC, Jan SZ, Hoorntje ET, Te Rijdt WP, Vos YJ, Jongbloed JDH, van Ravenswaaij-Arts CMA, Sinke R, Sikkema-Raddatz B, Kerstjens-Frederikse WS, Swertz MA, Franke L. Improving the diagnostic yield of exome-sequencing by predicting gene-phenotype associations using large-scale gene expression analysis. Nat Commun 2019; 10:2837. [PMID: 31253775 PMCID: PMC6599066 DOI: 10.1038/s41467-019-10649-4] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 10/22/2018] [Accepted: 05/23/2019] [Indexed: 02/06/2023]
Abstract
The diagnostic yield of exome and genome sequencing remains low (8-70%), due to incomplete knowledge on the genes that cause disease. To improve this, we use RNA-seq data from 31,499 samples to predict which genes cause specific disease phenotypes, and develop GeneNetwork Assisted Diagnostic Optimization (GADO). We show that this unbiased method, which does not rely upon specific knowledge on individual genes, is effective in both identifying previously unknown disease gene associations, and flagging genes that have previously been incorrectly implicated in disease. GADO can be run on www.genenetwork.nl by supplying HPO-terms and a list of genes that contain candidate variants. Finally, applying GADO to a cohort of 61 patients for whom exome-sequencing analysis had not resulted in a genetic diagnosis, yields likely causative genes for ten cases.
Affiliation(s)
- Patrick Deelen
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 VB, Groningen, The Netherlands
| | - Sipko van Dam
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Johanna C Herkert
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Juha M Karjalainen
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Harm Brugge
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Kristin M Abbott
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Cleo C van Diemen
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Paul A van der Zwaag
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Erica H Gerkes
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Evelien Zonneveld-Huijssoon
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Jelkje J Boer-Bergsma
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Pytrik Folkertsma
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Tessa Gillett
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - K Joeri van der Velde
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 VB, Groningen, The Netherlands
| | - Roan Kanninga
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 VB, Groningen, The Netherlands
| | - Peter C van den Akker
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Sabrina Z Jan
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Edgar T Hoorntje
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,Netherlands Heart Institute, 3511 EP, Utrecht, The Netherlands
| | - Wouter P Te Rijdt
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,Netherlands Heart Institute, 3511 EP, Utrecht, The Netherlands
| | - Yvonne J Vos
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Jan D H Jongbloed
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Conny M A van Ravenswaaij-Arts
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Richard Sinke
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Birgit Sikkema-Raddatz
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | | | - Morris A Swertz
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 VB, Groningen, The Netherlands
| | - Lude Franke
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.
| |
|
31
|
New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx. PLoS Comput Biol 2019; 15:e1006701. [PMID: 30835723 PMCID: PMC6420023 DOI: 10.1371/journal.pcbi.1006701] [Citation(s) in RCA: 314] [Impact Index Per Article: 52.3] [Received: 06/25/2018] [Revised: 03/15/2019] [Accepted: 12/10/2018] [Indexed: 02/07/2023]
Abstract
The advent of Next-Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. Nowadays, the amount of genomic data is massive and substantial efforts and new tools are required to unveil the information hidden in the data. The Genomic Data Commons (GDC) Data Portal is a platform that contains different genomic studies including the ones from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiatives, accounting for more than 40 tumor types originating from nearly 30000 patients. Such platforms, although very attractive, must make sure the stored data are easily accessible and adequately harmonized. Moreover, they have the primary focus on the data storage in a unique place, and they do not provide a comprehensive toolkit for analyses and interpretation of the data. To fulfill this urgent need, comprehensive but easily accessible computational methods for integrative analyses of genomic data that do not renounce a robust statistical and theoretical framework are required. In this context, the R/Bioconductor package TCGAbiolinks was developed, offering a variety of bioinformatics functionalities. Here we introduce new features and enhancements of TCGAbiolinks in terms of i) more accurate and flexible pipelines for differential expression analyses, ii) different methods for tumor purity estimation and filtering, iii) integration of normal samples from other platforms, and iv) support for other genomics datasets, exemplified here by the TARGET data. Evidence has shown that accounting for tumor purity is essential in the study of tumorigenesis, as these factors promote confounding behavior regarding differential expression analysis. With this in mind, we implemented these filtering procedures in TCGAbiolinks. Moreover, a limitation of some of the TCGA datasets is the unavailability or paucity of corresponding normal samples. We thus integrated into TCGAbiolinks the possibility to use normal samples from the Genotype-Tissue Expression (GTEx) project, which is another large-scale repository cataloging gene expression from healthy individuals. The new functionalities are available in the TCGAbiolinks version 2.8 and higher released in Bioconductor version 3.7.
The advent of Next-Generation Sequencing (NGS) technologies has been generating a massive amount of data, which requires continuous efforts in developing and maintaining computational tools for data analyses. The Genomic Data Commons (GDC) Data Portal is a platform that contains different cancer genomic studies. Such platforms often have the primary focus on the data storage and they do not provide a comprehensive toolkit for analyses. To fulfil this urgent need, comprehensive but accessible computational protocols that do not renounce a robust statistical framework are thus required. In this context, we here present the new functions of the R/Bioconductor package TCGAbiolinks to improve the discovery of differentially expressed genes in cancer and tumor (sub)types, include the estimate of tumor purity and tumor infiltrations, use normal samples from other platforms and support more broadly other genomics datasets.
|