1
Rahmatallah Y, Glazko G. Improving data interpretability with new differential sample variance gene set tests. BMC Bioinformatics 2025; 26:103. [PMID: 40229677 PMCID: PMC11998189 DOI: 10.1186/s12859-025-06117-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2024] [Accepted: 03/20/2025] [Indexed: 04/16/2025]
Abstract
BACKGROUND Gene set analysis methods have played a major role in generating biological interpretations of omics data such as gene expression datasets. However, most methods focus on detecting homogeneous pattern changes in mean expression, while methods that detect pattern changes in variance remain poorly explored. Although a few studies have attempted gene-level variance analysis, this approach remains under-utilized. When comparing two phenotypes, gene sets with distinct changes in subgroups under one phenotype are overlooked by available methods, although they reflect meaningful biological differences between the two phenotypes. Multivariate sample-level variance analysis methods are needed to detect such pattern changes. RESULTS We used ranking schemes based on the minimum spanning tree to generalize the univariate Cramér-von Mises and Anderson-Darling statistics into multivariate gene set analysis methods that detect differential sample variance or mean. Using simulations with different parameters, we characterized the detection power and Type I error rate of these methods and of two methods developed earlier. We applied the developed methods to a microarray gene expression dataset of prednisolone-resistant and prednisolone-sensitive children diagnosed with B-lineage acute lymphoblastic leukemia and to a bulk RNA-sequencing gene expression dataset of benign hyperplastic polyps and potentially malignant sessile serrated adenoma/polyps. One or both of the two compared phenotypes in each of these datasets have distinct molecular subtypes that contribute to within-phenotype variability and to heterogeneous differences between the two compared phenotypes. Our results show that methods designed to detect differential sample variance provide meaningful biological interpretations by detecting specific hallmark gene sets associated with the two compared phenotypes, as documented in the available literature.
CONCLUSIONS The results of this study demonstrate the usefulness of methods designed to detect differential sample variance in providing biological interpretations when biologically relevant but heterogeneous changes between two phenotypes are prevalent in specific signaling pathways. A software implementation of the methods is available with detailed documentation in the Bioconductor package GSAR. The methods are applicable to gene expression datasets in normalized matrix form and could be used with other omics datasets in normalized matrix form, given an available collection of feature sets.
Affiliation(s)
- Yasir Rahmatallah
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Galina Glazko
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
2
Rahmatallah Y, Glazko G. Improving data interpretability with new differential sample variance gene set tests. Research Square 2024:rs.3.rs-4888767. [PMID: 39315246 PMCID: PMC11419169 DOI: 10.21203/rs.3.rs-4888767/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
Background Gene set analysis methods have played a major role in generating biological interpretations from omics data such as gene expression datasets. However, most methods focus on detecting homogeneous pattern changes in mean expression, and methods that detect pattern changes in variance remain poorly explored. Although a few studies have attempted gene-level variance analysis, this approach remains under-utilized. When comparing two phenotypes, gene sets with distinct changes in subgroups under one phenotype are overlooked by available methods, although they reflect meaningful biological differences between the two phenotypes. Multivariate sample-level variance analysis methods are needed to detect such pattern changes. Results We use ranking schemes based on the minimum spanning tree to generalize the univariate Cramér-von Mises and Anderson-Darling statistics into multivariate gene set analysis methods that detect differential sample variance or mean. Using simulations with different parameters, we characterize these methods and two methods developed earlier. We apply the developed methods to a microarray gene expression dataset of prednisolone-resistant and prednisolone-sensitive children diagnosed with B-lineage acute lymphoblastic leukemia and to a bulk RNA-sequencing gene expression dataset of benign hyperplastic polyps and potentially malignant sessile serrated adenoma/polyps. One or both of the two compared phenotypes in each of these datasets have distinct molecular subtypes that contribute to heterogeneous differences. Our results show that methods designed to detect differential sample variance are able to detect specific hallmark signaling pathways associated with the two compared phenotypes, as documented in the available literature.
Conclusions The results of this study demonstrate the usefulness of methods designed to detect differential sample variance in providing biological interpretations when biologically relevant but heterogeneous changes between two phenotypes are prevalent in specific signaling pathways. A software implementation of the developed methods is available with detailed documentation in the Bioconductor package GSAR. The methods are applicable to gene expression datasets in normalized matrix form and could be used with other omics datasets in normalized matrix form, given an available collection of feature sets.
Affiliation(s)
- Yasir Rahmatallah
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Galina Glazko
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
3
Kim K, Kim S, Ahn T, Kim H, Shin SJ, Choi CH, Park S, Kim YB, No JH, Suh DH. A differential diagnosis between uterine leiomyoma and leiomyosarcoma using transcriptome analysis. BMC Cancer 2023; 23:1215. [PMID: 38066476 PMCID: PMC10709939 DOI: 10.1186/s12885-023-11394-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 09/11/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND The objective of this study was to estimate the accuracy of a transcriptome-based classifier in the differential diagnosis of uterine leiomyoma and leiomyosarcoma. We manually selected 114 normal uterine tissue samples and 31 leiomyosarcoma samples from publicly available transcriptome data in UCSC Xena as training/validation sets. We developed a pre-processing procedure and a gene selection method to sensitively find genes with larger variance in leiomyosarcoma than in normal uterine tissue. With this method, 17 genes were selected to build the transcriptome-based classifier. The prediction accuracies of deep feedforward neural network (DNN), support vector machine (SVM), random forest (RF), and gradient boosting (GB) models were examined. We interpreted the biological functionality of the selected genes via network-based analysis using GeneMANIA. To validate the performance of the trained model, we additionally collected 35 clinical samples of leiomyosarcoma and leiomyoma as a test set (18 + 17 as 1st and 2nd test sets). RESULTS We discovered genes whose expression is highly variable in leiomyosarcoma but conserved in normal uterine samples. These genes were mainly associated with DNA replication. Because gene selection and model training were performed on leiomyosarcoma and normal uterine tissue, it was necessary to demonstrate that the model can discriminate between leiomyosarcoma and leiomyoma. Thus, the trained model was further validated in newly collected clinical samples of leiomyosarcoma and leiomyoma. The DNN classifier achieved sensitivities of 0.88 and 0.77 (8/9, 7/9) and a specificity of 1.0 (8/8, 8/8) in the two test sets, supporting that the selected genes, in conjunction with the DNN classifier, discriminate well between leiomyosarcoma and leiomyoma in clinical samples. CONCLUSION The transcriptome-based classifier accurately distinguished uterine leiomyosarcoma from leiomyoma. Our method can be helpful in clinical practice when applied to a biopsy sample taken in advance of surgery. Identifying leiomyosarcoma lets the doctor avoid laparoscopic surgery, thereby minimizing unwanted tumor spread.
Affiliation(s)
- Kidong Kim
  - Department of Obstetrics and Gynecology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Sarah Kim
  - Department of Life Science, Handong Global University, Pohang, Republic of Korea
- TaeJin Ahn
  - Department of Life Science, Handong Global University, Pohang, Republic of Korea
- Hyojin Kim
  - Department of Pathology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- So-Jin Shin
  - Department of Gynecology and Obstetrics, School of Medicine, Keimyung University, Daegu, Republic of Korea
- Chel Hun Choi
  - Department of Obstetrics and Gynecology, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Sungmin Park
  - Department of Life Science, Handong Global University, Pohang, Republic of Korea
- Yong-Beom Kim
  - Department of Obstetrics and Gynecology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Jae Hong No
  - Department of Obstetrics and Gynecology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Dong Hoon Suh
  - Department of Obstetrics and Gynecology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
4
Li H, Khang TF. clrDV: a differential variability test for RNA-Seq data based on the skew-normal distribution. PeerJ 2023; 11:e16126. [PMID: 37790621 PMCID: PMC10544356 DOI: 10.7717/peerj.16126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 08/27/2023] [Indexed: 10/05/2023]
Abstract
Background Pathological conditions may result in certain genes having expression variance that differs markedly from that of the control. Finding such genes from gene expression data can provide invaluable candidates for therapeutic intervention. Under the dominant paradigm for modeling RNA-Seq gene counts using the negative binomial model, tests of differential variability are challenging to develop, owing to dependence of the variance on the mean. Methods Here, we describe clrDV, a statistical method for detecting genes that show differential variability between two populations. We present the skew-normal distribution for modeling gene-wise null distribution of centered log-ratio transformation of compositional RNA-seq data. Results Simulation results show that clrDV has false discovery rate and probability of Type II error that are on par with or superior to existing methodologies. In addition, its run time is faster than its closest competitors, and remains relatively constant for increasing sample size per group. Analysis of a large neurodegenerative disease RNA-Seq dataset using clrDV successfully recovers multiple gene candidates that have been reported to be associated with Alzheimer's disease.
Affiliation(s)
- Hongxiang Li
  - Institute of Mathematical Sciences, Universiti Malaya, Kuala Lumpur, Malaysia
- Tsung Fei Khang
  - Institute of Mathematical Sciences, Universiti Malaya, Kuala Lumpur, Malaysia
  - Universiti Malaya Centre for Data Analytics, Universiti Malaya, Kuala Lumpur, Malaysia
5
Roberts AGK, Catchpoole DR, Kennedy PJ. Identification of differentially distributed gene expression and distinct sets of cancer-related genes identified by changes in mean and variability. NAR Genom Bioinform 2022; 4:lqab124. [PMID: 35047816 PMCID: PMC8759562 DOI: 10.1093/nargab/lqab124] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/25/2021] [Revised: 11/19/2021] [Accepted: 12/16/2021] [Indexed: 12/13/2022]
Abstract
There is increasing evidence that changes in the variability or overall distribution of gene expression are important both in normal biology and in diseases, particularly cancer. Genes whose expression differs in variability or distribution without a difference in mean are ignored by traditional differential expression-based analyses. Using a Bayesian hierarchical model that provides tests for both differential variability and differential distribution for bulk RNA-seq data, we report here an investigation into differential variability and distribution in cancer. Analysis of eight paired tumour-normal datasets from The Cancer Genome Atlas confirms that differential variability and distribution analyses are able to identify cancer-related genes. We further demonstrate that differential variability identifies cancer-related genes that are missed by differential expression analysis, and that differential expression and differential variability identify functionally distinct sets of potentially cancer-related genes. These results suggest that differential variability analysis may provide insights into genetic aspects of cancer that would not be revealed by differential expression, and that differential distribution analysis may allow for more comprehensive identification of cancer-related genes than analyses based on changes in mean or variability alone.
6
Maßberg D, Hatt H. Human Olfactory Receptors: Novel Cellular Functions Outside of the Nose. Physiol Rev 2018; 98:1739-1763. [PMID: 29897292 DOI: 10.1152/physrev.00013.2017] [Citation(s) in RCA: 160] [Impact Index Per Article: 22.9] [Indexed: 12/26/2022]
Abstract
Olfactory receptors (ORs) are not exclusively expressed in the olfactory sensory neurons; they are also observed outside of the olfactory system in all other human tissues tested to date, including the testis, lung, intestine, skin, heart, and blood. Within these tissues, certain ORs have been determined to be exclusively expressed in only one tissue, whereas other ORs are more widely distributed in many different tissues throughout the human body. For most of the ectopically expressed ORs, limited data are available for their functional roles. They have been shown to be involved in the modulation of cell-cell recognition, migration, proliferation, the apoptotic cycle, exocytosis, and pathfinding processes. Additionally, there is a growing body of evidence that they have the potential to serve as diagnostic and therapeutic tools, as ORs are highly expressed in different cancer tissues. Interestingly, in addition to the canonical signaling pathways activated by ORs in olfactory sensory neurons, alternative pathways have been demonstrated in nonolfactory tissues. In this review, the existing data concerning the expression, as well as the physiological and pathophysiological functions, of ORs outside of the nose are highlighted to provide insights into future lines of research.
Affiliation(s)
- Désirée Maßberg
  - Ruhr-University Bochum, Department of Cell Physiology, Bochum, Germany
- Hanns Hatt
  - Ruhr-University Bochum, Department of Cell Physiology, Bochum, Germany
7
Roberts AGK, Catchpoole DR, Kennedy PJ. Variance-based Feature Selection for Classification of Cancer Subtypes Using Gene Expression Data. 2018 International Joint Conference on Neural Networks (IJCNN) 2018:1-8. [DOI: 10.1109/ijcnn.2018.8489279] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/06/2025]
8
Maßberg D, Jovancevic N, Offermann A, Simon A, Baniahmad A, Perner S, Pungsrinont T, Luko K, Philippou S, Ubrig B, Heiland M, Weber L, Altmüller J, Becker C, Gisselmann G, Gelis L, Hatt H. The activation of OR51E1 causes growth suppression of human prostate cancer cells. Oncotarget 2018; 7:48231-48249. [PMID: 27374083 PMCID: PMC5217014 DOI: 10.18632/oncotarget.10197] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 12/21/2015] [Accepted: 06/06/2016] [Indexed: 01/23/2023]
Abstract
The development of prostate cancer (PCa) is regulated by the androgen-dependent activity of the androgen receptor (AR). Androgen-deprivation therapy (ADT) is therefore the gold standard treatment to suppress malignant progression of PCa. Nevertheless, due to the development of castration resistance, recurrence of disease after initial response to ADT is a major obstacle to successful treatment. As G-protein coupled receptors play a fundamental role in PCa physiology, they might represent promising alternative or combinatorial targets for advanced disease. Here, we verified gene expression of the olfactory receptors (ORs) OR51E1 [prostate-specific G-protein coupled receptor 2 (PSGR2)] and OR51E2 (PSGR) in human PCa tissue by RNA-Seq analysis and RT-PCR and elucidated the subcellular localization of both receptor proteins in human prostate tissue. The OR51E1 agonist nonanoic acid (NA) leads to the phosphorylation of various protein kinases and growth suppression of the PCa cell line LNCaP. Furthermore, treatment with NA causes reduction of androgen-mediated AR target gene expression. Interestingly, NA induces cellular senescence, which coincides with reduced E2F1 mRNA levels. In contrast, treatment with the structurally related compound 1-nonanol or the OR2AG1 agonist amyl butyrate, neither of which activates OR51E1, did not lead to reduced cell growth or an induction of cellular senescence. However, decanoic acid, another OR51E1 agonist, also induces cellular senescence. Thus, our results suggest the involvement of OR51E1 in growth processes of PCa cells and its impact on AR-mediated signaling. These findings provide novel evidence to support the functional importance of ORs in PCa pathogenesis.
Affiliation(s)
- Désirée Maßberg
  - Department of Cell Physiology, Ruhr-University Bochum, Bochum, Germany
- Anne Offermann
  - Pathology of the University Hospital of Luebeck and the Leibniz Research Center Borstel, Luebeck and Borstel, Germany
- Annika Simon
  - Department of Cell Physiology, Ruhr-University Bochum, Bochum, Germany
- Aria Baniahmad
  - Institute of Human Genetics, Jena University Hospital, Jena, Germany
- Sven Perner
  - Pathology of the University Hospital of Luebeck and the Leibniz Research Center Borstel, Luebeck and Borstel, Germany
- Katarina Luko
  - Institute of Human Genetics, Jena University Hospital, Jena, Germany
- Stathis Philippou
  - Institute for Pathology und Cytology, Augusta-Kranken-Anstalt gGmbH Bochum, Bochum, Germany
- Burkhard Ubrig
  - Clinic for Urology, Augusta-Kranken-Anstalt gGmbH Bochum, Bochum, Germany
- Markus Heiland
  - Clinic for Urology, Augusta-Kranken-Anstalt gGmbH Bochum, Bochum, Germany
- Lea Weber
  - Department of Cell Physiology, Ruhr-University Bochum, Bochum, Germany
- Günter Gisselmann
  - Department of Cell Physiology, Ruhr-University Bochum, Bochum, Germany
- Lian Gelis
  - Department of Cell Physiology, Ruhr-University Bochum, Bochum, Germany
  - Present address: Global Drug Discovery - Clinical Sciences, Bayer Pharma AG, Wuppertal, Germany
- Hanns Hatt
  - Department of Cell Physiology, Ruhr-University Bochum, Bochum, Germany
9
Yeo J, Crawford EL, Zhang X, Khuder S, Chen T, Levin A, Blomquist TM, Willey JC. A lung cancer risk classifier comprising genome maintenance genes measured in normal bronchial epithelial cells. BMC Cancer 2017; 17:301. [PMID: 28464886 PMCID: PMC5412061 DOI: 10.1186/s12885-017-3287-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/26/2016] [Accepted: 04/20/2017] [Indexed: 12/14/2022]
Abstract
Background Annual low dose CT (LDCT) screening of individuals at high demographic risk reduces lung cancer mortality by more than 20%. However, subjects selected for screening based on demographic criteria typically have less than a 10% lifetime risk for lung cancer. Thus, there is need for a biomarker that better stratifies subjects for LDCT screening. Toward this goal, we previously reported a lung cancer risk test (LCRT) biomarker comprising 14 genome-maintenance (GM) pathway genes measured in normal bronchial epithelial cells (NBEC) that accurately classified cancer (CA) from non-cancer (NC) subjects. The primary goal of the studies reported here was to optimize the LCRT biomarker for high specificity and ease of clinical implementation. Methods Targeted competitive multiplex PCR amplicon libraries were prepared for next generation sequencing (NGS) analysis of transcript abundance at 68 sites among 33 GM target genes in NBEC specimens collected from a retrospective cohort of 120 subjects, including 61 CA cases and 59 NC controls. Genes were selected for analysis based on contribution to the previously reported LCRT biomarker and/or prior evidence for association with lung cancer risk. Linear discriminant analysis was used to identify the most accurate classifier suitable to stratify subjects for screening. Results After cross-validation, a model comprising expression values from 12 genes (CDKN1A, E2F1, ERCC1, ERCC4, ERCC5, GPX1, GSTP1, KEAP1, RB1, TP53, TP63, and XRCC1) and demographic factors age, gender, and pack-years smoking, had Receiver Operator Characteristic area under the curve (ROC AUC) of 0.975 (95% CI: 0.96–0.99). The overall classification accuracy was 93% (95% CI 88%–98%) with sensitivity 93.1%, specificity 92.9%, positive predictive value 93.1% and negative predictive value 93%. The ROC AUC for this classifier was significantly better (p < 0.0001) than the best model comprising demographic features alone. 
Conclusions The LCRT biomarker reported here displayed high accuracy and ease of implementation on a high throughput, quality-controlled targeted NGS platform. As such, it is optimized for clinical validation in specimens from the ongoing LCRT blinded prospective cohort study. Following validation, the biomarker is expected to have clinical utility by better stratifying subjects for annual lung cancer screening compared to current demographic criteria alone.
Affiliation(s)
- Jiyoun Yeo
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, The University of Toledo College of Medicine, 3000 Arlington Avenue, HEB 219, Toledo, OH, 43614, USA
- Erin L Crawford
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, The University of Toledo College of Medicine, 3000 Arlington Avenue, HEB 219, Toledo, OH, 43614, USA
- Xiaolu Zhang
  - Cancer Genetics and Comparative Genomics Branch (CGCGB), National Human Genomes Research Institute (NHGRI), National Institutes of Health (NIH), Bldg 50, Rm 5341, 50 South Dr., Bethesda, MD, 20892, USA
- Sadik Khuder
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, The University of Toledo College of Medicine, 3000 Arlington Avenue, RHC 0012, Toledo, OH, 43614, USA
- Tian Chen
  - Department of Mathematics and Statistics, The University of Toledo, 2801 W. Bancroft Street, Toledo, OH, 43606, USA
- Albert Levin
  - Department of Biostatistics, Henry Ford Health System, 1 Ford Place, Detroit, MI, 48202, USA
- Thomas M Blomquist
  - Department of Pathology, The University of Toledo College of Medicine, 3000 Arlington Avenue, Toledo, OH, 43614, USA
- James C Willey
  - Ruppert 0012, Division of Pulmonary and Critical Care Medicine, Department of Medicine, The University of Toledo College of Medicine, 3000 Arlington Avenue, Toledo, OH, 43614, USA
10
Ross-Adams H, Lamb A, Dunning M, Halim S, Lindberg J, Massie C, Egevad L, Russell R, Ramos-Montoya A, Vowler S, Sharma N, Kay J, Whitaker H, Clark J, Hurst R, Gnanapragasam V, Shah N, Warren A, Cooper C, Lynch A, Stark R, Mills I, Grönberg H, Neal D, on behalf of the CamCaP Study Group. Integration of copy number and transcriptomics provides risk stratification in prostate cancer: A discovery and validation cohort study. EBioMedicine 2015; 2:1133-44. [PMID: 26501111 PMCID: PMC4588396 DOI: 10.1016/j.ebiom.2015.07.017] [Citation(s) in RCA: 228] [Impact Index Per Article: 22.8] [Received: 06/12/2015] [Revised: 07/10/2015] [Accepted: 07/14/2015] [Indexed: 02/07/2023]
Abstract
BACKGROUND Understanding the heterogeneous genotypes and phenotypes of prostate cancer is fundamental to improving the way we treat this disease. As yet, there are no validated descriptions of prostate cancer subgroups derived from integrated genomics linked with clinical outcome. METHODS In a study of 482 tumour, benign and germline samples from 259 men with primary prostate cancer, we used integrative analysis of copy number alterations (CNA) and array transcriptomics to identify genomic loci that affect expression levels of mRNA in an expression quantitative trait loci (eQTL) approach, to stratify patients into subgroups that we then associated with future clinical behaviour, and compared with either CNA or transcriptomics alone. FINDINGS We identified five separate patient subgroups with distinct genomic alterations and expression profiles based on 100 discriminating genes in our separate discovery and validation sets of 125 and 103 men. These subgroups were able to consistently predict biochemical relapse (p = 0.0017 and p = 0.016 respectively) and were further validated in a third cohort with long-term follow-up (p = 0.027). We show the relative contributions of gene expression and copy number data on phenotype, and demonstrate the improved power gained from integrative analyses. We confirm alterations in six genes previously associated with prostate cancer (MAP3K7, MELK, RCBTB2, ELAC2, TPD52, ZBTB4), and also identify 94 genes not previously linked to prostate cancer progression that would not have been detected using either transcript or copy number data alone. We confirm a number of previously published molecular changes associated with high risk disease, including MYC amplification, and NKX3-1, RB1 and PTEN deletions, as well as over-expression of PCA3 and AMACR, and loss of MSMB in tumour tissue. 
A subset of the 100 genes outperforms established clinical predictors of poor prognosis (PSA, Gleason score), as well as previously published gene signatures (p = 0.0001). We further show how our molecular profiles can be used for the early detection of aggressive cases in a clinical setting, and inform treatment decisions. INTERPRETATION For the first time in prostate cancer this study demonstrates the importance of integrated genomic analyses incorporating both benign and tumour tissue data in identifying molecular alterations leading to the generation of robust gene sets that are predictive of clinical outcome in independent patient cohorts.
Affiliation(s)
- H. Ross-Adams
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- A.D. Lamb
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
  - Department of Urology, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK
  - Academic Urology Group, University of Cambridge, Cambridge, CB2 0QQ, UK
- M.J. Dunning
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- S. Halim
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- J. Lindberg
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- C.M. Massie
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- L.A. Egevad
  - Department of Oncology–Pathology, Karolinska Institutet, Stockholm, Sweden
- R. Russell
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- A. Ramos-Montoya
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- S.L. Vowler
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- N.L. Sharma
  - Nuffield Department of Surgical Sciences, University of Oxford, Roosevelt Drive, Oxford, UK
- J. Kay
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
  - Molecular Diagnostics and Therapeutics Group, University College London, WC1E 6BT, UK
- H. Whitaker
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
  - Molecular Diagnostics and Therapeutics Group, University College London, WC1E 6BT, UK
- J. Clark
  - University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, UK
- R. Hurst
  - University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, UK
- V.J. Gnanapragasam
  - Department of Urology, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK
  - Academic Urology Group, University of Cambridge, Cambridge, CB2 0QQ, UK
- N.C. Shah
  - Department of Urology, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK
- A.Y. Warren
  - Department of Pathology, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK
- C.S. Cooper
  - University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, UK
- A.G. Lynch
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- R. Stark
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- I.G. Mills
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
  - Prostate Cancer Research Group, Centre for Molecular Medicine Norway, Nordic EMBL Partnership, University of Oslo and Oslo University Hospital, N-0318 Oslo, Norway
  - Department of Molecular Oncology, Institute of Cancer Research, Oslo University Hospitals, N-0424 Oslo, Norway
  - Prostate Cancer UK/Movember Centre of Excellence for Prostate Cancer Research, Centre for Cancer Research and Cell Biology, Queen's University, Belfast, UK
- H. Grönberg
  - Academic Urology Group, University of Cambridge, Cambridge, CB2 0QQ, UK
- D.E. Neal
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
  - Department of Urology, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK
11
Gorlov IP, Yang JY, Byun J, Logothetis C, Gorlova OY, Do KA, Amos C. How to get the most from microarray data: advice from reverse genomics. BMC Genomics 2014; 15:223. [PMID: 24656147 PMCID: PMC3997969 DOI: 10.1186/1471-2164-15-223] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 09/26/2012] [Accepted: 03/10/2014] [Indexed: 12/17/2022]
Abstract
BACKGROUND Whole-genome profiling of gene expression is a powerful tool for identifying cancer-associated genes. Genes differentially expressed between normal and tumorous tissues are usually considered to be cancer associated. We recently demonstrated that the analysis of interindividual variation in gene expression can be useful for identifying cancer associated genes. The goal of this study was to identify the best microarray data-derived predictor of known cancer associated genes. RESULTS We found that the traditional approach of identifying cancer genes--identifying differentially expressed genes--is not very efficient. The analysis of interindividual variation of gene expression in tumor samples identifies cancer-associated genes more effectively. The results were consistent across 4 major types of cancer: breast, colorectal, lung, and prostate. We used recently reported cancer-associated genes (2011-2012) for validation and found that novel cancer-associated genes can be best identified by elevated variance of the gene expression in tumor samples. CONCLUSIONS The observation that the high interindividual variation of gene expression in tumor tissues is the best predictor of cancer-associated genes is likely a result of tumor heterogeneity on gene level. Computer simulation demonstrates that in the case of heterogeneity, an assessment of variance in tumors provides a better identification of cancer genes than does the comparison of the expression in normal and tumor tissues. Our results thus challenge the current paradigm that comparing the mean expression between normal and tumorous tissues is the best approach to identifying cancer-associated genes; we found that the high interindividual variation in expression is a better approach, and that using variation would improve our chances of identifying cancer-associated genes.
Affiliation(s)
- Ivan P Gorlov
  - Department of Genitourinary Medical Oncology, Unit 1374, The University of Texas MD Anderson Cancer Center, 1155 Pressler Street, Houston, TX 77030-3721, USA