1. Zheng X, Amos CI, Frost HR. Pan-cancer evaluation of gene expression and somatic alteration data for cancer prognosis prediction. BMC Cancer 2021; 21:1053. [PMID: 34563154 PMCID: PMC8467202 DOI: 10.1186/s12885-021-08796-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/09/2021] [Accepted: 08/16/2021] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Over the past decades, approaches for diagnosing and treating cancer have seen significant improvement. However, the variability of patient and tumor characteristics has limited progress on methods for prognosis prediction. The development of high-throughput omics technologies now provides multiple approaches for characterizing tumors. Although a large number of published studies have focused on integration of multi-omics data and use of pathway-level models for cancer prognosis prediction, there still exists a gap of knowledge regarding the prognostic landscape across multi-omics data for multiple cancer types using both gene-level and pathway-level predictors. METHODS In this study, we systematically evaluated three often available types of omics data (gene expression, copy number variation and somatic point mutation) covering both DNA-level and RNA-level features. We evaluated the landscape of predictive performance of these three omics modalities for 33 cancer types in the TCGA using a Lasso or Group Lasso-penalized Cox model and either gene or pathway level predictors. RESULTS We constructed the prognostic landscape using three types of omics data for 33 cancer types on both the gene and pathway levels. Based on this landscape, we found that predictive performance is cancer type dependent and we also highlighted the cancer types and omics modalities that support the most accurate prognostic models. In general, models estimated on gene expression data provide the best predictive performance on either gene or pathway level and adding copy number variation or somatic point mutation data to gene expression data does not improve predictive performance, with some exceptional cohorts including low grade glioma and thyroid cancer. In general, pathway-level models have better interpretative performance, higher stability and smaller model size across multiple cancer types and omics data types relative to gene-level models. 
CONCLUSIONS Based on this landscape and a comprehensive comparison, models estimated on gene expression data provide the best predictive performance at either the gene or the pathway level. Pathway-level models have better interpretative performance, higher stability and smaller model size relative to gene-level models.
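For orientation, the Lasso- and Group Lasso-penalized Cox estimators named in the Methods are commonly written as below. This is the standard textbook form and may differ in detail from the parameterization the authors used; the symbols ($x_i$ for the predictor vector, $\delta_i$ for the event indicator, $R(t_i)$ for the risk set at time $t_i$, $\lambda$ for the penalty weight, $p_g$ for the size of group $g$) are notation introduced here, not taken from the abstract.

```latex
\begin{align}
\ell(\beta) &= \sum_{i:\,\delta_i = 1} \Big[ x_i^\top \beta
  - \log \sum_{j \in R(t_i)} \exp\big(x_j^\top \beta\big) \Big] \\
\hat\beta_{\text{lasso}} &= \arg\max_{\beta}\; \ell(\beta) - \lambda \lVert \beta \rVert_1 \\
\hat\beta_{\text{group}} &= \arg\max_{\beta}\; \ell(\beta)
  - \lambda \sum_{g=1}^{G} \sqrt{p_g}\, \lVert \beta_g \rVert_2
\end{align}
```

In a pathway-level model of this kind, each group $g$ would correspond to the genes of one pathway, so the group penalty selects or drops whole pathways at once.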
Affiliation(s)
- Xingyu Zheng: Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Christopher I Amos: Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA; Department of Medicine, Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- H Robert Frost: Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
2. Tian S, Wang C, Wang B. Incorporating Pathway Information into Feature Selection towards Better Performed Gene Signatures. BioMed Research International 2019; 2019:2497509. [PMID: 31073522 PMCID: PMC6470448 DOI: 10.1155/2019/2497509] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/23/2018] [Accepted: 03/07/2019] [Indexed: 12/29/2022]
Abstract
To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, a pathway-based feature selection is firstly divided into three major categories, namely, pathway-level selection, bilevel selection, and pathway-guided gene selection. With bilevel selection methods being regarded as a special case of pathway-guided gene selection process, we discuss pathway-guided gene selection methods in detail and the importance of penalization in such methods. Last, we point out the potential utilizations of pathway-guided gene selection in one active research avenue, namely, to analyze longitudinal gene expression data. We believe this article provides valuable insights for computational biologists and biostatisticians so that they can make biology more computable.
Affiliation(s)
- Suyan Tian: Division of Clinical Research, The First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, China
- Chi Wang: Department of Biostatistics, Markey Cancer Center, The University of Kentucky, 800 Rose St., Lexington, KY 40536, USA
- Bing Wang: School of Life Science, Jilin University, 2699 Qianjin Street, Changchun, Jilin 130012, China
3. Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time. BioMed Research International 2019; 2019:1724898. [PMID: 31016185 PMCID: PMC6444255 DOI: 10.1155/2019/1724898] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/28/2018] [Accepted: 02/25/2019] [Indexed: 01/02/2023]
Abstract
With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods dealing with gene expression profiles across time points has not kept up with the explosion of such data. The feature selection process is of critical importance for longitudinal microarray data. In this study, we proposed aggregating a gene's expression values across time into a single value using the sign average method, thereby reducing a longitudinal feature selection process to a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign average of genes across time as predictors) were then optimized by either the coordinate descent method or the threshold gradient descent regularization method. By applying the proposed methods to simulated data and a traumatic injury dataset, we have demonstrated that the proposed methods, especially the combination of sign average and threshold gradient descent regularization, outperform other competitive algorithms. To conclude, the proposed methods are highly recommended for studies with the objective of carrying out feature selection for longitudinal gene expression data.
4. Zhang X, Li B, Han H, Song S, Xu H, Yi Z, Hong Y, Zhuang W, Yi N. Pathway-structured predictive modeling for multi-level drug response in multiple myeloma. Bioinformatics 2018; 34:3609-3615. [PMID: 29850860 PMCID: PMC6198861 DOI: 10.1093/bioinformatics/bty436] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/28/2018] [Revised: 05/08/2018] [Accepted: 05/24/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation: Molecular analyses suggest that myeloma is composed of distinct sub-types that have different molecular pathologies and various response rates to certain treatments. Drug responses in multiple myeloma (MM) are usually recorded as a multi-level ordinal outcome. One of the goals of drug response studies is to predict, with high probability, which response category each patient belongs to based on clinical and molecular features. However, as most genes have small effects, gene-based models may provide limited predictive accuracy. In that case, methods for predicting multi-level ordinal drug responses by incorporating biological pathways are desired but have not been developed yet. Results: We propose a pathway-structured method for predicting multi-level ordinal responses using a two-stage approach. We first develop hierarchical ordinal logistic models and an efficient quasi-Newton algorithm for jointly analyzing numerous correlated variables. Our two-stage approach first obtains the linear predictor (called the pathway score) for each pathway by fitting all predictors within each pathway using the hierarchical ordinal logistic approach, and then combines the pathway scores as new predictors to build a predictive model. We applied the proposed method to two publicly available datasets for predicting multi-level ordinal drug responses in MM using large-scale gene expression data and pathway information. Our results show that our approach not only significantly improved the predictive performance compared with the corresponding gene-based model but also allowed us to identify biologically relevant pathways. Availability and implementation: The proposed approach has been implemented in our R package BhGLM, which is freely available from the public GitHub repository https://github.com/abbyyan3/BhGLM.
Affiliation(s)
- Xinyan Zhang: Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA, USA
- Bingzong Li: Department of Hematology, The Second Affiliated Hospital of Soochow University, Suzhou, China
- Huiying Han: Department of Cell Biology, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Sha Song: Department of Cell Biology, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Hongxia Xu: Department of Cell Biology, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Zixuan Yi: School of Medicine, Eastern Virginia Medical School, Norfolk, VA, USA
- Yating Hong: Department of Hematology, The Second Affiliated Hospital of Soochow University, Suzhou, China
- Wenzhuo Zhuang: Department of Cell Biology, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Nengjun Yi: Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
5. Ozturk K, Dow M, Carlin DE, Bejar R, Carter H. The Emerging Potential for Network Analysis to Inform Precision Cancer Medicine. J Mol Biol 2018; 430:2875-2899. [PMID: 29908887 PMCID: PMC6097914 DOI: 10.1016/j.jmb.2018.06.016] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 03/12/2018] [Revised: 05/30/2018] [Accepted: 06/06/2018] [Indexed: 12/19/2022]
Abstract
Precision cancer medicine promises to tailor clinical decisions to patients using genomic information. Indeed, successes of drugs targeting genetic alterations in tumors, such as imatinib that targets BCR-ABL in chronic myelogenous leukemia, have demonstrated the power of this approach. However, biological systems are complex, and patients may differ not only by the specific genetic alterations in their tumor, but also by more subtle interactions among such alterations. Systems biology and more specifically, network analysis, provides a framework for advancing precision medicine beyond clinical actionability of individual mutations. Here we discuss applications of network analysis to study tumor biology, early methods for N-of-1 tumor genome analysis, and the path for such tools to the clinic.
Affiliation(s)
- Kivilcim Ozturk: Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Michelle Dow: Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Daniel E Carlin: Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Rafael Bejar: Moores Cancer Center, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
- Hannah Carter: Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center and Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA; CIFAR, MaRS Centre, West Tower, 661 University Ave., Suite 505, Toronto, ON M5G 1M1, Canada
6. Sinnott JA, Cai T. Pathway aggregation for survival prediction via multiple kernel learning. Stat Med 2018; 37:2501-2515. [PMID: 29664143 PMCID: PMC5994931 DOI: 10.1002/sim.7681] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/16/2016] [Revised: 03/10/2018] [Accepted: 03/20/2018] [Indexed: 01/05/2023]
Abstract
Attempts to predict prognosis in cancer patients using high-dimensional genomic data such as gene expression in tumor tissue can be made difficult by the large number of features and the potential complexity of the relationship between features and the outcome. Integrating prior biological knowledge into risk prediction with such data by grouping genomic features into pathways and networks reduces the dimensionality of the problem and could improve prediction accuracy. Additionally, such knowledge-based models may be more biologically grounded and interpretable. Prediction could potentially be further improved by allowing for complex nonlinear pathway effects. The kernel machine framework has been proposed as an effective approach for modeling the nonlinear and interactive effects of genes in pathways for both censored and noncensored outcomes. When multiple pathways are under consideration, one may efficiently select informative pathways and aggregate their signals via multiple kernel learning (MKL), which has been proposed for prediction of noncensored outcomes. In this paper, we propose MKL methods for censored survival outcomes. We derive our approach for a general survival modeling framework with a convex objective function and illustrate its application under the Cox proportional hazards and semiparametric accelerated failure time models. Numerical studies demonstrate that the proposed MKL-based prediction methods work well in finite sample and can potentially outperform models constructed assuming linear effects or ignoring the group knowledge. The methods are illustrated with an application to 2 cancer data sets.
Affiliation(s)
- Tianxi Cai: Department of Biostatistics, Harvard University, Boston, MA, USA
7. Tian S. Identification of subtype-specific prognostic signatures using Cox models with redundant gene elimination. Oncol Lett 2018; 15:8545-8555. [PMID: 29805591 PMCID: PMC5950526 DOI: 10.3892/ol.2018.8418] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/29/2017] [Accepted: 03/02/2018] [Indexed: 12/14/2022] Open
Abstract
Lung cancer (LC) is a leading cause of cancer-associated mortalities worldwide. Adenocarcinoma (AC) and squamous cell carcinoma (SCC) account for ~70% of all cases of LC. Since AC and SCC are two distinct diseases, their corresponding prognostic genes associated with patient survival time are expected to be different. To date, only a few studies have distinguished patients with good prognosis from those with poor prognosis for each specific subtype. In the present study, the Cox filter model, a feature selection algorithm that identifies subtype-specific prognostic genes to incorporate pathway information and eliminate redundant genes, was adopted. By applying the proposed model to data on non-small cell lung cancer (NSCLC), it was demonstrated that both redundant gene elimination and search space restriction can improve the predictive capacity and the model stability of resulting prognostic gene signatures. To conclude, a pre-filtering procedure that incorporates pathway information for screening likely irrelevant genes prior to complex downstream analysis is recommended. Furthermore, a feature selection algorithm that considers redundant gene elimination may be preferable to one without such a consideration.
Affiliation(s)
- Suyan Tian: Division of Clinical Research, The First Hospital of Jilin University, Changchun, Jilin 130021, P.R. China
8. Wei W, Sun Z, da Silveira WA, Yu Z, Lawson A, Hardiman G, Kelemen LE, Chung D. Semi-supervised identification of cancer subgroups using survival outcomes and overlapping grouping information. Stat Methods Med Res 2018; 28:2137-2149. [PMID: 29336210 DOI: 10.1177/0962280217752980] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
Identification of cancer patient subgroups using high-throughput genomic data is of critical importance to clinicians and scientists because it can offer opportunities for more personalized treatment and overlapping treatments of cancers. In spite of tremendous efforts, this problem remains challenging because of low reproducibility and instability of identified cancer subgroups and molecular features. To address this challenge, we developed Integrative Genomics Robust iDentification of cancer subgroups (InGRiD), a statistical approach that integrates information from biological pathway databases with high-throughput genomic data to improve the robustness of identification and interpretation of molecularly defined subgroups of cancer patients. We applied InGRiD to the gene expression data of high-grade serous ovarian cancer from The Cancer Genome Atlas and the Australian Ovarian Cancer Study. The results indicate clear benefits of the pathway-level approaches over the gene-level approaches. In addition, using the proposed InGRiD framework, we also investigate and address the issue of gene sharing among pathways, which often occurs in practice, to further facilitate biological interpretation of key molecular features associated with cancer progression. The R package "InGRiD" implementing the proposed approach is currently available on our research group's GitHub webpage (https://dongjunchung.github.io/INGRID/).
Affiliation(s)
- Wei Wei: Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA; Department of Biostatistics, Yale University, New Haven, USA
- Zequn Sun: Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Willian A da Silveira: Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, USA; Center for Genomic Medicine, Medical University of South Carolina, Charleston, USA
- Zhenning Yu: Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Andrew Lawson: Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Gary Hardiman: Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA; Center for Genomic Medicine, Medical University of South Carolina, Charleston, USA; Department of Medicine, Medical University of South Carolina, Charleston, USA
- Linda E Kelemen: Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Dongjun Chung: Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
9. Identification of prognostic genes and gene sets for early-stage non-small cell lung cancer using bi-level selection methods. Sci Rep 2017; 7:46164. [PMID: 28387364 PMCID: PMC5384004 DOI: 10.1038/srep46164] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/18/2016] [Accepted: 03/09/2017] [Indexed: 12/18/2022] Open
Abstract
In contrast to feature selection and gene set analysis, bi-level selection is a process of selecting not only important gene sets but also important genes within those gene sets. Depending on the order of selections, a bi-level selection method can be classified into three categories: forward selection, which first selects relevant gene sets followed by the selection of relevant individual genes; backward selection, which takes the reversed order; and simultaneous selection, which performs the two tasks simultaneously, usually with the aid of a penalized regression model. To test the existence of subtype-specific prognostic genes for non-small cell lung cancer (NSCLC), we had previously proposed the Cox-filter method, which examines the association between patients' survival time after diagnosis and one specific gene, the disease subtypes, and their interaction terms. In this study, we further extend it to carry out forward and backward bi-level selection. Using simulations and an NSCLC application, we demonstrate that the forward selection outperforms the backward selection and other relevant algorithms in our setting. Both proposed methods are readily understandable and interpretable. Therefore, they represent useful tools for researchers who are interested in exploring the prognostic value of gene expression data for specific subtypes or stages of a disease.
10. Zhang X, Li Y, Akinyemiju T, Ojesina AI, Buckhaults P, Liu N, Xu B, Yi N. Pathway-Structured Predictive Model for Cancer Survival Prediction: A Two-Stage Approach. Genetics 2017; 205:89-100. [PMID: 28049703 PMCID: PMC5223526 DOI: 10.1534/genetics.116.189191] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 03/14/2016] [Accepted: 10/31/2016] [Indexed: 12/11/2022] Open
Abstract
Heterogeneity in terms of tumor characteristics, prognosis, and survival among cancer patients has been a persistent problem for many decades. Currently, prognosis and outcome predictions are made based on clinical factors and/or by incorporating molecular profiling data. However, inaccurate prognosis and prediction may result by using only clinical or molecular information directly. One of the main shortcomings of past studies is the failure to incorporate prior biological information into the predictive model, given strong evidence of the pathway-based genetic nature of cancer, i.e., the potential for oncogenes to be grouped into pathways based on biological functions such as cell survival, proliferation, and metastatic dissemination. To address this problem, we propose a two-stage approach to incorporate pathway information into the prognostic modeling using large-scale gene expression data. In the first stage, we fit all predictors within each pathway using the penalized Cox model and Bayesian hierarchical Cox model. In the second stage, we combine the cross-validated prognostic scores of all pathways obtained in the first stage as new predictors to build an integrated prognostic model for prediction. We apply the proposed method to analyze two independent breast and ovarian cancer datasets from The Cancer Genome Atlas (TCGA), predicting overall survival using large-scale gene expression profiling data. The results from both datasets show that the proposed approach not only improves survival prediction compared with the alternative analyses that ignore the pathway information, but also identifies significant biological pathways.
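The two-stage construction described in this abstract can be written schematically as follows. This is a reading of the abstract under stated assumptions, not the authors' exact formulation: $G_k$ (the gene set of pathway $k$), $\hat\beta_k^{(-i)}$ (stage-one coefficients estimated without the cross-validation fold containing patient $i$), $s_{ik}$ (the pathway score) and $\gamma_k$ (the stage-two weights) are symbols introduced here.

```latex
\begin{align}
\text{Stage 1:}\quad & h_k(t \mid x_i) = h_{0k}(t)\,
  \exp\!\big(x_{i,G_k}^\top \beta_k\big),
  \qquad s_{ik} = x_{i,G_k}^\top \hat\beta_k^{(-i)} \\
\text{Stage 2:}\quad & h(t \mid s_i) = h_0(t)\,
  \exp\!\Big(\sum_{k=1}^{K} \gamma_k\, s_{ik}\Big)
\end{align}
```

Using cross-validated scores in stage two means that each patient's pathway score is computed from coefficients fitted without that patient, which is one way to limit optimism when the scores are reused as predictors.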
Affiliation(s)
- Xinyan Zhang: Department of Biostatistics, University of Alabama at Birmingham, Alabama 35294
- Yan Li: Department of Biostatistics, University of Alabama at Birmingham, Alabama 35294
- Tomi Akinyemiju: Department of Epidemiology, University of Alabama at Birmingham, Alabama 35294
- Akinyemi I Ojesina: Department of Epidemiology, University of Alabama at Birmingham, Alabama 35294
- Phillip Buckhaults: Department of Drug Discovery and Biomedical Sciences, The South Carolina College of Pharmacy, The University of South Carolina, Columbia, South Carolina 29208
- Nianjun Liu: Department of Epidemiology and Biostatistics, School of Public Health, Indiana University, Bloomington, Indiana 47405
- Bo Xu: Department of Oncology, Southern Research Institute, Birmingham, Alabama 35205
- Nengjun Yi: Department of Biostatistics, University of Alabama at Birmingham, Alabama 35294
11. Eng KH, Schiller E, Morrell K. On representing the prognostic value of continuous gene expression biomarkers with the restricted mean survival curve. Oncotarget 2016; 6:36308-18. [PMID: 26486086 PMCID: PMC4742179 DOI: 10.18632/oncotarget.6121] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 06/26/2015] [Accepted: 09/12/2015] [Indexed: 12/04/2022] Open
Abstract
Motivation: Researchers developing biomarkers for cancer prognosis from quantitative gene expression data are often faced with an odd methodological discrepancy: while Cox's proportional hazards model, the appropriate and popular technique, produces a continuous and relative risk score, it is hard to cast the estimate in clear clinical terms like median months of survival and percent of patients affected. To produce a familiar Kaplan-Meier plot, researchers commonly make the decision to dichotomize a continuous (often unimodal and symmetric) score. It is well known in the statistical literature that this procedure induces significant bias. Results: We illustrate the liabilities of common techniques for categorizing a risk score and discuss alternative approaches. We promote the use of the restricted mean survival (RMS) and the corresponding RMS curve, which may be thought of as an analog to the best fit line from simple linear regression. Conclusions: Continuous biomarker workflows should be modified to include the more rigorous statistical techniques and descriptive plots described in this article. All statistics discussed can be computed via standard functions in the survival package of the R statistical programming language. Example R language code for the RMS curve is presented in the appendix.
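As a rough illustration of the quantity this abstract promotes, the sketch below computes a restricted mean survival as the area under a Kaplan-Meier step function up to a horizon `tau`. It is a minimal pure-Python sketch written for this listing, not the authors' R code; the function names `kaplan_meier` and `restricted_mean_survival` and the toy inputs are hypothetical, and it does not reproduce the RMS curve over a continuous biomarker.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def restricted_mean_survival(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)   # rectangle up to this event time
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)     # final rectangle up to the horizon
    return area


if __name__ == "__main__":
    # Toy data (hypothetical): four patients, one censored at t = 6.
    print(restricted_mean_survival([2, 4, 6, 8], [1, 1, 0, 1], tau=10))
```

The result is expressed in the same units as the follow-up times (for example months), which is what makes it easier to read clinically than a relative risk score.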
Affiliation(s)
- Kevin H Eng: Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, USA
- Emily Schiller: Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, USA
- Kayla Morrell: Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, USA
12. Choi J, Ye S, Eng KH, Korthauer K, Bradley WH, Rader JS, Kendziorski C. IPI59: An Actionable Biomarker to Improve Treatment Response in Serous Ovarian Carcinoma Patients. Statistics in Biosciences 2016; 9:1-12. [PMID: 28966695 DOI: 10.1007/s12561-016-9144-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
Despite improvements in operative management and therapies, overall survival rates in advanced ovarian cancer have remained largely unchanged over the past three decades. Although it is possible to identify high-risk patients following surgery, the knowledge does not provide information about the genomic aberrations conferring risk, or the implications for treatment. To address these challenges, we developed an integrative pathway-index model and applied it to messenger RNA expression from 458 patients with serous ovarian carcinoma from the Cancer Genome Atlas project. The biomarker derived from this approach, IPI59, contains 59 genes from six pathways. As we demonstrate using independent datasets from six studies, IPI59 is strongly associated with overall and progression-free survival, and also identifies high-risk patients who may benefit from enhanced adjuvant therapy.
Affiliation(s)
- J Choi: University of Wisconsin Madison, Madison, WI, USA
- S Ye: University of Wisconsin Madison, Madison, WI, USA
- K H Eng: University of Wisconsin Madison, Madison, WI, USA
- K Korthauer: University of Wisconsin Madison, Madison, WI, USA
- W H Bradley: Medical College of Wisconsin, Milwaukee, WI, USA
- J S Rader: Medical College of Wisconsin, Milwaukee, WI, USA
13. Bradley WH, Eng K, Le M, Mackinnon AC, Kendziorski C, Rader JS. Comparing gene expression data from formalin-fixed, paraffin embedded tissues and qPCR with that from snap-frozen tissue and microarrays for modeling outcomes of patients with ovarian carcinoma. BMC Clin Pathol 2015; 15:17. [PMID: 26412982 PMCID: PMC4582729 DOI: 10.1186/s12907-015-0017-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/27/2015] [Accepted: 09/08/2015] [Indexed: 12/18/2022] Open
Abstract
Background: Previously, we have used clinical and gene expression data from The Cancer Genome Atlas (TCGA) to model a pathway-based index predicting outcomes in ovarian carcinoma. These data were obtained from snap-frozen tissue measured with the Affymetrix U133 platform. In the current study, we correlate the data used for modeling with data derived from TaqMan qPCR of both snap-frozen and formalin-fixed, paraffin-embedded (FFPE) samples. Methods: To compare the effect of preservation methods on gene expression measured by qPCR, we assessed 18 patient- and tumor-sample-matched snap-frozen and FFPE ovarian carcinoma samples. To compare gene measurement technologies, we correlated qPCR data from 10 patients with tumor-sample-matched snap-frozen ovarian carcinoma samples with the microarray data from TCGA. We normalized results to the average expression of three housekeeping genes. We scaled and centered the data for comparison to the Affymetrix output. Results: For the 18 specimens, gene expression data obtained from snap-frozen tissue correlated highly with that from FFPE samples in our TaqMan assay (r > 0.82). For the 10 duplicate TCGA specimens, the reported microarray data correlated well (r = 0.6) with our qPCR data, and ranges of expression along pathways were similar. Conclusions: Gene expression data obtained by qPCR from FFPE serous ovarian carcinoma samples can be used for assessment in the pathway-based predictive model. The normalization procedures described control variations in expression, and the range calculated along a specific pathway can be interpreted for a patient's risk profile. Electronic supplementary material: The online version of this article (doi:10.1186/s12907-015-0017-1) contains supplementary material, which is available to authorized users.
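The normalization steps mentioned in the Methods (normalize to the average of three housekeeping genes, then scale and center) might look roughly like the sketch below. Several things here are assumptions rather than details from the abstract: values are treated as log-scale measurements so that normalization is a subtraction, centering and scaling are done per gene across samples, and the gene names (`HK1`, `GENE_A`, and so on) and function names are hypothetical.

```python
from statistics import mean, pstdev


def normalize_to_housekeeping(sample, housekeeping):
    """Subtract the mean housekeeping value from each target gene in one sample.

    Assumes log-scale values, so a ratio to the reference becomes a difference.
    """
    ref = mean(sample[g] for g in housekeeping)
    return {g: v - ref for g, v in sample.items() if g not in housekeeping}


def center_and_scale(samples):
    """Z-score each gene across samples (mean 0, population SD 1)."""
    genes = list(samples[0].keys())
    out = [dict() for _ in samples]
    for g in genes:
        vals = [s[g] for s in samples]
        mu, sd = mean(vals), pstdev(vals)
        for o, v in zip(out, vals):
            # A gene with no variation carries no information; map it to 0.
            o[g] = (v - mu) / sd if sd > 0 else 0.0
    return out


if __name__ == "__main__":
    housekeeping = ["HK1", "HK2", "HK3"]  # hypothetical reference genes
    raw = [
        {"HK1": 20, "HK2": 22, "HK3": 24, "GENE_A": 25, "GENE_B": 30},
        {"HK1": 21, "HK2": 23, "HK3": 25, "GENE_A": 29, "GENE_B": 28},
    ]
    normalized = [normalize_to_housekeeping(s, housekeeping) for s in raw]
    print(center_and_scale(normalized))
```

Putting both platforms on a centered, unit-variance scale is what would allow qPCR values to be compared with microarray output despite their different measurement units.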
Affiliation(s)
- William H Bradley
  - Department of Obstetrics and Gynecology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226 USA
- Kevin Eng
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792 USA; Current address: Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY USA
- Min Le
  - Department of Pathology, Medical College of Wisconsin, Milwaukee, WI 53226 USA
- A Craig Mackinnon
  - Department of Pathology, Medical College of Wisconsin, Milwaukee, WI 53226 USA
- Christina Kendziorski
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792 USA
- Janet S Rader
  - Department of Obstetrics and Gynecology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226 USA
|
14
|
Zhao SD, Parmigiani G, Huttenhower C, Waldron L. Más-o-menos: a simple sign averaging method for discrimination in genomic data analysis. Bioinformatics 2014; 30:3062-9. [PMID: 25061068 DOI: 10.1093/bioinformatics/btu488] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 01/14/2023]
Abstract
MOTIVATION The successful translation of genomic signatures into clinical settings relies on good discrimination between patient subgroups. Many sophisticated algorithms have been proposed in the statistics and machine learning literature, but in practice simpler algorithms are often used. However, few simple algorithms have been formally described or systematically investigated. RESULTS We give a precise definition of a popular simple method we refer to as más-o-menos, which calculates prognostic scores for discrimination by summing standardized predictors, weighted by the signs of their marginal associations with the outcome. We study its behavior theoretically, in simulations and in an extensive analysis of 27 independent gene expression studies of bladder, breast and ovarian cancer, altogether totaling 3833 patients with survival outcomes. We find that despite its simplicity, más-o-menos can achieve good discrimination performance. It performs no worse, and sometimes better, than popular and much more CPU-intensive methods for discrimination, including lasso and ridge regression. AVAILABILITY AND IMPLEMENTATION Más-o-menos is implemented for survival analysis as an option in the survHD package, available from http://www.bitbucket.org/lwaldron/survhd and submitted to Bioconductor.
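The scoring rule described in this abstract (sum standardized predictors, each weighted by the sign of its marginal association with the outcome) can be sketched as below. This is an illustrative simplification, not the survHD implementation: it assumes a continuous outcome and uses the sign of the marginal covariance, whereas a survival analysis would take the sign from a one-variable survival model. The score is divided by the number of features, which does not change how samples are ranked. The function names and toy data are hypothetical.

```python
import numpy as np

def fit_mas_o_menos(X, y):
    """Learn standardization constants and one sign per feature.

    X: (n_samples, n_features) predictors.
    y: (n_samples,) continuous outcome (a stand-in for survival; see the note above).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)  # constant features standardize to 0
    Z = (X - mean) / std
    # Sign of each feature's marginal covariance with the outcome;
    # a feature with zero covariance gets sign 0 and contributes nothing.
    signs = np.sign(Z.T @ (y - y.mean()))
    return {"mean": mean, "std": std, "signs": signs}

def score_mas_o_menos(model, X):
    """Average of standardized predictors, each multiplied by its learned sign."""
    Z = (np.asarray(X, dtype=float) - model["mean"]) / model["std"]
    return Z @ model["signs"] / len(model["signs"])

# Toy data (made up): feature 0 rises with the outcome, feature 1 falls.
X_train = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 3.0], [4.0, 1.0]])
y_train = np.array([1.0, 2.0, 3.0, 4.0])
model = fit_mas_o_menos(X_train, y_train)
scores = score_mas_o_menos(model, X_train)
```

Because only signs are estimated, the fitted model has no continuous coefficients to tune, which is consistent with the abstract's point that the method is much cheaper than lasso or ridge regression.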
Affiliation(s)
- Sihai Dave Zhao
  - Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02115, City University of New York School of Public Health, Hunter College, New York, NY 10035, USA
- Giovanni Parmigiani
  - Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02115, City University of New York School of Public Health, Hunter College, New York, NY 10035, USA
- Curtis Huttenhower
  - Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02115, City University of New York School of Public Health, Hunter College, New York, NY 10035, USA
- Levi Waldron
  - Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02115, City University of New York School of Public Health, Hunter College, New York, NY 10035, USA
|