1
Chapman OS, Luebeck J, Sridhar S, Wong ITL, Dixit D, Wang S, Prasad G, Rajkumar U, Pagadala MS, Larson JD, He BJ, Hung KL, Lange JT, Dehkordi SR, Chandran S, Adam M, Morgan L, Wani S, Tiwari A, Guccione C, Lin Y, Dutta A, Lo YY, Juarez E, Robinson JT, Korshunov A, Michaels JEA, Cho YJ, Malicki DM, Coufal NG, Levy ML, Hobbs C, Scheuermann RH, Crawford JR, Pomeroy SL, Rich JN, Zhang X, Chang HY, Dixon JR, Bagchi A, Deshpande AJ, Carter H, Fraenkel E, Mischel PS, Wechsler-Reya RJ, Bafna V, Mesirov JP, Chavez L. Circular extrachromosomal DNA promotes tumor heterogeneity in high-risk medulloblastoma. Nat Genet 2023; 55:2189-2199. [PMID: 37945900 PMCID: PMC10703696 DOI: 10.1038/s41588-023-01551-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/01/2022] [Accepted: 09/22/2023] [Indexed: 11/12/2023]
Abstract
Circular extrachromosomal DNA (ecDNA) in patient tumors is an important driver of oncogenic gene expression, evolution of drug resistance and poor patient outcomes. Applying computational methods for the detection and reconstruction of ecDNA across a retrospective cohort of 481 medulloblastoma tumors from 465 patients, we identify circular ecDNA in 82 patients (18%). Patients with ecDNA-positive medulloblastoma were more than twice as likely to relapse and three times as likely to die within 5 years of diagnosis. A subset of tumors harbored multiple ecDNA lineages, each containing distinct amplified oncogenes. Multimodal sequencing, imaging and CRISPR inhibition experiments in medulloblastoma models reveal intratumoral heterogeneity of ecDNA copy number per cell and frequent putative 'enhancer rewiring' events on ecDNA. This study reveals the frequency and diversity of ecDNA in medulloblastoma, stratified into molecular subgroups, and suggests copy number heterogeneity and enhancer rewiring as oncogenic features of ecDNA.
Affiliation(s)
- Owen S Chapman
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, San Diego, CA, USA
- Department of Medicine, University of California San Diego, San Diego, CA, USA
- Sanford Burnham Prebys Medical Discovery Institute, San Diego, CA, USA
| | - Jens Luebeck
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, San Diego, CA, USA
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | - Sunita Sridhar
- Department of Medicine, University of California San Diego, San Diego, CA, USA
- Department of Pediatrics, UC San Diego and Rady Children's Hospital, San Diego, CA, USA
| | - Ivy Tsz-Lo Wong
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Sarafan ChEM-H, Stanford University, Stanford, CA, USA
| | - Deobrat Dixit
- Sanford Burnham Prebys Medical Discovery Institute, San Diego, CA, USA
- Department of Neurology and Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, USA
| | - Shanqing Wang
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | - Gino Prasad
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | - Utkrisht Rajkumar
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | - Meghana S Pagadala
- Medical Scientist Training Program, University of California San Diego, San Diego, CA, USA
- Biomedical Sciences Graduate Program, University of California San Diego, San Diego, CA, USA
| | - Jon D Larson
- Sanford Burnham Prebys Medical Discovery Institute, San Diego, CA, USA
| | - Britney Jiayu He
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
| | - King L Hung
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
| | - Joshua T Lange
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Sarafan ChEM-H, Stanford University, Stanford, CA, USA
| | - Siavash R Dehkordi
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | | | - Miriam Adam
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Ling Morgan
- Department of Medicine, University of California San Diego, San Diego, CA, USA
| | - Sameena Wani
- Sanford Burnham Prebys Medical Discovery Institute, San Diego, CA, USA
| | - Ashutosh Tiwari
- Sanford Burnham Prebys Medical Discovery Institute, San Diego, CA, USA
| | - Caitlin Guccione
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, San Diego, CA, USA
- Department of Medicine, University of California San Diego, San Diego, CA, USA
| | - Yingxi Lin
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | - Aditi Dutta
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | - Yan Yuen Lo
- Sanford Burnham Prebys Medical Discovery Institute, San Diego, CA, USA
- Rady Children's Institute for Genomic Medicine, Rady Children's Hospital and Healthcare Center, San Diego, CA, USA
| | - Edwin Juarez
- Department of Medicine, University of California San Diego, San Diego, CA, USA
| | - James T Robinson
- Department of Medicine, University of California San Diego, San Diego, CA, USA
| | - Andrey Korshunov
- Clinical Cooperation Unit Neuropathology (B300), German Cancer Research Center (DKFZ), German Cancer Consortium (DKTK), and National Center for Tumor Diseases (NCT), Im Neuenheimer Feld 280, Heidelberg, Germany
| | - John-Edward A Michaels
- Papé Pediatric Research Institute, Department of Pediatrics and Knight Cancer Institute, Oregon Health and Sciences University, Portland, OR, USA
| | - Yoon-Jae Cho
- Papé Pediatric Research Institute, Department of Pediatrics and Knight Cancer Institute, Oregon Health and Sciences University, Portland, OR, USA
| | - Denise M Malicki
- Division of Pathology, UC San Diego and Rady Children's Hospital, San Diego, CA, USA
| | - Nicole G Coufal
- Department of Pediatrics, UC San Diego and Rady Children's Hospital, San Diego, CA, USA
| | - Michael L Levy
- Division of Pathology, UC San Diego and Rady Children's Hospital, San Diego, CA, USA
| | - Charlotte Hobbs
- Rady Children's Institute for Genomic Medicine, Rady Children's Hospital and Healthcare Center, San Diego, CA, USA
| | - Richard H Scheuermann
- J. Craig Venter Institute, La Jolla, CA, USA
- Department of Pathology, University of California San Diego, San Diego, CA, USA
| | - John R Crawford
- Department of Pediatrics, University of California Irvine and Children's Hospital Orange County, Irvine, CA, USA
| | - Scott L Pomeroy
- Eli and Edythe Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Neurology, Boston Children's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Jeremy N Rich
- UPMC Hillman Cancer Center, Pittsburgh, PA, USA
- Department of Neurology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Xinlian Zhang
- Division of Biostatistics and Bioinformatics, Department of Family Medicine and Public Health, University of California San Diego, San Diego, CA, USA
| | - Howard Y Chang
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Jesse R Dixon
- Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Anindya Bagchi
- Sanford Burnham Prebys Medical Discovery Institute, San Diego, CA, USA
| | | | - Hannah Carter
- Department of Medicine, University of California San Diego, San Diego, CA, USA
- Moores Cancer Center, University of California San Diego, San Diego, CA, USA
| | - Ernest Fraenkel
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Eli and Edythe Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Paul S Mischel
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Sarafan ChEM-H, Stanford University, Stanford, CA, USA
| | - Robert J Wechsler-Reya
- Sanford Burnham Prebys Medical Discovery Institute, San Diego, CA, USA
- Department of Neurology and Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, USA
| | - Vineet Bafna
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
- Moores Cancer Center, University of California San Diego, San Diego, CA, USA
| | - Jill P Mesirov
- Department of Medicine, University of California San Diego, San Diego, CA, USA
- Moores Cancer Center, University of California San Diego, San Diego, CA, USA
| | - Lukas Chavez
- Department of Medicine, University of California San Diego, San Diego, CA, USA.
- Sanford Burnham Prebys Medical Discovery Institute, San Diego, CA, USA.
- Rady Children's Institute for Genomic Medicine, Rady Children's Hospital and Healthcare Center, San Diego, CA, USA.
- Moores Cancer Center, University of California San Diego, San Diego, CA, USA.
2
Geroldinger A, Lusa L, Nold M, Heinze G. Leave-one-out cross-validation, penalization, and differential bias of some prediction model performance measures: a simulation study. Diagn Progn Res 2023; 7:9. [PMID: 37127679 PMCID: PMC10152625 DOI: 10.1186/s41512-023-00146-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 02/20/2023] [Indexed: 05/03/2023]
Abstract
BACKGROUND The performance of models for binary outcomes can be described by measures such as the concordance statistic (c-statistic, area under the curve), the discrimination slope, or the Brier score. At internal validation, data resampling techniques, e.g., cross-validation, are frequently employed to correct for optimism in these model performance criteria. Especially with small samples or rare events, leave-one-out cross-validation is a popular choice. METHODS Using simulations and a real data example, we compared the effect of different resampling techniques on the estimation of c-statistics, discrimination slopes, and Brier scores for three estimators of logistic regression models: the maximum likelihood estimator and two maximum penalized likelihood estimators. RESULTS Our simulation study confirms earlier studies reporting that leave-one-out cross-validated c-statistics can be strongly biased towards zero. In addition, our study reveals that this bias is even more pronounced for model estimators that shrink estimated probabilities towards the observed event fraction, such as ridge regression. Leave-one-out cross-validation also provided pessimistic estimates of the discrimination slope but nearly unbiased estimates of the Brier score. CONCLUSIONS We recommend using leave-pair-out cross-validation, fivefold cross-validation with repetitions, or the enhanced or the .632+ bootstrap to estimate c-statistics, and leave-pair-out or fivefold cross-validation to estimate discrimination slopes.
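The downward bias of leave-one-out cross-validated c-statistics described above can be reproduced in an extreme case. The sketch below is an illustration only, not the authors' simulation: it assumes an intercept-only model (every prediction equals the event fraction of the training data) and a made-up outcome vector `y`; the helper `c_statistic` is written for this example.

```python
def c_statistic(y, p):
    """Share of event/non-event pairs in which the event has the higher
    prediction; tied predictions count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))


# Hypothetical data: 3 events among 10 subjects.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
n, k = len(y), sum(y)

# Apparent predictions: the intercept-only model predicts the event fraction.
apparent = [k / n for _ in y]

# Leave-one-out predictions: the event fraction among the other n - 1 subjects.
loo = [(k - yi) / (n - 1) for yi in y]

c_apparent = c_statistic(y, apparent)
c_loo = c_statistic(y, loo)
print(c_apparent, c_loo)
```

Leaving out an event lowers the training event fraction, and leaving out a non-event raises it. Every event therefore receives a lower prediction than every non-event, and the pooled leave-one-out c-statistic falls to 0 although an uninformative model should score 0.5.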
Affiliation(s)
- Angelika Geroldinger
- Center for Medical Data Science, Institute of Clinical Biometrics, Medical University of Vienna, Spitalgasse 23, 1090, Vienna, Austria
| | - Lara Lusa
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia
- Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
| | - Mariana Nold
- Department of Sociology, Friedrich Schiller University Jena, Jena, Germany
| | - Georg Heinze
- Center for Medical Data Science, Institute of Clinical Biometrics, Medical University of Vienna, Spitalgasse 23, 1090, Vienna, Austria.
3
Liang J, Zhang W, Yang J, Wu M, Dai Q, Yin H, Xiao Y, Kong L. Deep learning supported discovery of biomarkers for clinical prognosis of liver cancer. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00635-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]
4
Belhechmi S, Le Teuff G, De Bin R, Rotolo F, Michiels S. Favoring the hierarchical constraint in penalized survival models for randomized trials in precision medicine. BMC Bioinformatics 2023; 24:96. [PMID: 36927444 PMCID: PMC10022294 DOI: 10.1186/s12859-023-05162-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2022] [Accepted: 01/27/2023] [Indexed: 03/18/2023]
Abstract
BACKGROUND Biomarker-treatment interactions are commonly investigated in randomized clinical trials (RCTs) to improve precision medicine. The hierarchical interaction constraint states that an interaction should only be in a model if its main effects are also in the model. However, this constraint is not guaranteed in the standard penalized statistical approaches. We aimed to find a compromise for high-dimensional data between the need for sparse model selection and the need for the hierarchical constraint. RESULTS To favor the hierarchical interaction constraint, we proposed to create groups composed of the biomarker main effect and its interaction with treatment and to perform bi-level selection on these groups. We proposed two weighting approaches (Single Wald (SW) and likelihood ratio test (LRT)) for the adaptive lasso method. The selection performance of these two approaches is compared to alternative lasso extensions (adaptive lasso with ridge-based weights, composite Minimax Concave Penalty, group exponential lasso and Sparse Group Lasso) through a simulation study. An RCT (NSABP B-31) that randomized 1574 patients (431 events) with early breast cancer to evaluate the effect of adjuvant trastuzumab on distant-recurrence-free survival, with expression data from 462 genes measured in the tumour, serves as an illustration. The simulation study illustrates that the adaptive lasso LRT and SW, and the group exponential lasso favored the hierarchical interaction constraint. Overall, in the alternative scenarios, they had the best balance of false discovery and false negative rates for the main effects of the selected interactions. For NSABP B-31, 12 gene-treatment interactions were identified more than 20% of the time by the different methods. Among them, the adaptive lasso (SW) approach offered the best trade-off between a high number of selected gene-treatment interactions and a high proportion of selection of both the gene-treatment interaction and its main effect. CONCLUSIONS Adaptive lasso with Single Wald and likelihood ratio test weighting and the group exponential lasso approaches outperformed their competitors in favoring the hierarchical constraint of the biomarker-treatment interaction. However, the performance of the methods tends to decrease in the presence of prognostic biomarkers.
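The hierarchical interaction constraint can be checked on any fitted model by comparing the selected interactions with the selected main effects. The following is a minimal sketch with hypothetical gene names and a helper, `hierarchy_summary`, introduced only for illustration; it is not part of the authors' software.

```python
def hierarchy_summary(selected_main, selected_interactions):
    """Return the biomarkers whose interaction was selected without the main
    effect, and the proportion of selected interactions that respect the
    hierarchical constraint."""
    main = set(selected_main)
    inter = set(selected_interactions)
    violations = sorted(inter - main)
    respected = len(inter & main) / len(inter) if inter else 1.0
    return violations, respected


# Hypothetical selection result: biomarkers with a selected main effect and
# biomarkers with a selected biomarker-by-treatment interaction.
selected_main = ["GENE_A", "GENE_B", "GENE_D"]
selected_interactions = ["GENE_A", "GENE_C"]

violations, respected = hierarchy_summary(selected_main, selected_interactions)
print(violations, respected)
```

Here `GENE_C` enters the model only through its interaction, so half of the selected interactions satisfy the constraint.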
Affiliation(s)
- Shaima Belhechmi
- Université Paris-Saclay, CESP, INSERM U1018 Oncostat, labeled Ligue Contre le Cancer, Villejuif, France; Bureau de Biostatistique et d'Epidémiologie, Gustave Roussy, Villejuif, France
| | - Gwénaël Le Teuff
- Université Paris-Saclay, CESP, INSERM U1018 Oncostat, labeled Ligue Contre le Cancer, Villejuif, France; Bureau de Biostatistique et d'Epidémiologie, Gustave Roussy, Villejuif, France
| | | | - Federico Rotolo
- Biostatistics and Data Management Unit, Innate Pharma, Marseille, France
| | - Stefan Michiels
- Université Paris-Saclay, CESP, INSERM U1018 Oncostat, labeled Ligue Contre le Cancer, Villejuif, France; Bureau de Biostatistique et d'Epidémiologie, Gustave Roussy, Villejuif, France.
5
Salerno S, Li Y. High-Dimensional Survival Analysis: Methods and Applications. Annual Review of Statistics and Its Application 2023; 10:25-49. [PMID: 36968638 PMCID: PMC10038209 DOI: 10.1146/annurev-statistics-032921-022127] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/18/2023]
Abstract
In the era of precision medicine, time-to-event outcomes such as time to death or progression are routinely collected, along with high-throughput covariates. These high-dimensional data defy classical survival regression models, which are either infeasible to fit or likely to incur low predictability due to over-fitting. To overcome this, recent emphasis has been placed on developing novel approaches for feature selection and survival prognostication. We will review various cutting-edge methods that handle survival outcome data with high-dimensional predictors, highlighting recent innovations in machine learning approaches for survival prediction. We will cover the statistical intuitions and principles behind these methods and conclude with extensions to more complex settings, where competing events are observed. We exemplify these methods with applications to the Boston Lung Cancer Survival Cohort study, one of the largest cancer epidemiology cohorts investigating the complex mechanisms of lung cancer.
Affiliation(s)
- Stephen Salerno
- Department of Biostatistics, University of Michigan, Ann Arbor, United States, 48109
| | - Yi Li
- Department of Biostatistics, University of Michigan, Ann Arbor, United States, 48109
6
Bayesian ridge regression for survival data based on a vine copula-based prior. AStA Advances in Statistical Analysis 2022. [DOI: 10.1007/s10182-022-00466-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2022]
7
Jardillier R, Koca D, Chatelain F, Guyon L. Prognosis of lasso-like penalized Cox models with tumor profiling improves prediction over clinical data alone and benefits from bi-dimensional pre-screening. BMC Cancer 2022; 22:1045. [PMID: 36199072 PMCID: PMC9533541 DOI: 10.1186/s12885-022-10117-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Accepted: 09/14/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Prediction of patient survival from tumor molecular '-omics' data is a key step toward personalized medicine. Cox models fitted to RNA profiling datasets are popular for clinical outcome predictions. But these models are applied in a "high-dimensional" context, as the number p of covariates (gene expressions) greatly exceeds the number n of patients and the number e of events. Thus, pre-screening together with penalization methods is widely used for dimension reduction. METHODS In the present paper, (i) we benchmark the performance of the lasso penalization and three variants (i.e., ridge, elastic net, adaptive elastic net) on 16 cancers from TCGA after pre-screening, (ii) we propose a bi-dimensional pre-screening procedure based on both gene variability and p-values from single-variable Cox models to predict survival, and (iii) we compare our results with iterative sure independence screening (ISIS). RESULTS First, we show that integration of mRNA-seq data with clinical data improves predictions over clinical data alone. Second, our bi-dimensional pre-screening procedure can improve the C-index and/or the integrated Brier score, though only moderately, while excluding genes that are irrelevant for prediction. We demonstrate that the different penalization methods reached comparable prediction performances, with slight differences among datasets. Finally, we provide advice in the case of multi-omics data integration. CONCLUSIONS Tumor profiles convey more prognostic information than clinical variables such as stage for many cancer subtypes. Lasso and ridge penalizations perform similarly to elastic net penalizations for Cox models in high dimension. Pre-screening of the top 200 genes in terms of single-variable Cox model p-values is a practical way to reduce dimension, which may be particularly useful when integrating multi-omics data.
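A two-criterion pre-screening step of the kind described above might be sketched as follows. The abstract does not state how variability and p-values are combined, so the rule here (a variance threshold followed by a top-k cut on p-values) is an assumption, as are the function name `prescreen`, the thresholds, and the toy expression values and p-values; in practice the p-values would come from single-variable Cox models.

```python
from statistics import pvariance


def prescreen(expression, pvalues, min_variance, top_k):
    """Keep genes whose expression variance is at least min_variance, then
    return the top_k of those with the smallest single-variable p-values."""
    variable = [gene for gene, values in expression.items()
                if pvariance(values) >= min_variance]
    ranked = sorted(variable, key=lambda gene: pvalues[gene])
    return ranked[:top_k]


# Hypothetical expression values for four genes in three patients.
expression = {
    "g1": [1.0, 5.0, 9.0],
    "g2": [2.0, 2.0, 2.0],   # constant, so removed by the variance filter
    "g3": [0.0, 4.0, 8.0],
    "g4": [3.0, 4.0, 5.0],   # low variance, removed as well
}
# Hypothetical p-values standing in for single-variable Cox model results.
pvalues = {"g1": 0.04, "g2": 0.001, "g3": 0.01, "g4": 0.2}

kept = prescreen(expression, pvalues, min_variance=1.0, top_k=2)
print(kept)
```

Note that `g2` has the smallest p-value but is dropped because it does not vary, which is the point of screening on both dimensions.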
Affiliation(s)
- Rémy Jardillier
- IRIG, Biosanté U1292, Univ. Grenoble Alpes, Inserm, CEA, Grenoble, France; GIPSA-lab, Institute of Engineering University Grenoble Alpes, Univ. Grenoble Alpes, CNRS, Grenoble INP, Grenoble, France
| | - Dzenis Koca
- IRIG, Biosanté U1292, Univ. Grenoble Alpes, Inserm, CEA, Grenoble, France
| | - Florent Chatelain
- GIPSA-lab, Institute of Engineering University Grenoble Alpes, Univ. Grenoble Alpes, CNRS, Grenoble INP, Grenoble, France
| | - Laurent Guyon
- IRIG, Biosanté U1292, Univ. Grenoble Alpes, Inserm, CEA, Grenoble, France.
8
Abstract
Tree-based models are increasingly popular due to their ability to identify complex relationships that are beyond the scope of parametric models. Survival tree methods adapt these models to allow for the analysis of censored outcomes, which often appear in medical data. We present a new Optimal Survival Trees (OST) algorithm that leverages mixed-integer optimization (MIO) and local search techniques to generate globally optimized survival tree models. We demonstrate that the OST algorithm improves on the accuracy of existing survival tree methods, particularly in large datasets.
9
Alqahtani K, Taylor CC, Wood HM, Gusnanto A. Sparse modelling of cancer patients' survival based on genomic copy number alterations. J Biomed Inform 2022; 128:104025. [PMID: 35181494 DOI: 10.1016/j.jbi.2022.104025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/27/2021] [Revised: 02/03/2022] [Accepted: 02/05/2022] [Indexed: 11/24/2022]
Abstract
Copy number alterations (CNA) are structural variations in the genome, in which some regions exhibit more or fewer than the normal two chromosomal copies. This genomic CNA profile provides critical information on tumour progression and is therefore informative for patients' survival. It is currently a statistical challenge to model patients' survival using their genomic CNA profiles while at the same time identifying regions in the genome that are associated with patients' survival. Some methods have been proposed, including the Cox proportional hazards (PH) model with ridge, lasso, or elastic net penalties. However, these methods do not take the general dependencies between genomic regions into account and produce results that are difficult to interpret. In this paper, we extend the elastic net penalty by introducing an additional penalty that takes into account general dependencies between genomic regions. This new model produces smooth parameter estimates while simultaneously performing variable selection via a sparse solution. The results indicate that the proposed method shows better prediction performance than other models in our simulation study, while enabling us to investigate regions in the genome that are associated with the patients' survival with sensible interpretation. We illustrate the method using a real dataset from a lung cancer cohort and simulated data.
Affiliation(s)
- Khaled Alqahtani
- Department of Mathematics, College of Science and Humanitarian Studies, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia; Department of Statistics, University of Leeds, Leeds LS2 9JT, United Kingdom
| | - Charles C Taylor
- Department of Statistics, University of Leeds, Leeds LS2 9JT, United Kingdom
| | - Henry M Wood
- Leeds Institute of Medical Research at St. James's, University of Leeds, Leeds LS9 7TF
| | - Arief Gusnanto
- Department of Statistics, University of Leeds, Leeds LS2 9JT, United Kingdom
10
Madjar K, Rahnenführer J. Weighted Cox regression for the prediction of heterogeneous patient subgroups. BMC Med Inform Decis Mak 2021; 21:342. [PMID: 34876106 PMCID: PMC8650299 DOI: 10.1186/s12911-021-01698-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2021] [Accepted: 11/23/2021] [Indexed: 01/07/2023]
Abstract
BACKGROUND An important task in clinical medicine is the construction of risk prediction models for specific subgroups of patients based on high-dimensional molecular measurements such as gene expression data. Major objectives in modeling high-dimensional data are good prediction performance and feature selection to find a subset of predictors that are truly associated with a clinical outcome such as a time-to-event endpoint. In clinical practice, this task is challenging since patient cohorts are typically small and can be heterogeneous with regard to their relationship between predictors and outcome. When data of several subgroups of patients with the same or similar disease are available, it is tempting to combine them to increase sample size, such as in multicenter studies. However, heterogeneity between subgroups can lead to biased results and subgroup-specific effects may remain undetected. METHODS For this situation, we propose a penalized Cox regression model with a weighted version of the Cox partial likelihood that includes patients of all subgroups but assigns them individual weights based on their subgroup affiliation. The weights are estimated from the data such that patients who are likely to belong to the subgroup of interest obtain higher weights in the subgroup-specific model. RESULTS Our proposed approach is evaluated through simulations and application to real lung cancer cohorts, and compared to existing approaches. Simulation results demonstrate that our proposed model is superior to standard approaches in terms of prediction performance and variable selection accuracy when the sample size is small. CONCLUSIONS The results suggest that sharing information between subgroups by incorporating appropriate weights into the likelihood can increase power to identify the prognostic covariates and improve risk prediction.
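The idea of weighting patients inside the Cox partial likelihood can be made concrete with a small numerical sketch. The abstract does not give the formula, so the code below assumes the standard case-weighted partial log-likelihood for a single covariate without tied event times; the function name `weighted_cox_loglik` and all data are hypothetical, and the data-driven estimation of the weights is not shown.

```python
import math


def weighted_cox_loglik(beta, times, events, x, weights):
    """Case-weighted Cox partial log-likelihood for one covariate,
    assuming no tied event times."""
    loglik = 0.0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # censored subjects contribute only through risk sets
        # Weighted sum over the risk set: subjects still observed at times[i].
        denom = sum(weights[j] * math.exp(beta * x[j])
                    for j in range(len(times)) if times[j] >= times[i])
        loglik += weights[i] * (beta * x[i] - math.log(denom))
    return loglik


# Hypothetical data: three patients, all with observed events.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
x = [0.5, -1.0, 2.0]

# Unit weights recover the ordinary partial log-likelihood.
ll_unit = weighted_cox_loglik(0.0, times, events, x, [1.0, 1.0, 1.0])

# Down-weighting the second patient, e.g. one unlikely to belong to the
# subgroup of interest, shrinks that patient's influence on the fit.
ll_weighted = weighted_cox_loglik(0.0, times, events, x, [1.0, 0.5, 1.0])
print(ll_unit, ll_weighted)
```

With `beta = 0` and unit weights the value reduces to minus the sum of the log risk-set sizes, which gives a quick sanity check on the implementation.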
Affiliation(s)
- Katrin Madjar
- Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany.
| | - Jörg Rahnenführer
- Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany
11
Penalized spline estimation for panel count data model with time-varying coefficients. Comput Stat 2021. [DOI: 10.1007/s00180-021-01109-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
12
Signorelli M, Spitali P, Szigyarto CAK, Tsonaka R. Penalized regression calibration: A method for the prediction of survival outcomes using complex longitudinal and high-dimensional data. Stat Med 2021; 40:6178-6196. [PMID: 34464990 PMCID: PMC9293191 DOI: 10.1002/sim.9178] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/25/2021] [Revised: 08/10/2021] [Accepted: 08/10/2021] [Indexed: 11/18/2022]
Abstract
Longitudinal and high-dimensional measurements have become increasingly common in biomedical research. However, methods to predict survival outcomes using covariates that are both longitudinal and high-dimensional are currently missing. In this article, we propose penalized regression calibration (PRC), a method that can be employed to predict survival in such situations. PRC comprises three modeling steps: First, the trajectories described by the longitudinal predictors are flexibly modeled through the specification of multivariate mixed effects models. Second, subject-specific summaries of the longitudinal trajectories are derived from the fitted mixed models. Third, the time-to-event outcome is predicted using the subject-specific summaries as covariates in a penalized Cox model. To ensure a proper internal validation of the fitted PRC models, we furthermore develop a cluster bootstrap optimism correction procedure (CBOCP) that corrects for the optimistic bias of apparent measures of predictiveness. PRC and the CBOCP are implemented in the R package pencal, available from CRAN. After studying the behavior of PRC via simulations, we conclude by illustrating an application of PRC to data from an observational study that involved patients affected by Duchenne muscular dystrophy, where the goal is to predict time to loss of ambulation using longitudinal blood biomarkers.
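The general logic of a cluster bootstrap optimism correction can be sketched in a few lines. This is a generic illustration under stated assumptions, not the pencal implementation: the function names, the subject identifiers, and the performance values are all hypothetical, and the model refitting inside each bootstrap replicate is omitted.

```python
import random


def cluster_bootstrap(subject_ids, rng):
    """Resample whole subjects with replacement, so that all repeated
    measurements of a drawn subject stay together."""
    unique = sorted(set(subject_ids))
    return [rng.choice(unique) for _ in unique]


def optimism_corrected(apparent, boot_apparent, boot_original):
    """Subtract the mean bootstrap optimism from the apparent performance.

    boot_apparent: performance of each bootstrap model on its own sample.
    boot_original: performance of the same model on the original data.
    """
    optimism = [a - o for a, o in zip(boot_apparent, boot_original)]
    return apparent - sum(optimism) / len(optimism)


# Hypothetical long-format data: one row per visit, identified by subject.
ids = [1, 1, 1, 2, 2, 3, 3, 3]
sample = cluster_bootstrap(ids, random.Random(0))

# Hypothetical C-index values from two bootstrap replicates.
corrected = optimism_corrected(0.80, [0.85, 0.83], [0.75, 0.77])
print(sample, corrected)
```

Resampling at the subject level, rather than at the row level, keeps each patient's trajectory intact, which is why a cluster bootstrap suits longitudinal predictors.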
Affiliation(s)
- Mirko Signorelli
- Mathematical Institute, Leiden University, Leiden, The Netherlands
| | - Pietro Spitali
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | | | | | - Roula Tsonaka
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
13
Survival analysis with semi-supervised predictive clustering trees. Comput Biol Med 2021; 141:105001. [PMID: 34782112 DOI: 10.1016/j.compbiomed.2021.105001] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/18/2021] [Revised: 10/26/2021] [Accepted: 10/27/2021] [Indexed: 11/21/2022]
Abstract
Many clinical studies follow patients over time and record the time until the occurrence of an event of interest (e.g., recovery, death, …). When patients drop out of the study or when their event did not happen before the study ended, the collected dataset is said to contain censored observations. Given the rise of personalized medicine, clinicians are often interested in accurate risk prediction models that predict, for unseen patients, a survival profile, including the expected time until the event. Survival analysis methods are used to detect associations or compare subpopulations of patients in this context. In this article, we propose to cast the time-to-event prediction task as a multi-target regression task, with censored observations modeled as partially labeled examples. We then apply semi-supervised learning to the resulting data representation. More specifically, we use semi-supervised predictive clustering trees and ensembles thereof. Empirical results over eleven real-life datasets demonstrate superior or equivalent predictive performance of the proposed approach as compared to three competitor methods. Moreover, smaller models are obtained compared to random survival forests, another tree ensemble method. Finally, we illustrate the informative feature selection mechanism of our method, by interpreting the splits induced by a single tree model when predicting survival for amyotrophic lateral sclerosis patients.
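One way to picture censored observations as partially labeled examples is to encode each patient's status on a grid of time points and leave the entries after censoring unlabeled. The abstract does not specify the target representation, so the encoding below, the function name `encode_targets`, and the time grid are assumptions made for illustration.

```python
def encode_targets(time, event, grid):
    """Encode one patient as survival-status targets on a time grid.

    1    = known to be event-free at grid time t
    0    = event known to have occurred by t
    None = status unknown, because the patient was censored before t
    """
    targets = []
    for t in grid:
        if time > t:
            targets.append(1)
        elif event == 1:
            targets.append(0)
        else:
            targets.append(None)
    return targets


grid = [1, 2, 3, 4]  # hypothetical follow-up times, e.g. in years

# Patient with an observed event at time 3: fully labeled.
observed = encode_targets(3, 1, grid)

# Patient censored at time 2.5: labels after censoring are missing.
censored = encode_targets(2.5, 0, grid)
print(observed, censored)
```

A semi-supervised multi-target learner can then use the labeled entries of censored patients instead of discarding those patients or treating their censoring time as an event time.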
14
Mimi A, Khan MHR. Variable selection for censored data using Modified Correlation Adjusted coRrelation (MCAR) scores. Stat Med 2021; 40:5046-5064. [PMID: 34155660 DOI: 10.1002/sim.9110] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/12/2019] [Revised: 05/08/2021] [Accepted: 06/07/2021] [Indexed: 11/06/2022]
Abstract
Dealing with high-dimensional censored data is very challenging because of the complexities in data structure. This article focuses on developing a variable selection procedure for censored high-dimensional data with AFT models using the Modified Correlation Adjusted coRrelation (MCAR) scores method. It builds on the CAR scores method, which provides a canonical ordering that encourages grouping of correlated predictors and down-weights antagonistic variables. The proposed MCAR scores method is developed as an extension of the CAR scores method using a novel integration of the sample and threshold estimators of the correlation matrix, as suggested by Huang and Fryzlewicz. The proposed MCAR method yields computationally more efficient estimates under model sparsity and can provide a canonical ordering among the predictors. The MCAR method is a greedy method that is also easy to understand and can perform estimation and variable selection simultaneously. The variable selection performance of the MCAR method has been compared with other existing regularized techniques in the literature, such as the lasso and elastic net, with a machine learning technique called boosting, and with the censored CAR, in a number of simulation studies and on a real microarray data set for diffuse large-B-cell lymphoma. Results indicate that when correlation exists among the covariates, the MCAR method outperforms all five techniques, while for uncorrelated data the MCAR performs quite similarly to the CAR method but clearly outperforms the other three methods. The empirical study further reveals that the MCAR method exhibits the best predictive performance among the methods.
Affiliation(s)
- Afsana Mimi
- Applied Statistics, Institute of Statistical Research and Training, University of Dhaka, Dhaka, Bangladesh
| | - Md Hasinur Rahaman Khan
- Applied Statistics, Institute of Statistical Research and Training, University of Dhaka, Dhaka, Bangladesh
| |
|
15
|
Cui E, Crainiceanu CM, Leroux A. Additive Functional Cox Model. J Comput Graph Stat 2021; 30:780-793. [PMID: 34898969 PMCID: PMC8664082 DOI: 10.1080/10618600.2020.1853550] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/31/2019] [Revised: 10/10/2020] [Accepted: 11/13/2020] [Indexed: 10/22/2022]
Abstract
We propose the Additive Functional Cox Model to flexibly quantify the association between functional covariates and time to event data. The model extends the linear functional proportional hazards model by allowing the association between the functional covariate and log hazard to vary non-linearly in both the functional domain and the value of the functional covariate. Additionally, we introduce critical transformations of the functional covariate which address the weak model identifiability in areas of information sparsity and discuss their impact on interpretation and inference. We also introduce a novel estimation procedure that accounts for identifiability constraints directly during model fitting. Methods are applied to the National Health and Nutrition Examination Survey (NHANES) 2003-2006 accelerometry data and quantify new and interpretable circadian patterns of physical activity that are associated with all-cause mortality. We also introduce a simple and novel simulation framework for generating survival data with functional predictors which resemble the observed data. The accompanying inferential R software is fast, open source and publicly available. Our data application and simulations are fully reproducible through the accompanying vignette.
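To make the contrast with the linear functional model concrete, the two log-hazard specifications can be sketched as below. The notation is assumed here for illustration and may differ from the paper's: X_i(s) is the functional covariate on domain S, λ_0 the baseline hazard, β(s) a coefficient function, and F a smooth bivariate surface.

```latex
% Linear functional Cox model: the covariate enters linearly at each s
\log \lambda_i(t \mid X_i) = \log \lambda_0(t) + \int_{\mathcal{S}} \beta(s)\, X_i(s)\, ds

% Additive functional Cox model: the effect may vary non-linearly in both s and X_i(s)
\log \lambda_i(t \mid X_i) = \log \lambda_0(t) + \int_{\mathcal{S}} F\{s, X_i(s)\}\, ds
```

Setting F{s, x} = β(s) x recovers the linear model as a special case.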
Affiliation(s)
- Erjia Cui
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, USA
| | | | - Andrew Leroux
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, USA; Department of Biostatistics and Bioinformatics, University of Colorado, Anschutz Medical Campus, USA
| |
|
16
|
Cowling TE, Cromwell DA, Sharples LD, van der Meulen J. A novel approach selected small sets of diagnosis codes with high prediction performance in large healthcare datasets. J Clin Epidemiol 2020; 128:20-28. [DOI: 10.1016/j.jclinepi.2020.08.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/15/2020] [Revised: 07/15/2020] [Accepted: 08/05/2020] [Indexed: 12/23/2022]
|
17
|
Abdella GM, Shaaban K. Modeling the Impact of Weather Conditions on Pedestrian Injury Counts Using LASSO-Based Poisson Model. Arabian Journal for Science and Engineering 2020. [DOI: 10.1007/s13369-020-05045-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
|
18
|
Zheng X, Amos CI, Frost HR. Cancer prognosis prediction using somatic point mutation and copy number variation data: a comparison of gene-level and pathway-based models. BMC Bioinformatics 2020; 21:467. [PMID: 33081688 PMCID: PMC7574407 DOI: 10.1186/s12859-020-03791-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/15/2020] [Accepted: 09/30/2020] [Indexed: 01/11/2023]
Abstract
BACKGROUND Genomic profiling of solid human tumors by projects such as The Cancer Genome Atlas (TCGA) has provided important information regarding the somatic alterations that drive cancer progression and patient survival. Although researchers have successfully leveraged TCGA data to build prognostic models, most efforts have focused on specific cancer types and a targeted set of gene-level predictors. Less is known about the prognostic ability of pathway-level variables in a pan-cancer setting. To address these limitations, we systematically evaluated and compared the prognostic ability of somatic point mutation (SPM) and copy number variation (CNV) data, gene-level and pathway-level models for a diverse set of TCGA cancer types and predictive modeling approaches. RESULTS We evaluated gene-level and pathway-level penalized Cox proportional hazards models using SPM and CNV data for 29 different TCGA cohorts. We measured predictive accuracy as the concordance index for predicting survival outcomes. Our comprehensive analysis suggests that the use of pathway-level predictors did not offer superior predictive power relative to gene-level models for all cancer types but had the advantages of robustness and parsimony. We identified a set of cohorts for which somatic alterations could not predict prognosis, and a unique cohort LGG, for which SPM data was more predictive than CNV data and the predictive accuracy is good for all model types. We found that the pathway-level predictors provide superior interpretative value and that there is often a serious collinearity issue for the gene-level models while pathway-level models avoided this issue. CONCLUSION Our comprehensive analysis suggests that when using somatic alterations data for cancer prognosis prediction, pathway-level models are more interpretable, stable and parsimonious compared to gene-level models. Pathway-level models also avoid the issue of collinearity, which can be serious for gene-level somatic alterations. The prognostic power of somatic alterations is highly variable across different cancer types and we have identified a set of cohorts for which somatic alterations could not predict prognosis. In general, CNV data predicts prognosis better than SPM data with the exception of the LGG cohort.
Affiliation(s)
- Xingyu Zheng
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, 03755, USA
| | - Christopher I Amos
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, 03755, USA.
- Department of Medicine, Institute for Clinical and Translational Research, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA.
| | - H Robert Frost
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, 03755, USA.
| |
|
19
|
Dwomoh D, Adu B, Dodoo D, Theisen M, Iddi S, Gerds TA. Evaluating the predictive performance of malaria antibodies and FCGR3B gene polymorphisms on Plasmodium falciparum infection outcome: a prospective cohort study. Malar J 2020; 19:307. [PMID: 32854708 PMCID: PMC7450914 DOI: 10.1186/s12936-020-03381-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/06/2019] [Accepted: 08/19/2020] [Indexed: 12/03/2022]
Abstract
Background Malaria antigen-specific antibodies and polymorphisms in host receptors involved in antibody functionality have been associated with different outcomes of Plasmodium falciparum infections. Thus, to identify key prospective malaria antigens for vaccine development, there is the need to evaluate the associations between malaria antibodies and antibody dependent host factors with more rigorous statistical methods. In this study, different statistical models were used to evaluate the predictive performance of malaria-specific antibodies and host gene polymorphisms on P. falciparum infection in a longitudinal cohort study involving Ghanaian children. Methods Models with different functional forms were built using known predictors (age, sickle cell status, blood group status, parasite density, and mosquito bed net use) and malaria antigen-specific immunoglobulin (Ig) G and IgG subclasses and FCGR3B polymorphisms shown to mediate antibody-dependent cellular functions. Malaria antigens studied were Merozoite surface proteins (MSP-1 and MSP-3), Glutamate Rich Protein (GLURP)-R0, R2, and the Apical Membrane Antigen (AMA-1). The models were evaluated through visualization and assessment of differences between the Area Under the Receiver Operating Characteristic Curve and Brier Score estimated by suitable internal cross-validation designs. Results This study found that the FCGR3B-c.233C>A genotype and IgG against AMA1 were relatively better compared to the other antibodies and FCGR3B genotypes studied in classifying or predicting malaria risk among children. Conclusions The data supports the P. falciparum, AMA1 as an important malaria vaccine antigen, while FCGR3B-c.233C>A under the additive and dominant models of inheritance could be an important modifier of the effect of malaria protective antibodies.
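The two evaluation metrics named in the Methods can be sketched in a few lines. This is a minimal illustration, not the study's code: the function names and the toy data are made up, and the cross-validation design the authors describe is not reproduced here.

```python
def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def auc(y_true, score):
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, score) if y == 1]
    neg = [s for y, s in zip(y_true, score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): outcome 1 = infected, 0 = not infected
y = [1, 0, 1, 0]
p = [0.8, 0.3, 0.6, 0.6]
print(brier_score(y, p), auc(y, p))
```

A lower Brier score indicates better calibrated and sharper predictions, while a higher AUC indicates better ranking of cases above non-cases; the two can disagree, which is one reason to report both.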
Affiliation(s)
- Duah Dwomoh
- Department of Biostatistics, School of Public Health, University of Ghana, Accra, Ghana.
| | - Bright Adu
- Department of Immunology, Noguchi Memorial Institute of Medical Research, College of Health Sciences, University of Ghana, Accra, Ghana
| | - Daniel Dodoo
- Department of Immunology, Noguchi Memorial Institute of Medical Research, College of Health Sciences, University of Ghana, Accra, Ghana
| | - Michael Theisen
- Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark; Centre for Medical Parasitology at Department of International Health, Immunology and Microbiology, University of Copenhagen, Copenhagen, Denmark; Department of Infectious Diseases, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
| | - Samuel Iddi
- Department of Statistics and Actuarial Sciences, University of Ghana, Accra, Ghana
| | - Thomas A Gerds
- Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
| |
|
20
|
Holsbø E, Perduca V, Bongo LA, Lund E, Birmelé E. Predicting breast cancer metastasis from whole-blood transcriptomic measurements. BMC Res Notes 2020; 13:248. [PMID: 32434554 PMCID: PMC7238609 DOI: 10.1186/s13104-020-05088-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2020] [Accepted: 05/10/2020] [Indexed: 01/23/2023]
Abstract
Objective In this exploratory work we investigate whether blood gene expression measurements predict breast cancer metastasis. Early detection of increased metastatic risk could potentially be life-saving. Our data comes from the Norwegian Women and Cancer epidemiological cohort study. The women who contributed to these data provided a blood sample up to a year before receiving a breast cancer diagnosis. We estimate a penalized maximum likelihood logistic regression. We evaluate this in terms of calibration, concordance probability, and stability, all of which we estimate by the bootstrap. Results We identify a set of 108 candidate predictor genes that exhibit a fold change in average metastasized observation where there is none for the average non-metastasized observation.
Affiliation(s)
- Einar Holsbø
- Department of Computer Science, UiT - The Arctic University of Norway, Tromsø, Norway.
| | - Vittorio Perduca
- Laboratoire MAP5 (UMR CNRS 8145), Université Paris Descartes, Université de Paris, Paris, France
| | - Lars Ailo Bongo
- Department of Computer Science, UiT - The Arctic University of Norway, Tromsø, Norway
| | - Eiliv Lund
- Cancer Registry of Norway, Oslo, Norway; Department of Community Medicine, UiT - The Arctic University of Norway, Tromsø, Norway
| | - Etienne Birmelé
- Laboratoire MAP5 (UMR CNRS 8145), Université Paris Descartes, Université de Paris, Paris, France
| |
|
21
|
Davis AA, Iams WT, Chan D, Oh MS, Lentz RW, Peterman N, Robertson A, Shah A, Srivas R, Wilson TJ, Lambert NJ, George PS, Wong B, Wood HW, Close JC, Tezcan A, Nesmith K, Tezcan H, Chae YK. Early Assessment of Molecular Progression and Response by Whole-genome Circulating Tumor DNA in Advanced Solid Tumors. Mol Cancer Ther 2020; 19:1486-1496. [PMID: 32371589 DOI: 10.1158/1535-7163.mct-19-1060] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/25/2019] [Revised: 02/07/2020] [Accepted: 04/24/2020] [Indexed: 12/22/2022]
Abstract
Treatment response assessment for patients with advanced solid tumors is complex and existing methods require greater precision. Current guidelines rely on imaging, which has known limitations, including the time required to show a deterministic change in target lesions. Serial changes in whole-genome (WG) circulating tumor DNA (ctDNA) were used to assess response or resistance to treatment early in the treatment course. Ninety-six patients with advanced cancer were prospectively enrolled (91 analyzed and 5 excluded), and blood was collected before and after initiation of a new, systemic treatment. Plasma cell-free DNA libraries were prepared for either WG or WG bisulfite sequencing. Longitudinal changes in the fraction of ctDNA were quantified to retrospectively identify molecular progression (MP) or major molecular response (MMR). Study endpoints were concordance with first follow-up imaging (FFUI) and stratification of progression-free survival (PFS) and overall survival (OS). Patients with MP (n = 13) had significantly shorter PFS (median 62 days vs. 310 days) and OS (255 days vs. not reached). Sensitivity for MP to identify clinical progression was 54% and specificity was 100%. MP calls were from samples taken a median of 28 days into treatment and 39 days before FFUI. Patients with MMR (n = 27) had significantly longer PFS and OS compared with those with neither call (n = 51). These results demonstrated that ctDNA changes early after treatment initiation inform response to treatment and correlate with long-term clinical outcomes. Once validated, molecular response assessment can enable early treatment change minimizing side effects and costs associated with additional cycles of ineffective treatment.
Affiliation(s)
- Andrew A Davis
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, Illinois
| | - Wade T Iams
- Vanderbilt University Medical Center, Nashville, Tennessee
| | - David Chan
- Cancer Care Associates TMPN, Redondo Beach, California
| | - Michael S Oh
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois
| | - Robert W Lentz
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois
| | - Neil Peterman
- Lexent Bio, Inc., San Francisco and San Diego, California
| | - Alex Robertson
- Lexent Bio, Inc., San Francisco and San Diego, California
| | - Abhik Shah
- Lexent Bio, Inc., San Francisco and San Diego, California
| | - Rohith Srivas
- Lexent Bio, Inc., San Francisco and San Diego, California
| | | | | | - Peter S George
- Lexent Bio, Inc., San Francisco and San Diego, California
| | - Becky Wong
- Lexent Bio, Inc., San Francisco and San Diego, California
| | - Haleigh W Wood
- Lexent Bio, Inc., San Francisco and San Diego, California
| | - Jason C Close
- Lexent Bio, Inc., San Francisco and San Diego, California
| | - Ayse Tezcan
- Lexent Bio, Inc., San Francisco and San Diego, California
| | - Ken Nesmith
- Lexent Bio, Inc., San Francisco and San Diego, California
| | - Haluk Tezcan
- Lexent Bio, Inc., San Francisco and San Diego, California.
| | - Young Kwang Chae
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois.
- Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, Illinois
| |
|
22
|
Kawaguchi ES, Suchard MA, Liu Z, Li G. A surrogate ℓ0 sparse Cox's regression with applications to sparse high-dimensional massive sample size time-to-event data. Stat Med 2020; 39:675-686. [PMID: 31814146 DOI: 10.1002/sim.8438] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/10/2019] [Revised: 09/30/2019] [Accepted: 11/02/2019] [Indexed: 11/11/2022]
Abstract
Sparse high-dimensional massive sample size (sHDMSS) time-to-event data present multiple challenges to quantitative researchers, as most current sparse survival regression methods and software will grind to a halt and become practically inoperable. This paper develops a scalable ℓ0-based sparse Cox regression tool for right-censored time-to-event data that easily takes advantage of existing high-performance implementations of ℓ2-penalized regression methods for sHDMSS time-to-event data. Specifically, we extend the ℓ0-based broken adaptive ridge (BAR) methodology to the Cox model, which involves repeatedly performing reweighted ℓ2-penalized regression. We rigorously show that the resulting estimator for the Cox model is selection consistent, oracle for parameter estimation, and has a grouping property for highly correlated covariates. Furthermore, we implement our BAR method in an R package for sHDMSS time-to-event data by leveraging existing efficient algorithms for massive ℓ2-penalized Cox regression. We evaluate the BAR Cox regression method by extensive simulations and illustrate its application on sHDMSS time-to-event data from the National Trauma Data Bank with hundreds of thousands of observations and tens of thousands of sparsely represented covariates.
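The "repeatedly reweighted ℓ2-penalized regression" idea can be illustrated with a small sketch. Note the simplification: this example applies the broken adaptive ridge iteration to an ordinary linear model with squared-error loss, not to the Cox partial likelihood treated in the paper, and it is not the authors' R package. The function name, the tuning values `lam` and `xi`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def bar_linear(X, y, lam=2.0, xi=1.0, n_iter=100):
    """Broken adaptive ridge for a linear model (illustrative simplification)."""
    p = X.shape[1]
    gram, xty = X.T @ X, X.T @ y
    # Start from an ordinary ridge estimate with penalty xi
    beta = np.linalg.solve(gram + xi * np.eye(p), xty)
    for _ in range(n_iter):
        g = np.diag(beta)
        # Ridge step with penalty lam * sum_j b_j^2 / beta_j^2, written in a form
        # that is algebraically equivalent to
        #   solve(gram + lam * diag(1 / beta**2), xty)
        # but remains finite when some beta_j has already shrunk to zero.
        beta = g @ np.linalg.solve(g @ gram @ g + lam * np.eye(p), g @ xty)
    return beta


# Simulated data (hypothetical): two non-zero and three zero coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ true_beta + 0.5 * rng.normal(size=200)

beta_hat = bar_linear(X, y)
print(np.round(beta_hat, 3))
```

Each pass penalizes a coefficient in inverse proportion to its previous squared value, so small coefficients are driven toward zero while large ones are barely shrunk; this is how a sequence of ridge fits can approximate an ℓ0 penalty.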
Affiliation(s)
- Eric S Kawaguchi
- Department of Preventive Medicine, University of Southern California, Los Angeles, California
| | - Marc A Suchard
- Department of Preventive Medicine, University of Southern California, Los Angeles, California; Department of Biomathematics, University of California, Los Angeles, California; Department of Human Genetics, University of California, Los Angeles, California
| | - Zhenqiu Liu
- Department of Public Health Sciences, Penn State Cancer Institute, Hershey, Pennsylvania
| | - Gang Li
- Department of Preventive Medicine, University of Southern California, Los Angeles, California; Department of Biomathematics, University of California, Los Angeles, California
| |
|
23
|
Choi CH, Chung JY, Kang JH, Paik ES, Lee YY, Park W, Byeon SJ, Chung EJ, Kim BG, Hewitt SM, Bae DS. Chemoradiotherapy response prediction model by proteomic expressional profiling in patients with locally advanced cervical cancer. Gynecol Oncol 2020; 157:437-443. [PMID: 32107047 DOI: 10.1016/j.ygyno.2020.02.017] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/26/2019] [Revised: 02/03/2020] [Accepted: 02/09/2020] [Indexed: 12/18/2022]
Abstract
OBJECTIVE Resistance to chemo-radiation therapy is a substantial obstacle that compromises treatment of advanced cervical cancer. The objective of this study was to investigate if a proteomic panel associated with radioresistance could predict survival of patients with locally advanced cervical cancer. METHODS A total of 181 frozen tissue samples were prospectively obtained from patients with locally advanced cervical cancer before chemoradiation. Expression levels of 22 total and phosphorylated proteins were evaluated using well-based reverse phase protein arrays. Selected proteins were validated with western blotting analysis and immunohistochemistry. Performances of models were internally and externally validated. RESULTS Unsupervised clustering stratified patients into three major groups with different overall survival (OS, P = 0.001) and progression-free survival (PFS, P = 0.003) based on detection of BCL2, HER2, CD133, CAIX, and ERCC1. Reverse-phase protein array results significantly correlated with western blotting results (R² = 0.856). The C-index of the model was higher than that of the clinical model in the prediction of OS (C-index: 0.86 and 0.62, respectively) and PFS (C-index: 0.82 and 0.64, respectively). The Kaplan-Meier survival curve showed a dose-dependent prognostic significance of risk score for PFS and OS. A multivariable Cox proportional hazards model confirmed that the risk score was an independent predictor of PFS (HR: 1.6; 95% CI: 1.4-1.9; P < 0.001) and OS (HR: 2.1; 95% CI: 1.7-2.5; P < 0.001). CONCLUSION A proteomic panel of BCL2, HER2, CD133, CAIX, and ERCC1 independently predicted survival in locally advanced cervical cancer patients. This prediction model can help identify chemoradiation responsive tumors and improve prediction for clinical outcome of cervical cancer patients.
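For readers unfamiliar with the C-index reported above, the following minimal sketch shows how Harrell's concordance index is commonly computed for right-censored data. It is an illustration under stated assumptions, not the study's code: the function name and the toy data are hypothetical, and tied event times are simply treated as non-comparable.

```python
def concordance_index(time, event, risk):
    """Harrell's C: share of comparable pairs in which the subject who
    fails earlier also has the higher predicted risk (risk ties count 1/2)."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot be the earlier failure of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (hypothetical): follow-up time, event indicator (1 = event), risk score
time = [5, 8, 12, 20]
event = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.1, 0.7]
c = concordance_index(time, event, risk)
print(c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported values of 0.86 versus 0.62 for OS.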
Affiliation(s)
- Chel Hun Choi
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea; Experimental Pathology Laboratory, Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, USA
| | - Joon-Yong Chung
- Experimental Pathology Laboratory, Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, USA
| | - Jun Hyeok Kang
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - E Sun Paik
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Yoo-Young Lee
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Won Park
- Department of Radiation Oncology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Sun-Ju Byeon
- Department of Pathology, Dongtan Sacred Heart Hospital, Hallym University College of Medicine, Hwaseong, Republic of Korea
| | - Eun Joo Chung
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, USA
| | - Byoung-Gie Kim
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Stephen M Hewitt
- Experimental Pathology Laboratory, Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, USA.
| | - Duk-Soo Bae
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea.
| |
|
24
|
Golmakani MK, Polley EC. Super Learner for Survival Data Prediction. Int J Biostat 2020; 16:/j/ijb.ahead-of-print/ijb-2019-0065/ijb-2019-0065.xml. [PMID: 32097120 DOI: 10.1515/ijb-2019-0065] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/01/2019] [Accepted: 01/24/2020] [Indexed: 11/15/2022]
Abstract
Survival analysis is a widely used method to establish a connection between a time to event outcome and a set of potential covariates. Accurately predicting the time of an event of interest is of primary importance in survival analysis. Many different algorithms have been proposed for survival prediction. However, for a given prediction problem it is rarely, if ever, possible to know in advance which algorithm will perform the best. In this paper we propose two algorithms for constructing super learners in survival data prediction where the individual algorithms are based on proportional hazards. A super learner is a flexible approach to statistical learning that finds the best weighted ensemble of the individual algorithms. Finding the optimal combination of the individual algorithms through minimizing cross-validated risk controls for over-fitting of the final ensemble learner. Candidate algorithms may range from a basic Cox model to tree-based machine learning algorithms, assuming all candidate algorithms are based on the proportional hazards framework. The ensemble weights are estimated by minimizing the cross-validated negative log partial likelihood. We compare the performance of the proposed super learners with existing models through extensive simulation studies. In all simulation scenarios, the proposed super learners are either the best fit or near the best fit. The performances of the newly proposed algorithms are also demonstrated with clinical data examples.
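The weighting criterion described above, the negative log partial likelihood of a weighted combination of learners, can be sketched as follows. This is a rough illustration and not either of the paper's two algorithms: the function names, the made-up out-of-fold predictions, the restriction to two learners with convex weights, and the coarse grid search are all assumptions, and tied event times are not handled.

```python
import numpy as np


def neg_log_partial_likelihood(eta, time, event):
    """Cox negative log partial likelihood for linear predictor eta (assumes no tied times)."""
    order = np.argsort(-np.asarray(time, dtype=float))  # longest follow-up first
    eta = np.asarray(eta, dtype=float)[order]
    event = np.asarray(event)[order]
    # Running log-sum-exp gives log of sum(exp(eta)) over everyone still at risk
    log_risk_set = np.logaddexp.accumulate(eta)
    return -np.sum((eta - log_risk_set)[event == 1])


def ensemble_loss(weights, eta_by_learner, time, event):
    """Loss of a weighted combination of the learners' linear predictors."""
    return neg_log_partial_likelihood(eta_by_learner @ weights, time, event)


# Made-up out-of-fold linear predictors from two hypothetical learners (columns)
time = np.array([2.0, 5.0, 7.0, 11.0])
event = np.array([1, 1, 0, 1])
eta_by_learner = np.array([[1.2, 0.3], [0.4, 0.9], [0.1, -0.2], [-0.8, -0.5]])

# Coarse grid search over convex weights (w, 1 - w)
grid = np.linspace(0.0, 1.0, 11)
losses = [ensemble_loss(np.array([w, 1.0 - w]), eta_by_learner, time, event) for w in grid]
best_w = grid[int(np.argmin(losses))]
print(best_w, min(losses))
```

In practice the predictions fed to `ensemble_loss` would come from held-out folds, so that the chosen weights reflect out-of-sample fit rather than rewarding the most overfitted learner.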
Affiliation(s)
| | - Eric C Polley
- Health Science Research, Mayo Clinic Minnesota, Rochester, Minnesota, USA
| |
|
25
|
Sun C, Li H, Mills RE, Guan Y. Prognostic model for multiple myeloma progression integrating gene expression and clinical features. Gigascience 2019; 8:giz153. [PMID: 31886876 PMCID: PMC6936209 DOI: 10.1093/gigascience/giz153] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/11/2019] [Revised: 12/05/2019] [Accepted: 12/06/2019] [Indexed: 01/18/2023]
Abstract
BACKGROUND Multiple myeloma (MM) is a hematological cancer caused by abnormal accumulation of monoclonal plasma cells in bone marrow. With the increase in treatment options, risk-adapted therapy is becoming more and more important. Survival analysis is commonly applied to study progression or other events of interest and stratify the risk of patients. RESULTS In this study, we present the current state-of-the-art model for MM prognosis and the molecular biomarker set for stratification: the winning algorithm in the 2017 Multiple Myeloma DREAM Challenge, Sub-Challenge 3. Specifically, we built a non-parametric complete hazard ranking model to map the right-censored data into a linear space, where commonplace machine learning techniques, such as Gaussian process regression and random forests, can play their roles. Our model integrated both the gene expression profile and clinical features to predict the progression of MM. Compared with conventional models, such as Cox model and random survival forests, our model achieved higher accuracy in 3 within-cohort predictions. In addition, it showed robust predictive power in cross-cohort validations. Key molecular signatures related to MM progression were identified from our model, which may function as the core determinants of MM progression and provide important guidance for future research and clinical practice. Functional enrichment analysis and mammalian gene-gene interaction network revealed crucial biological processes and pathways involved in MM progression. The model is dockerized and publicly available at https://www.synapse.org/#!Synapse:syn11459638. Both data and reproducible code are included in the docker. CONCLUSIONS We present the current state-of-the-art prognostic model for MM integrating gene expression and clinical features validated in an independent test set.
Affiliation(s)
- Chen Sun
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
| | - Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
| | - Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Department of Internal Medicine, Nephrology Division, University of Michigan, 1150 West Medical Center Drive, Ann Arbor, MI 48109, USA
| |
|
26
|
Wang L, He K, Schaubel DE. Penalized survival models for the analysis of alternating recurrent event data. Biometrics 2019; 76:448-459. [PMID: 31535737 DOI: 10.1111/biom.13153] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/09/2018] [Accepted: 09/09/2019] [Indexed: 12/21/2022]
Abstract
Recurrent event data are widely encountered in clinical and observational studies. Most methods for recurrent events treat the outcome as a point process and, as such, neglect any associated event duration. This generally leads to a less informative and potentially biased analysis. We propose a joint model for the recurrent event rate (of incidence) and duration. The two processes are linked through a bivariate normal frailty. For example, when the event is hospitalization, we can treat the time to admission and length-of-stay as two alternating recurrent events. In our method, the regression parameters are estimated through a penalized partial likelihood, and the variance-covariance matrix of the frailty is estimated through a recursive estimating formula. Moreover, we develop a likelihood ratio test to assess the dependence between the incidence and duration processes. Simulation results demonstrate that our method provides accurate parameter estimation, with a relatively fast computation time. We illustrate the methods through an analysis of hospitalizations among end-stage renal disease patients.
Affiliation(s)
- Lili Wang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
| | - Kevin He
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
| | - Douglas E Schaubel
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
| |
|
27
|
Verdecchia P, Angeli F, Cavallini C, Aita A, Turturiello D, De Fano M, Reboldi G. Sudden Cardiac Death in Hypertensive Patients. Hypertension 2019; 73:1071-1078. [DOI: 10.1161/hypertensionaha.119.12684] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 01/19/2023]
Affiliation(s)
- Paolo Verdecchia
- From the Fondazione Umbra Cuore e Ipertensione-ONLUS and Struttura Complessa di Cardiologia, Hospital S. Maria della Misericordia, Perugia, Italy (P.V., C.C., A.A.)
| | - Fabio Angeli
- Struttura Complessa di Cardiologia e Fisiopatologia Cardiovascolare, Hospital S. Maria della Misericordia, Perugia, Italy (F.A., D.T.)
| | - Claudio Cavallini
- From the Fondazione Umbra Cuore e Ipertensione-ONLUS and Struttura Complessa di Cardiologia, Hospital S. Maria della Misericordia, Perugia, Italy (P.V., C.C., A.A.)
| | - Adolfo Aita
- From the Fondazione Umbra Cuore e Ipertensione-ONLUS and Struttura Complessa di Cardiologia, Hospital S. Maria della Misericordia, Perugia, Italy (P.V., C.C., A.A.)
| | - Dario Turturiello
- Struttura Complessa di Cardiologia e Fisiopatologia Cardiovascolare, Hospital S. Maria della Misericordia, Perugia, Italy (F.A., D.T.)
| | | | | |
|
28
|
Al Sheeb B, Abdella GM, Hamouda AM, Abdulwahed MS. Predictive modeling of first-year student performance in engineering education using sequential penalization-based regression. Journal of Statistics & Management Systems 2019. [DOI: 10.1080/09720510.2018.1509817] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
Affiliation(s)
- Bothaina Al Sheeb
- Department of Mechanical and Industrial Engineering, Technology Innovation & Engineering Education Unit, College of Engineering, Qatar University, Doha 2713, Qatar
- Galal M. Abdella
- Department of Mechanical and Industrial Engineering, Technology Innovation & Engineering Education Unit, College of Engineering, Qatar University, Doha 2713, Qatar
- Abdel Magid Hamouda
- Department of Mechanical and Industrial Engineering, Technology Innovation & Engineering Education Unit, College of Engineering, Qatar University, Doha 2713, Qatar
- Mahmoud Samir Abdulwahed
- Department of Mechanical and Industrial Engineering, Technology Innovation & Engineering Education Unit, College of Engineering, Qatar University, Doha 2713, Qatar
|
29
|
Bansal A, Heagerty PJ. A Tutorial on Evaluating the Time-Varying Discrimination Accuracy of Survival Models Used in Dynamic Decision Making. Med Decis Making 2018; 38:904-916. [PMID: 30319014 DOI: 10.1177/0272989x18801312] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Indexed: 12/22/2022]
Abstract
Many medical decisions involve the use of dynamic information collected on individual patients toward predicting likely transitions in their future health status. If accurate predictions are developed, then a prognostic model can identify patients at greatest risk for future adverse events and may be used clinically to define populations appropriate for targeted intervention. In practice, a prognostic model is often used to guide decisions at multiple time points over the course of disease, and classification performance (i.e., sensitivity and specificity) for distinguishing high-risk v. low-risk individuals may vary over time as an individual's disease status and prognostic information change. In this tutorial, we detail contemporary statistical methods that can characterize the time-varying accuracy of prognostic survival models when used for dynamic decision making. Although statistical methods for evaluating prognostic models with simple binary outcomes are well established, methods appropriate for survival outcomes are less well known and require time-dependent extensions of sensitivity and specificity to fully characterize longitudinal biomarkers or models. The methods we review are particularly important in that they allow for appropriate handling of censored outcomes commonly encountered with event time data. We highlight the importance of determining whether clinical interest is in predicting cumulative (or prevalent) cases over a fixed future time interval v. predicting incident cases over a range of follow-up times and whether patient information is static or updated over time. We discuss implementation of time-dependent receiver operating characteristic approaches using relevant R statistical software packages. The statistical summaries are illustrated using a liver prognostic model to guide transplantation in primary biliary cirrhosis.
Affiliation(s)
- Aasthaa Bansal
- The Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, School of Pharmacy, University of Washington, Seattle, WA (AB); Department of Biostatistics, University of Washington, Seattle, WA (PJH)
- Patrick J Heagerty
- The Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, School of Pharmacy, University of Washington, Seattle, WA (AB); Department of Biostatistics, University of Washington, Seattle, WA (PJH)
|
30
|
Biganzoli E, Boracchi P, Daidone M, Gion M, Marubini E. Flexible Modelling in Survival Analysis. Structuring Biological Complexity from the Information Provided by Tumor Markers. Int J Biol Markers 2018. [DOI: 10.1177/172460089801300301] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/16/2022]
Abstract
The aim of the present article is to introduce and discuss the problem of optimal modelling of the prognostic information provided by putative prognostic variables, possibly measured on a quantitative scale. A number of methodological aspects will be treated, with particular reference to the role of spline functions and artificial neural networks, which will be discussed in the context of the analysis of survival data. The problem of the evaluation and the choice of the optimal statistical models will be examined, with particular attention to the critical aspects related to the definition of prognostic indexes on the basis of the results of the selected models. Clinical examples in breast cancer on the evaluation of the prognostic impact of several tumor markers are provided. This paper is addressed to all researchers who are interested in the evaluation of the prognostic role of tumor markers, therefore we will stress the necessity of integrating the methodologies of biological, clinical and statistical research in the assessment of prognosis.
Affiliation(s)
- E. Biganzoli
- Division of Medical Statistics and Biometry, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milano
- P. Boracchi
- Institute of Medical Statistics and Biometry, Università degli Studi di Milano, Milano
- M.G. Daidone
- U.O. Determinazioni Biomolecolari nella Prognosi e Terapia dei Tumori, Department of Experimental Oncology, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milano
- M. Gion
- Centro Regionale Indicatori Biochimici di Tumore, Ospedale Civile, Venezia - Italy
- E. Marubini
- Division of Medical Statistics and Biometry, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milano
- Institute of Medical Statistics and Biometry, Università degli Studi di Milano, Milano
|
31
|
Utazirubanda JC, Leon T, Ngom P. Variable selection with Group LASSO approach: Application to Cox regression with frailty model. COMMUN STAT-SIMUL C 2018; 50:881-901. [PMID: 34248255 DOI: 10.1080/03610918.2019.1571605] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 10/27/2022]
Abstract
In analysis of survival outcomes supplemented with both clinical information and high-dimensional gene expression data, use of the traditional Cox proportional hazards model fails to meet some emerging needs in biomedical research. First, the number of covariates is generally much larger than the sample size. Secondly, predicting an outcome based on individual gene expression is inadequate because multiple biological processes and functional pathways regulate phenotypic expression. Another challenge is that the Cox model assumes that populations are homogeneous, implying that all individuals have the same risk of death, which is rarely true due to unmeasured risk factors among populations. In this paper we propose group LASSO with gamma-distributed frailty for variable selection in Cox regression by extending previous scholarship to account for heterogeneity among group structures related to exposure and susceptibility. The consistency property of the proposed method is established. This method is appropriate for addressing a wide variety of research questions from genetics to air pollution. Simulated and real world data analysis shows promising performance by group LASSO compared with other methods, including group SCAD and group MCP. Future research directions include expanding the use of frailty with adaptive group LASSO and sparse group LASSO methods.
Affiliation(s)
- Tomas Leon
- School of Public Health, University of California, Berkeley, USA
- Papa Ngom
- LMA, Université Cheikh Anta Diop, Dakar, Senegal
|
32
|
Jylhävä J, Kananen L, Raitanen J, Marttila S, Nevalainen T, Hervonen A, Jylhä M, Hurme M. Methylomic predictors demonstrate the role of NF-κB in old-age mortality and are unrelated to the aging-associated epigenetic drift. Oncotarget 2017; 7:19228-41. [PMID: 27015559 PMCID: PMC4991378 DOI: 10.18632/oncotarget.8278] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 11/06/2015] [Accepted: 03/10/2016] [Indexed: 01/24/2023]
Abstract
Changes in the DNA methylation (DNAm) landscape have been implicated in aging and cellular senescence. To unravel the role of specific DNAm patterns in late-life survival, we performed genome-wide methylation profiling in nonagenarians (n=111) and determined the performance of the methylomic predictors and conventional risk markers in a longitudinal setting. The survival model containing only the methylomic markers was superior in terms of predictive accuracy compared with the model containing only the conventional predictors or the model containing conventional predictors combined with the methylomic markers. At the 2.55-year follow-up, we identified 19 mortality-associated (false-discovery rate <0.5) CpG sites that mapped to genes functionally clustering around the nuclear factor kappa B (NF-κB) complex. Interestingly, none of the mortality-associated CpG sites overlapped with the established aging-associated DNAm sites. Our results are in line with previous findings on the role of NF-κB in controlling animal life spans and demonstrate the role of this complex in human longevity.
Affiliation(s)
- Juulia Jylhävä
- Department of Microbiology and Immunology, School of Medicine, University of Tampere, Tampere, Finland; Gerontology Research Center, University of Tampere, Tampere, Finland
- Laura Kananen
- Department of Microbiology and Immunology, School of Medicine, University of Tampere, Tampere, Finland; Gerontology Research Center, University of Tampere, Tampere, Finland
- Jani Raitanen
- School of Health Sciences, University of Tampere, Tampere, Finland; UKK Institute for Health Promotion Research, Tampere, Finland
- Saara Marttila
- Department of Microbiology and Immunology, School of Medicine, University of Tampere, Tampere, Finland; Gerontology Research Center, University of Tampere, Tampere, Finland
- Tapio Nevalainen
- Department of Microbiology and Immunology, School of Medicine, University of Tampere, Tampere, Finland; Gerontology Research Center, University of Tampere, Tampere, Finland
- Antti Hervonen
- Gerontology Research Center, University of Tampere, Tampere, Finland; School of Health Sciences, University of Tampere, Tampere, Finland
- Marja Jylhä
- Gerontology Research Center, University of Tampere, Tampere, Finland; School of Health Sciences, University of Tampere, Tampere, Finland
- Mikko Hurme
- Department of Microbiology and Immunology, School of Medicine, University of Tampere, Tampere, Finland; Gerontology Research Center, University of Tampere, Tampere, Finland; Fimlab Laboratories, Tampere, Finland
|
33
|
Cole SR, Edwards JK, Westreich D, Lesko CR, Lau B, Mugavero MJ, Mathews WC, Eron JJ, Greenland S. Estimating multiple time-fixed treatment effects using a semi-Bayes semiparametric marginal structural Cox proportional hazards regression model. Biom J 2017; 60:100-114. [PMID: 29076182 DOI: 10.1002/bimj.201600140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/28/2016] [Revised: 06/30/2017] [Accepted: 06/30/2017] [Indexed: 11/10/2022]
Abstract
Marginal structural models for time-fixed treatments fit using inverse-probability weighted estimating equations are increasingly popular. Nonetheless, the resulting effect estimates are subject to finite-sample bias when data are sparse, as is typical for large-sample procedures. Here we propose a semi-Bayes estimation approach which penalizes or shrinks the estimated model parameters to improve finite-sample performance. This approach uses simple symmetric data-augmentation priors. Limited simulation experiments indicate that the proposed approach reduces finite-sample bias and improves confidence-interval coverage when the true values lie within the central "hill" of the prior distribution. We illustrate the approach with data from a nonexperimental study of HIV treatments.
Affiliation(s)
- Stephen R Cole
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Jessie K Edwards
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Daniel Westreich
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Catherine R Lesko
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Bryan Lau
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Michael J Mugavero
- Department of Medicine, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- W Christopher Mathews
- Department of Medicine, School of Medicine, University of California, San Diego, CA, USA
- Joseph J Eron
- Department of Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Sander Greenland
- Departments of Epidemiology and Statistics, UCLA, Los Angeles, CA, USA
|
34
|
Liu XR, Pawitan Y, Clements MS. Generalized survival models for correlated time-to-event data. Stat Med 2017; 36:4743-4762. [DOI: 10.1002/sim.7451] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 10/05/2016] [Revised: 07/20/2017] [Accepted: 08/07/2017] [Indexed: 11/06/2022]
Affiliation(s)
- Xing-Rong Liu
- Department of Medical Epidemiology and Biostatistics; Karolinska Institutet; Nobels väg 12A S-171 77 Stockholm Sweden
- Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics; Karolinska Institutet; Nobels väg 12A S-171 77 Stockholm Sweden
- Mark S. Clements
- Department of Medical Epidemiology and Biostatistics; Karolinska Institutet; Nobels väg 12A S-171 77 Stockholm Sweden
|
35
|
Liu G, Piantadosi S. Ridge estimation in generalized linear models and proportional hazards regressions. COMMUN STAT-THEOR M 2017. [DOI: 10.1080/03610926.2016.1267767] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
Affiliation(s)
- Guanghan Liu
- Merck Research Laboratories, North Wales, PA, USA
- Steven Piantadosi
- Samuel Oschin Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
|
36
|
Brückner M, Titman A, Jaki T. Estimation in multi-arm two-stage trials with treatment selection and time-to-event endpoint. Stat Med 2017; 36:3137-3153. [PMID: 28612371 PMCID: PMC5575545 DOI: 10.1002/sim.7367] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/09/2016] [Revised: 05/08/2017] [Accepted: 05/12/2017] [Indexed: 12/29/2022]
Abstract
We consider estimation of treatment effects in two‐stage adaptive multi‐arm trials with a common control. The best treatment is selected at interim, and the primary endpoint is modeled via a Cox proportional hazards model. The maximum partial‐likelihood estimator of the log hazard ratio of the selected treatment will overestimate the true treatment effect in this case. Several methods for reducing the selection bias have been proposed for normal endpoints, including an iterative method based on the estimated conditional selection biases and a shrinkage approach based on empirical Bayes theory. We adapt these methods to time‐to‐event data and compare the bias and mean squared error of all methods in an extensive simulation study and apply the proposed methods to reconstructed data from the FOCUS trial. We find that all methods tend to overcorrect the bias, and only the shrinkage methods can reduce the mean squared error. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Affiliation(s)
- Matthias Brückner
- Department of Mathematics and Statistics, Lancaster University, Lancaster, LA1 4YF, U.K.
- Andrew Titman
- Department of Mathematics and Statistics, Lancaster University, Lancaster, LA1 4YF, U.K.
- Thomas Jaki
- Department of Mathematics and Statistics, Lancaster University, Lancaster, LA1 4YF, U.K.
|
37
|
Wellek S. A critical evaluation of the current "p-value controversy". Biom J 2017; 59:854-872. [PMID: 28504870 DOI: 10.1002/bimj.201700001] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 01/08/2017] [Revised: 03/29/2017] [Accepted: 04/07/2017] [Indexed: 11/06/2022]
Abstract
This article has been triggered by the initiative launched in March 2016 by the Board of Directors of the American Statistical Association (ASA) to counteract the current p-value focus of statistical research practices that allegedly "have contributed to a reproducibility crisis in science." It is pointed out that in the very wide field of statistics applied to medicine, many of the problems raised in the ASA statement are not as severe as in the areas the authors may have primarily in mind, although several of them are well-known experts in biostatistics and epidemiology. This is mainly due to the fact that a large proportion of medical research falls under the realm of a well developed body of regulatory rules banning the most frequently occurring misuses of p-values. Furthermore, it is argued that reducing the statistical hypotheses tests nowadays available to the class of procedures based on p-values calculated under a traditional one-point null hypothesis amounts to ignoring important developments having taken place and going on within the statistical sciences. Although hypotheses testing is still an indispensable part of the statistical methodology required in medical and other areas of empirical research, there is a large repertoire of methods based on different paradigms of inference that provide ample options for supplementing and enhancing the methods of data analysis blamed in the ASA statement for causing a crisis.
Affiliation(s)
- Stefan Wellek
- Department of Biostatistics, CIMH Mannheim, Mannheim Medical School of the University of Heidelberg, D-68159, Mannheim, J5, Germany; Department of Medical Biostatistics, Epidemiology and Informatics, University of Mainz, D-55101, Mainz, Germany
|
38
|
Hermans K, Waegeman W, Opsomer G, Van Ranst B, De Koster J, Van Eetvelde M, Hostens M. Novel approaches to assess the quality of fertility data stored in dairy herd management software. J Dairy Sci 2017; 100:4078-4089. [DOI: 10.3168/jds.2016-11896] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/19/2016] [Accepted: 01/04/2017] [Indexed: 11/19/2022]
|
39
|
Puhr R, Heinze G, Nold M, Lusa L, Geroldinger A. Firth's logistic regression with rare events: accurate effect estimates and predictions? Stat Med 2017; 36:2302-2317. [DOI: 10.1002/sim.7273] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 05/20/2016] [Revised: 02/14/2017] [Accepted: 02/14/2017] [Indexed: 11/10/2022]
Affiliation(s)
- Rainer Puhr
- The Kirby Institute; University of New South Wales; Sydney Australia
- Georg Heinze
- Center for Medical Statistics, Informatics and Intelligent Systems; Medical University of Vienna; Vienna Austria
- Mariana Nold
- Institute of Medical Statistics, Computer Sciences and Documentation; University Hospital Jena; Jena Germany
- Lara Lusa
- Institute for Biostatistics and Medical Informatics, Faculty of Medicine; University of Ljubljana; Ljubljana Slovenia
- Angelika Geroldinger
- Center for Medical Statistics, Informatics and Intelligent Systems; Medical University of Vienna; Vienna Austria
|
40
|
Ha ID, Christian NJ, Jeong JH, Park J, Lee Y. Analysis of clustered competing risks data using subdistribution hazard models with multivariate frailties. Stat Methods Med Res 2016; 25:2488-2505. [PMID: 24619110 PMCID: PMC5771528 DOI: 10.1177/0962280214526193] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/16/2022]
Abstract
Competing risks data often exist within a center in multi-center randomized clinical trials where the treatment effects or baseline risks may vary among centers. In this paper, we propose a subdistribution hazard regression model with multivariate frailty to investigate heterogeneity in treatment effects among centers from multi-center clinical trials. For inference, we develop a hierarchical likelihood (or h-likelihood) method, which obviates the need for an intractable integration over the frailty terms. We show that the profile likelihood function derived from the h-likelihood is identical to the partial likelihood, and hence it can be extended to the weighted partial likelihood for the subdistribution hazard frailty models. The proposed method is illustrated with a dataset from a multi-center clinical trial on breast cancer as well as with a simulation study. We also demonstrate how to present heterogeneity in treatment effects among centers by using a confidence interval for the frailty for each individual center and how to perform a statistical test for such heterogeneity using a restricted h-likelihood.
Affiliation(s)
- Il Do Ha
- Department of Asset Management, Daegu Haany University, Gyeongsan, South Korea
- Jong-Hyeon Jeong
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, USA
- Junwoo Park
- Department of Statistics, Seoul National University, Seoul, South Korea
- Youngjo Lee
- Department of Statistics, Seoul National University, Seoul, South Korea
|
41
|
Ternès N, Rotolo F, Heinze G, Michiels S. Identification of biomarker-by-treatment interactions in randomized clinical trials with survival outcomes and high-dimensional spaces. Biom J 2016; 59:685-701. [PMID: 27862181 PMCID: PMC5763402 DOI: 10.1002/bimj.201500234] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 10/31/2015] [Revised: 06/17/2016] [Accepted: 08/09/2016] [Indexed: 01/05/2023]
Abstract
Stratified medicine seeks to identify biomarkers or parsimonious gene signatures distinguishing patients that will benefit most from a targeted treatment. We evaluated 12 approaches in high-dimensional Cox models in randomized clinical trials: penalization of the biomarker main effects and biomarker-by-treatment interactions (full-lasso, three kinds of adaptive lasso, ridge+lasso and group-lasso); dimensionality reduction of the main effect matrix via linear combinations (PCA+lasso (where PCA is principal components analysis) or PLS+lasso (where PLS is partial least squares)); penalization of modified covariates or of the arm-specific biomarker effects (two-I model); gradient boosting; and univariate approach with control of multiple testing. We compared these methods via simulations, evaluating their selection abilities in null and alternative scenarios. We varied the number of biomarkers, of nonnull main effects and true biomarker-by-treatment interactions. We also proposed a novel measure evaluating the interaction strength of the developed gene signatures. In the null scenarios, the group-lasso, two-I model, and gradient boosting performed poorly in the presence of nonnull main effects, and performed well in alternative scenarios with also high interaction strength. The adaptive lasso with grouped weights was too conservative. The modified covariates, PCA+lasso, PLS+lasso, and ridge+lasso performed moderately. The full-lasso and adaptive lassos performed well, with the exception of the full-lasso in the presence of only nonnull main effects. The univariate approach performed poorly in alternative scenarios. We also illustrate the methods using gene expression data from 614 breast cancer patients treated with adjuvant chemotherapy.
Affiliation(s)
- Nils Ternès
- INSERM U1018, CESP, Université Paris-Sud, Université Paris-Saclay, Villejuif, F-94805, France; Gustave Roussy, Paris-Saclay, Service de Biostatistique et d'Epidémiologie, Villejuif, F-94805, France
- Federico Rotolo
- INSERM U1018, CESP, Université Paris-Sud, Université Paris-Saclay, Villejuif, F-94805, France; Gustave Roussy, Paris-Saclay, Service de Biostatistique et d'Epidémiologie, Villejuif, F-94805, France
- Georg Heinze
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, A-1090, Austria
- Stefan Michiels
- INSERM U1018, CESP, Université Paris-Sud, Université Paris-Saclay, Villejuif, F-94805, France; Gustave Roussy, Paris-Saclay, Service de Biostatistique et d'Epidémiologie, Villejuif, F-94805, France
|
42
|
Multiple Bayesian discriminant functions for high-dimensional massive data classification. Data Min Knowl Discov 2016. [DOI: 10.1007/s10618-016-0481-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/20/2022]
|
43
|
Liu XR, Pawitan Y, Clements M. Parametric and penalized generalized survival models. Stat Methods Med Res 2016; 27:1531-1546. [DOI: 10.1177/0962280216664760] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 02/02/2023]
Abstract
We describe generalized survival models, where g(S(t|z)), for link function g, survival S, time t, and covariates z, is modeled by a linear predictor in terms of covariate effects and smooth time effects. These models include proportional hazards and proportional odds models, and extend the parametric Royston–Parmar models. Estimation is described for both fully parametric linear predictors and combinations of penalized smoothers and parametric effects. The penalized smoothing parameters can be selected automatically using several information criteria. The link function may be selected based on prior assumptions or using an information criterion. We have implemented the models in R. All of the penalized smoothers from the mgcv package are available for smooth time effects and smooth covariate effects. The generalized survival models perform well in a simulation study, compared with some existing models. The estimation of smooth covariate effects and smooth time-dependent hazard or odds ratios is simplified, compared with many non-parametric models. Applying these models to three cancer survival datasets, we find that the proportional odds model is better than the proportional hazards model for two of the datasets.
Affiliation(s)
- Xing-Rong Liu
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Sweden
- Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Sweden
- Mark Clements
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Sweden
|
44
|
Pölsterl S, Conjeti S, Navab N, Katouzian A. Survival analysis for high-dimensional, heterogeneous medical data: Exploring feature extraction as an alternative to feature selection. Artif Intell Med 2016; 72:1-11. [PMID: 27664504 DOI: 10.1016/j.artmed.2016.07.004] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 02/22/2016] [Revised: 06/15/2016] [Accepted: 07/25/2016] [Indexed: 10/21/2022]
Abstract
BACKGROUND In clinical research, the primary interest is often the time until occurrence of an adverse event, i.e., survival analysis. Its application to electronic health records is challenging for two main reasons: (1) patient records are comprised of high-dimensional feature vectors, and (2) feature vectors are a mix of categorical and real-valued features, which implies varying statistical properties among features. To learn from high-dimensional data, researchers can choose from a wide range of methods in the fields of feature selection and feature extraction. Whereas feature selection is well studied, little work focused on utilizing feature extraction techniques for survival analysis. RESULTS We investigate how well feature extraction methods can deal with features having varying statistical properties. In particular, we consider multiview spectral embedding algorithms, which specifically have been developed for these situations. We propose to use random survival forests to accurately determine local neighborhood relations from right censored survival data. We evaluated 10 combinations of feature extraction methods and 6 survival models with and without intrinsic feature selection in the context of survival analysis on 3 clinical datasets. Our results demonstrate that for small sample sizes - less than 500 patients - models with built-in feature selection (Cox model with ℓ1 penalty, random survival forest, and gradient boosted models) outperform feature extraction methods by a median margin of 6.3% in concordance index (inter-quartile range: [-1.2%;14.6%]). CONCLUSIONS If the number of samples is insufficient, feature extraction methods are unable to reliably identify the underlying manifold, which makes them of limited use in these situations. For large sample sizes - in our experiments, 2500 samples or more - feature extraction methods perform as well as feature selection methods.
Affiliation(s)
- Sebastian Pölsterl
- Computer Aided Medical Procedures, Technische Universität München, Boltzmannstraße 3, 85748 Garching bei München, Germany.
- Sailesh Conjeti
- Computer Aided Medical Procedures, Technische Universität München, Boltzmannstraße 3, 85748 Garching bei München, Germany.
- Nassir Navab
- Computer Aided Medical Procedures, Technische Universität München, Boltzmannstraße 3, 85748 Garching bei München, Germany; Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA.
- Amin Katouzian
- IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA.
|
45
|
Ojeda FM, Müller C, Börnigen D, Trégouët DA, Schillert A, Heinig M, Zeller T, Schnabel RB. Comparison of Cox Model Methods in A Low-dimensional Setting with Few Events. GENOMICS PROTEOMICS & BIOINFORMATICS 2016; 14:235-43. [PMID: 27224515 PMCID: PMC4996851 DOI: 10.1016/j.gpb.2016.03.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/19/2015] [Revised: 03/01/2016] [Accepted: 03/22/2016] [Indexed: 11/01/2022]
Abstract
Prognostic models based on survival data frequently make use of the Cox proportional hazards model. Developing reliable Cox models with few events relative to the number of predictors can be challenging, even in low-dimensional datasets, with a much larger number of observations than variables. In such a setting we examined the performance of methods used to estimate a Cox model, including (i) full model using all available predictors and estimated by standard techniques, (ii) backward elimination (BE), (iii) ridge regression, (iv) least absolute shrinkage and selection operator (lasso), and (v) elastic net. Based on a prospective cohort of patients with manifest coronary artery disease (CAD), we performed a simulation study to compare the predictive accuracy, calibration, and discrimination of these approaches. Candidate predictors for incident cardiovascular events we used included clinical variables, biomarkers, and a selection of genetic variants associated with CAD. The penalized methods, i.e., ridge, lasso, and elastic net, showed a comparable performance, in terms of predictive accuracy, calibration, and discrimination, and outperformed BE and the full model. Excessive shrinkage was observed in some cases for the penalized methods, mostly on the simulation scenarios having the lowest ratio of a number of events to the number of variables. We conclude that in similar settings, these three penalized methods can be used interchangeably. The full model and backward elimination are not recommended in rare event scenarios.
Affiliation(s)
- Francisco M Ojeda
  - Department of General and Interventional Cardiology, University Heart Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Christian Müller
  - Department of General and Interventional Cardiology, University Heart Center Hamburg-Eppendorf, 20246 Hamburg, Germany; German Center for Cardiovascular Research (DZHK), Hamburg/Kiel/Luebeck, Germany
- Daniela Börnigen
  - Department of General and Interventional Cardiology, University Heart Center Hamburg-Eppendorf, 20246 Hamburg, Germany; German Center for Cardiovascular Research (DZHK), Hamburg/Kiel/Luebeck, Germany
- David-Alexandre Trégouët
  - Sorbonne Universités, Université Pierre et Marie Curie Paris 06, Institut National pour la Santé et la Recherche Médicale (INSERM), Unité Mixte de Recherche en Santé (UMR_S) 1166, F-75013 Paris, France; Institute for Cardiometabolism and Nutrition (ICAN), F-75013 Paris, France
- Arne Schillert
  - Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, 23562 Lübeck, Germany; German Center for Cardiovascular Research (DZHK), Hamburg/Kiel/Luebeck, Germany
- Matthias Heinig
  - Institute of Computational Biology, German Research Center for Environmental Health, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Tanja Zeller
  - Department of General and Interventional Cardiology, University Heart Center Hamburg-Eppendorf, 20246 Hamburg, Germany; German Center for Cardiovascular Research (DZHK), Hamburg/Kiel/Luebeck, Germany
- Renate B Schnabel
  - Department of General and Interventional Cardiology, University Heart Center Hamburg-Eppendorf, 20246 Hamburg, Germany; German Center for Cardiovascular Research (DZHK), Hamburg/Kiel/Luebeck, Germany
|
46
|
Reulen H, Kneib T. Boosting multi-state models. LIFETIME DATA ANALYSIS 2016; 22:241-262. [PMID: 25990764 DOI: 10.1007/s10985-015-9329-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/11/2014] [Accepted: 05/09/2015] [Indexed: 06/04/2023]
Abstract
One important goal in multi-state modelling is to explore information about conditional transition-type-specific hazard rate functions by estimating influencing effects of explanatory variables. This may be performed using single transition-type-specific models if these covariate effects are assumed to be different across transition-types. To investigate whether this assumption holds or whether one of the effects is equal across several transition-types (cross-transition-type effect), a combined model has to be applied, for instance with the use of a stratified partial likelihood formulation. Here, prior knowledge about the underlying covariate effect mechanisms is often sparse, especially about the ineffectiveness of transition-type-specific or cross-transition-type effects. As a consequence, data-driven variable selection is an important task: a large number of estimable effects have to be taken into account if joint modelling of all transition-types is performed. A related but subsequent task is model choice: is an effect satisfactorily estimated assuming linearity, or does its true underlying nature deviate strongly from linearity? This article introduces component-wise Functional Gradient Descent Boosting (boosting, for short) for multi-state models, an approach performing unsupervised variable selection and model choice simultaneously within a single estimation run. We demonstrate that features and advantages in the application of boosting introduced and illustrated in classical regression scenarios remain present in the transfer to multi-state models. As a consequence, boosting provides an effective means to answer questions about ineffectiveness and non-linearity of single transition-type-specific or cross-transition-type effects.
|
47
|
Niu MC, Morris SA, Krenek M, DE LA Uz CM, Pedroza C, Miyake CY, Kim JJ, Valdés SO. Reassessing Risk Factors in Pediatric Patients With Pacemakers Implanted for Atrioventricular Block: The Impact of Nonsustained Ventricular Tachycardia. J Cardiovasc Electrophysiol 2016; 27:471-9. [PMID: 27074776 DOI: 10.1111/jce.12897] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/26/2015] [Revised: 11/04/2015] [Accepted: 12/11/2015] [Indexed: 12/01/2022]
Abstract
INTRODUCTION: In pediatric patients with pacemakers implanted for atrioventricular block (AVB), nonsustained ventricular tachycardia (NSVT) detected during routine surveillance is a finding of unknown significance. We sought to describe the incidence of NSVT and determine if there was an association between NSVT and adverse outcomes in these patients. METHODS AND RESULTS: This is a single-center retrospective study of 136 patients (1971-2013) with pacemakers implanted for advanced and complete AVB. Exclusion criteria were structural heart disease; diagnoses of myocarditis, cardiomyopathy or channelopathy preceding AVB diagnosis; and sustained or polymorphic ventricular tachycardia (VT) as the first occurring arrhythmia after pacemaker implant. During median follow-up of 11.6 years (IQR: 4.3-17 years), 14 (10%) patients had NSVT. There were 6 (4.4%) deaths. Overall, Kaplan-Meier 20-year survival from time of implant was 93%. By univariate analysis, earlier mortality was associated with NSVT (P = 0.010), sustained left ventricular (LV) dysfunction (P = 0.004), maternal autoantibodies (P = 0.017), and acquired AVB (P = 0.049). By multivariate analysis, earlier mortality was associated with NSVT (HR: 5.39 [95% CI: 1.02-28.41]; P = 0.047) and sustained LV dysfunction (HR: 10.24 [95% CI: 1.83-57.32]; P = 0.008). CONCLUSIONS: In children with pacemakers implanted for AVB, NSVT is not uncommon and may be associated with increased mortality. Persistent LV dysfunction may also be a potential factor associated with death. Closer follow-up should be considered in patients with these findings. Large, multicenter studies should be considered to confirm these findings and identify risk stratification methods for this unique patient population.
Affiliation(s)
- Mary C Niu
  - Lillie Frank Abercrombie Section of Pediatric Cardiology, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas, USA; Oklahoma Children's Heart Center, Oklahoma University Health Sciences Center, Oklahoma City, Oklahoma, USA
- Shaine A Morris
  - Lillie Frank Abercrombie Section of Pediatric Cardiology, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas, USA
- Michele Krenek
  - Lillie Frank Abercrombie Section of Pediatric Cardiology, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas, USA
- Caridad M DE LA Uz
  - Lillie Frank Abercrombie Section of Pediatric Cardiology, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas, USA
- Claudia Pedroza
  - Center for Clinical Research and Evidence-Based Medicine, Department of Pediatrics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Christina Y Miyake
  - Lillie Frank Abercrombie Section of Pediatric Cardiology, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas, USA
- Jeffrey J Kim
  - Lillie Frank Abercrombie Section of Pediatric Cardiology, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas, USA
- Santiago O Valdés
  - Lillie Frank Abercrombie Section of Pediatric Cardiology, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas, USA
|
48
|
Zucknick M, Saadati M, Benner A. Nonidentical twins: Comparison of frequentist and Bayesian lasso for Cox models. Biom J 2015; 57:959-81. [PMID: 26417963 DOI: 10.1002/bimj.201400160] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/15/2014] [Revised: 05/12/2015] [Accepted: 05/12/2015] [Indexed: 11/07/2022]
Abstract
One important task in translational cancer research is the search for new prognostic biomarkers to improve survival prognosis for patients. The use of high-throughput technologies allows simultaneous measurement of genome-wide gene expression or other genomic data for all patients in a clinical trial. Penalized likelihood methods such as lasso regression can be applied to such high-dimensional data, where the number of (genomic) covariables is usually much larger than the sample size. There is a connection between the lasso and the Bayesian regression model with independent Laplace priors on the regression parameters, and understanding this connection has been useful for understanding the properties of lasso estimates in linear models (e.g. Park and Casella, 2008). In this paper, we study the lasso in the frequentist and Bayesian frameworks in the context of Cox models. For the Bayesian lasso we extend the approach by Lee et al. (2011). In particular, we impose the lasso penalty only on the genome features, but not on relevant clinical covariates, to allow the mandatory inclusion of important established factors. We investigate the models in high- and low-dimensional simulation settings and in an application to chronic lymphocytic leukemia.
Affiliation(s)
- Manuela Zucknick
  - Division of Biostatistics, German Cancer Research Center, Heidelberg 69120, Germany; Oslo Center for Biostatistics and Epidemiology, Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, PO Box 1122 Blindern, 0317 Oslo, Norway
- Maral Saadati
  - Division of Biostatistics, German Cancer Research Center, Heidelberg 69120, Germany
- Axel Benner
  - Division of Biostatistics, German Cancer Research Center, Heidelberg 69120, Germany
|
49
|
Wallden B, Storhoff J, Nielsen T, Dowidar N, Schaper C, Ferree S, Liu S, Leung S, Geiss G, Snider J, Vickery T, Davies SR, Mardis ER, Gnant M, Sestak I, Ellis MJ, Perou CM, Bernard PS, Parker JS. Development and verification of the PAM50-based Prosigna breast cancer gene signature assay. BMC Med Genomics 2015; 8:54. [PMID: 26297356 PMCID: PMC4546262 DOI: 10.1186/s12920-015-0129-6] [Citation(s) in RCA: 300] [Impact Index Per Article: 33.3] [Received: 04/28/2015] [Accepted: 08/17/2015] [Indexed: 02/07/2023]
Abstract
BACKGROUND: The four intrinsic subtypes of breast cancer, defined by differential expression of 50 genes (PAM50), have been shown to be predictive of risk of recurrence and benefit of hormonal therapy and chemotherapy. Here we describe the development of Prosigna™, a PAM50-based subtype classifier and risk model on the NanoString nCounter Dx Analysis System intended for decentralized testing in clinical laboratories. METHODS: 514 formalin-fixed, paraffin-embedded (FFPE) breast cancer patient samples were used to train prototypical centroids for each of the intrinsic subtypes of breast cancer on the NanoString platform. Hierarchical cluster analysis of gene expression data was used to identify the prototypical centroids defined in previous PAM50 algorithm training exercises. 304 FFPE patient samples from a well-annotated clinical cohort in the absence of adjuvant systemic therapy were then used to train a subtype-based risk model (i.e., Prosigna ROR score). 232 samples from a tamoxifen-treated patient cohort were used to verify the prognostic accuracy of the algorithm prior to initiating clinical validation studies. RESULTS: The gene expression profiles of each of the four Prosigna subtype centroids were consistent with those previously published using the PCR-based PAM50 method. Similar to previously published classifiers, tumor samples classified as Luminal A by Prosigna had the best prognosis compared to samples classified as one of the three higher-risk tumor subtypes. The Prosigna Risk of Recurrence (ROR) score model was verified to be significantly associated with prognosis as a continuous variable and to add significant information over both commonly available IHC markers and Adjuvant! Online.
CONCLUSIONS: The results from the training and verification data sets show that the FDA-cleared and CE-marked Prosigna test provides an accurate estimate of the risk of distant recurrence in hormone receptor-positive breast cancer and is also capable of identifying a tumor's intrinsic subtype that is consistent with the previously published PCR-based PAM50 assay. Subsequent analytical and clinical validation studies confirm the clinical accuracy and technical precision of the Prosigna PAM50 assay in a decentralized setting.
Affiliation(s)
- Brett Wallden
  - NanoString Technologies, Inc, 530 Fairview Avenue North, Suite 2000, Seattle, WA, 98109, USA.
- James Storhoff
  - NanoString Technologies, Inc, 530 Fairview Avenue North, Suite 2000, Seattle, WA, 98109, USA.
- Torsten Nielsen
  - Genetic Pathology Evaluation Centre, Vancouver Coastal Health Research Institute and British Columbia Cancer Agency, 2655 Oak St, Vancouver, BC, V5Z 1M9, Canada.
- Naeem Dowidar
  - NanoString Technologies, Inc, 530 Fairview Avenue North, Suite 2000, Seattle, WA, 98109, USA.
- Sean Ferree
  - NanoString Technologies, Inc, 530 Fairview Avenue North, Suite 2000, Seattle, WA, 98109, USA.
- Shuzhen Liu
  - Genetic Pathology Evaluation Centre, Vancouver Coastal Health Research Institute and British Columbia Cancer Agency, 2655 Oak St, Vancouver, BC, V5Z 1M9, Canada.
- Samuel Leung
  - Genetic Pathology Evaluation Centre, Vancouver Coastal Health Research Institute and British Columbia Cancer Agency, 2655 Oak St, Vancouver, BC, V5Z 1M9, Canada.
- Gary Geiss
  - NanoString Technologies, Inc, 530 Fairview Avenue North, Suite 2000, Seattle, WA, 98109, USA.
- Jacqueline Snider
  - Washington University School of Medicine, 660 S Euclid, St. Louis, MO, 63110, USA.
- Tammi Vickery
  - Washington University School of Medicine, 660 S Euclid, St. Louis, MO, 63110, USA.
- Sherri R Davies
  - Washington University School of Medicine, 660 S Euclid, St. Louis, MO, 63110, USA.
- Elaine R Mardis
  - Washington University School of Medicine, 660 S Euclid, St. Louis, MO, 63110, USA.
- Michael Gnant
  - Department of Surgery and Comprehensive Cancer Center, Medical University of Vienna, Vienna, Austria.
- Ivana Sestak
  - Centre for Cancer Prevention, Wolfson Institute of Preventive Medicine, Queen Mary University of London, Charterhouse Sq, London, EC1M 6BQ, UK.
- Matthew J Ellis
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, One Baylor Plaza, MS 600, Houston, TX, 77030, USA.
- Charles M Perou
  - Lineberger Comprehensive Cancer Center, Department of Genetics, University of North Carolina at Chapel Hill, 450 West Drive, Chapel Hill, NC, 27599, USA.
- Philip S Bernard
  - Huntsman Comprehensive Cancer Center, Department of Pathology, 2000 Circle of Hope, Salt Lake City, UT, 84103, USA.
- Joel S Parker
  - Lineberger Comprehensive Cancer Center, Department of Genetics, University of North Carolina at Chapel Hill, 450 West Drive, Chapel Hill, NC, 27599, USA.
|
50
|
Pavlou M, Ambler G, Seaman SR, Guttmann O, Elliott P, King M, Omar RZ. How to develop a more accurate risk prediction model when there are few events. BMJ 2015; 351:h3868. [PMID: 26264962 PMCID: PMC4531311 DOI: 10.1136/bmj.h3868] [Citation(s) in RCA: 358] [Impact Index Per Article: 39.8] [Indexed: 01/21/2023]
Abstract
When the number of events is low relative to the number of predictors, standard regression could produce overfitted risk models that make inaccurate predictions. Use of penalised regression may improve the accuracy of risk prediction.
Affiliation(s)
- Menelaos Pavlou
  - Department of Statistical Science, University College London, WC1E 6BT London, UK
- Gareth Ambler
  - Department of Statistical Science, University College London, WC1E 6BT London, UK
- Oliver Guttmann
  - School of Life and Medical Sciences, Institute of Cardiovascular Science, University College London
- Perry Elliott
  - Inherited Cardiac Disease Unit, the Heart Hospital, London
- Michael King
  - Division of Psychiatry, University College London
- Rumana Z Omar
  - Department of Statistical Science, University College London, WC1E 6BT London, UK
|