For: Schmid M, Hielscher T, Augustin T, Gefeller O. A Robust Alternative to the Schemper-Henderson Estimator of Prediction Error. Biometrics 2010;67:524-35. [DOI: 10.1111/j.1541-0420.2010.01459.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/26/2022]

Cited by Other Article(s)

Javaras KN, Franco VF, Ren B, Bulik CM, Crow SJ, McElroy SL, Pope HG, Hudson JI. The natural course of binge-eating disorder: findings from a prospective, community-based study of adults. Psychol Med 2024:1-11. [PMID: 38803271 DOI: 10.1017/s0033291724000977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]

Darabi P, Gharibzadeh S, Khalili D, Bagherpour-Kalo M, Janani L. Optimizing cardiovascular disease mortality prediction: a super learner approach in the Tehran Lipid and Glucose Study. BMC Med Inform Decis Mak 2024;24:97. [PMID: 38627734 PMCID: PMC11020797 DOI: 10.1186/s12911-024-02489-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 03/22/2024] [Indexed: 04/19/2024] Open

Abstract

BACKGROUND & AIM

Cardiovascular disease (CVD) is the most important cause of death in the world and has a potential impact on health care costs. This study aimed to evaluate the performance of machine learning survival models and determine the optimum model for predicting CVD-related mortality.

METHOD

In this study, the research population was all participants in the Tehran Lipid and Glucose Study (TLGS) aged over 30 years. We used the Gradient Boosting model (GBM), Support Vector Machine (SVM), Super Learner (SL), and Cox proportional hazard (Cox-PH) models to predict CVD-related mortality using 26 features. The dataset was randomly divided into training (80%) and testing (20%) sets. To evaluate the performance of the methods, we used the Brier Score (BS), Prediction Error (PE), Concordance Index (C-index), and time-dependent Area Under the Curve (TD-AUC) criteria. Four different clinical models were also fitted to improve the performance of the methods.

RESULTS

Out of 9258 participants with a mean age (SD; range) of 43.74 (15.51; 20-91) years, 56.60% were female. The CVD death proportion was 2.5% (228 participants). The death proportion was significantly higher in men (67.98% M, 32.02% F). Based on predefined selection criteria, the SL method had the best performance in predicting CVD-related mortality (TD-AUC > 93.50%). Among the machine learning (ML) methods, the SVM had the worst performance (TD-AUC = 90.13%). According to the relative effect, age, fasting blood sugar, systolic blood pressure, smoking, taking aspirin, diastolic blood pressure, Type 2 diabetes mellitus, hip circumference, body mass index (BMI), and triglyceride were identified as the most influential variables in predicting CVD-related mortality.

CONCLUSION

According to the results of our study, compared to the Cox-PH model, Machine Learning models showed promising and sometimes better performance in predicting CVD-related mortality. This finding is based on the analysis of a large and diverse urban population from Tehran, Iran.

Zhang Y, Wong G, Mann G, Muller S, Yang JYH. SurvBenchmark: comprehensive benchmarking study of survival analysis methods using both omics data and clinical data. Gigascience 2022;11:6652188. [PMID: 35906887 PMCID: PMC9338425 DOI: 10.1093/gigascience/giac071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Revised: 05/16/2022] [Accepted: 06/22/2022] [Indexed: 11/24/2022] Open

Bertrand F, Maumy-Bertrand M. Fitting and Cross-Validating Cox Models to Censored Big Data With Missing Values Using Extensions of Partial Least Squares Regression Models. Front Big Data 2021;4:684794. [PMID: 34790895 PMCID: PMC8591675 DOI: 10.3389/fdata.2021.684794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2021] [Accepted: 10/07/2021] [Indexed: 11/22/2022] Open

Abstract

Fitting Cox models in a big data context (on a massive scale in terms of volume, intensity, and complexity exceeding the capacity of usual analytic tools) is often challenging. If some data are missing, it is even more difficult. We proposed algorithms that were able to fit Cox models in high dimensional settings using extensions of partial least squares regression to the Cox models. Some of them were able to cope with missing data. We were recently able to extend our most recent algorithms to big data, thus allowing Cox models to be fitted to big data with missing values. When cross-validating standard or extended Cox models, the commonly used criterion is the cross-validated partial loglikelihood using a naive or a van Houwelingen scheme (to make efficient use of the death times of the left out data in relation to the death times of all the data). Quite astonishingly, we show, using an extensive simulation study involving three different data simulation algorithms, that these two cross-validation methods fail with the extensions, either straightforward or more involved ones, of partial least squares regression to the Cox model. This is quite an interesting result for at least two reasons. Firstly, several nice features of PLS based models, including regularization, interpretability of the components, missing data support, data visualization thanks to biplots of individuals and variables, and even parsimony or group parsimony for sparse partial least squares or sparse group SPLS based models, account for a common use of these extensions by statisticians who usually select their hyperparameters using cross-validation. Secondly, they are almost always featured in benchmarking studies to assess the performance of a new estimation technique used in a high dimensional or big data context and often show poor statistical properties. We carried out a vast simulation study to evaluate more than a dozen potential cross-validation criteria, either AUC or prediction error based. Several of them lead to the selection of a reasonable number of components. Using these newly found cross-validation criteria to fit extensions of partial least squares regression to the Cox model, we performed a benchmark reanalysis that showed enhanced performances of these techniques. In addition, we proposed sparse group extensions of our algorithms and defined a new robust measure based on the Schmid score and the R coefficient of determination for least absolute deviation: the integrated R Schmid Score weighted. The R-package used in this article is available on the CRAN, http://cran.r-project.org/web/packages/plsRcox/index.html. The R package bigPLS will soon be available on the CRAN and, until then, is available on Github https://github.com/fbertran/bigPLS.

Qayed M, Ahn KW, Kitko CL, Johnson MH, Shah NN, Dvorak C, Mellgren K, Friend BD, Verneris MR, Leung W, Toporski J, Levine J, Chewning J, Wayne A, Kapoor U, Triplett B, Schultz KR, Yanik GA, Eapen M. A validated pediatric disease risk index for allogeneic hematopoietic cell transplantation. Blood 2021;137:983-993. [PMID: 33206937 PMCID: PMC7918183 DOI: 10.1182/blood.2020009342] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/28/2020] [Accepted: 11/04/2020] [Indexed: 12/16/2022] Open

Affiliation(s)

Muna Qayed: Division of Pediatric Hematology/Oncology, Emory University School of Medicine, Atlanta, GA; Children's Healthcare of Atlanta, Atlanta, GA
Kwang Woo Ahn: Center for International Blood and Marrow Transplant Research, Department of Medicine, and Division of Biostatistics, Institute for Health and Equity, Medical College of Wisconsin, Milwaukee, WI
Carrie L Kitko: Division of Hematology/Stem Cell Transplant, Vanderbilt University Medical Center, Nashville, TN
Mariam H Johnson: Center for International Blood and Marrow Transplant Research, Department of Medicine, and
Nirali N Shah: Division of Pediatric Oncology, National Cancer Institute, Bethesda, MD
Christopher Dvorak: Division of Pediatric Allergy, Immunology and Bone Marrow Transplantation, Benioff Children's Hospital, University of California San Francisco, San Francisco, CA
Karin Mellgren: Department of Pediatric Oncology, Sahlgrenska University Hospital, Gothenburg, Sweden
Brian D Friend: Center for Cell and Gene Therapy, Department of Pediatrics, Baylor College of Medicine, TX
Michael R Verneris: Division of Cancer and Blood Disorders, Department of Pediatrics, University of Colorado, Aurora, CO
Wing Leung: Pediatric Academic Clinical Program, Duke-National University of Singapore (NUS) Medical School, Singapore
Jacek Toporski: Section of Pediatric Hematology, Oncology, Immunology and Nephrology, Department of Pediatrics, Skåne University Hospital, Lund, Sweden
John Levine: Blood and Marrow Transplant Program, Icahn School of Medicine at Mount Sinai, New York, NY
Joseph Chewning: Division of Hematology/Oncology, University of Alabama at Birmingham, Birmingham, AL
Alan Wayne: Division of Hematology-Oncology, Children's Hospital of Los Angeles, Los Angeles, CA
Urvi Kapoor: Department of Pediatrics, SUNY Downstate Medical Center, Brooklyn, NY
Brandon Triplett: Division of Bone Marrow Transplantation, St Jude Children's Research Hospital, Memphis, TN
Kirk R Schultz: Department of Pediatric Hematology, Oncology and Bone Marrow Transplant, British Columbia's Children's Hospital, The University of British Columbia, Vancouver, BC, Canada
Gregory A Yanik: Division of Pediatric Hematology/Oncology, C.S. Mott Children's Hospital, The University of Michigan, Ann Arbor, MI
Mary Eapen: Center for International Blood and Marrow Transplant Research, Department of Medicine, and Division of Hematology/Oncology, Department of Medicine, Medical College of Wisconsin, Milwaukee, WI

Sonabend R, Király FJ, Bender A, Bischl B, Lang M. mlr3proba: An R Package for Machine Learning in Survival Analysis. Bioinformatics 2021;37:2789-2791. [PMID: 33523131 PMCID: PMC8428574 DOI: 10.1093/bioinformatics/btab039] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 10/06/2020] [Revised: 12/06/2020] [Accepted: 01/18/2021] [Indexed: 11/14/2022] Open

Zhou Y, Leung SW, Mizutani S, Takagi T, Tian YS. MEPHAS: an interactive graphical user interface for medical and pharmaceutical statistical analysis with R and Shiny. BMC Bioinformatics 2020;21:183. [PMID: 32393166 PMCID: PMC7216538 DOI: 10.1186/s12859-020-3494-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/27/2019] [Accepted: 04/15/2020] [Indexed: 11/20/2022] Open

Abstract

Background

Even though R is one of the most commonly used statistical computing environments, it lacks a graphical user interface (GUI) that appeals to students, researchers, lecturers, and practitioners in medicine and pharmacy for conducting standard data analytics. Current GUIs built on top of R, such as EZR and R-Commander, aim to facilitate R coding and visualization, but most of the functionalities are still accessed through a command-line interface (CLI). To assist practitioners and researchers in medicine and pharmacy in running most routines in fundamental statistical analysis, we developed an interactive GUI, MEPHAS, to support various web-based systems that are accessible from laptops, workstations, or tablets, under Windows, macOS (and iOS), or Linux. In addition to fundamental statistical analysis, advanced statistics such as the extended Cox regression and dimensional analyses including partial least squares regression (PLS-R) and sparse partial least squares regression (SPLS-R) are also available in MEPHAS.

Results

MEPHAS is a web-based GUI (https://alain003.phs.osaka-u.ac.jp/mephas/) that is based on the Shiny framework. We also created the corresponding R package mephas (https://mephas.github.io/). Thus far, MEPHAS has supported four categories of statistics, including probability, hypothesis testing, regression models, and dimensional analyses. Instructions and help menus are accessible during the entire analytical process via the web-based GUI, particularly for advanced dimensional data analysis, which requires much explanation. The GUI was designed to be intuitive for non-technical users to perform various statistical functions, e.g., managing data, customizing plots, setting parameters, and monitoring real-time results, without any R coding from users. All generated graphs can be saved to local machines, and tables can be downloaded as CSV files.

Conclusion

MEPHAS is a free and open-source web-interactive GUI that was designed to support statistical data analyses and prediction for medical and pharmaceutical practitioners and researchers. It enables various medical and pharmaceutical statistical analyses through interactive parameter settings and dynamic visualization of the results.

Korepanova N, Seibold H, Steffen V, Hothorn T. Survival forests under test: Impact of the proportional hazards assumption on prognostic and predictive forests for amyotrophic lateral sclerosis survival. Stat Methods Med Res 2020;29:1403-1419. [PMID: 31304888 DOI: 10.1177/0962280219862586] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022]

Wu C, Li L. Quantifying and estimating the predictive accuracy for censored time-to-event data with competing risks. Stat Med 2018;37:3106-3124. [PMID: 29766537 DOI: 10.1002/sim.7806] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/23/2017] [Revised: 03/29/2018] [Accepted: 04/11/2018] [Indexed: 01/13/2023]

Rahman MS, Ambler G, Choodari-Oskooei B, Omar RZ. Review and evaluation of performance measures for survival prediction models in external validation settings. BMC Med Res Methodol 2017;17:60. [PMID: 28420338 PMCID: PMC5395888 DOI: 10.1186/s12874-017-0336-2] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 01/02/2017] [Accepted: 04/03/2017] [Indexed: 01/09/2023] Open

Abstract

BACKGROUND

When developing a prediction model for survival data it is essential to validate its performance in external validation settings using appropriate performance measures. Although a number of such measures have been proposed, there is only limited guidance regarding their use in the context of model validation. This paper reviewed and evaluated a wide range of performance measures to provide some guidelines for their use in practice.

METHODS

An extensive simulation study based on two clinical datasets was conducted to investigate the performance of the measures in external validation settings. Measures were selected from categories that assess the overall performance, discrimination and calibration of a survival prediction model. Some of these have been modified to allow their use with validation data, and a case study is provided to describe how these measures can be estimated in practice. The measures were evaluated with respect to their robustness to censoring and ease of interpretation. All measures are implemented, or are straightforward to implement, in statistical software.

RESULTS

Most of the performance measures were reasonably robust to moderate levels of censoring. One exception was Harrell's concordance measure which tended to increase as censoring increased.

CONCLUSIONS

We recommend that Uno's concordance measure is used to quantify concordance when there are moderate levels of censoring. Alternatively, Gönen and Heller's measure could be considered, especially if censoring is very high, but we suggest that the prediction model is re-calibrated first. We also recommend that Royston's D is routinely reported to assess discrimination since it has an appealing interpretation. The calibration slope is useful for both internal and external validation settings, and we recommend reporting it routinely. Our recommendation would be to use any of the predictive accuracy measures and provide the corresponding predictive accuracy curves. In addition, we recommend investigating the characteristics of the validation data, such as the level of censoring and the distribution of the prognostic index derived in the validation setting, before choosing the performance measures.
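As a rough illustration of the concordance measures compared in this abstract, the following is a minimal Python sketch of Harrell's concordance measure for right-censored data. It is not code from the cited paper: the function name `harrell_c_index`, the convention that a higher score means higher risk, and the choice to skip pairs with tied observed times are all assumptions made for this example.

```python
def harrell_c_index(times, events, risk_scores):
    """Sketch of Harrell's C for right-censored data.

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher value = higher predicted risk (assumed convention)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A pair is only usable if the shorter time is an observed event.
            continue
        for j in range(n):
            if times[i] < times[j]:  # pairs with tied times are skipped
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier failure had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Perfectly ordered risks on a toy sample with one censored subject.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1]))
```

Because censored subjects can only enter a pair as the longer-surviving member, the set of usable pairs changes with the censoring pattern, which is the source of the sensitivity to censoring reported in the results above.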

Mayr A, Hofner B, Schmid M. Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection. BMC Bioinformatics 2016;17:288. [PMID: 27444890 PMCID: PMC4957316 DOI: 10.1186/s12859-016-1149-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 11/19/2015] [Accepted: 07/13/2016] [Indexed: 12/15/2022] Open

Abstract

Background

When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done by fitting Cox models. Those are, however, not necessarily optimal with respect to the resulting discriminatory power and are based on restrictive assumptions. We present a combined approach to automatically select and fit sparse discrimination models for potentially high-dimensional survival data based on boosting a smooth version of the concordance index (C-index). Due to this objective function, the resulting prediction models are optimal with respect to their ability to discriminate between patients with longer and shorter survival times. The gradient boosting algorithm is combined with the stability selection approach to enhance and control its variable selection properties.

Results

The resulting algorithm fits prediction models based on the rankings of the survival times and automatically selects only the most stable predictors. The performance of the approach, which works best for small numbers of informative predictors, is demonstrated in a large scale simulation study: C-index boosting in combination with stability selection is able to identify a small subset of informative predictors from a much larger set of non-informative ones while controlling the per-family error rate. In an application to discover biomarkers for breast cancer patients based on gene expression data, stability selection yielded sparser models and the resulting discriminatory power was higher than with lasso penalized Cox regression models.

Conclusion

The combination of stability selection and C-index boosting can be used to select small numbers of informative biomarkers and to derive new prediction rules that are optimal with respect to their discriminatory power. Stability selection controls the per-family error rate which makes the new approach also appealing from an inferential point of view, as it provides an alternative to classical hypothesis tests for single predictor effects. Due to the shrinkage and variable selection properties of statistical boosting algorithms, the latter tests are typically unfeasible for prediction models fitted by boosting.
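To make the idea of a "smooth version of the concordance index" more concrete, here is a hedged Python sketch in which the indicator comparing two risk scores is replaced by a sigmoid, which makes the criterion differentiable and so usable as a boosting objective. This is a simplified, unweighted illustration and not the estimator from the cited paper; the function names, the smoothing parameter `sigma`, and the higher-score-means-higher-risk convention are assumptions for this example.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def smooth_c_index(times, events, risk_scores, sigma=0.1):
    """Unweighted sigmoid-smoothed concordance (illustrative sketch only).

    The hard comparison risk_i > risk_j is replaced by
    sigmoid((risk_i - risk_j) / sigma); a smaller sigma gives a closer
    approximation to the indicator function.
    """
    total = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier time in a pair must be an observed event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                total += _sigmoid((risk_scores[i] - risk_scores[j]) / sigma)
    if usable == 0:
        raise ValueError("no usable pairs")
    return total / usable


# Well-separated, correctly ordered risks give a value close to 1.
print(smooth_c_index([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1]))
```

The sketch omits any censoring weights, so it should be read only as a picture of the smoothing step, not as a drop-in replacement for the method described in the abstract.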

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1149-8) contains supplementary material, which is available to authorized users.

Alotaibi R, Fiaccone R, Henderson R, Stare J. Explained variation for recurrent event data. Biom J 2015;57:571-91. [PMID: 25899247 DOI: 10.1002/bimj.201300143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2013] [Revised: 02/06/2015] [Accepted: 02/14/2015] [Indexed: 11/07/2022]

Bastien P, Bertrand F, Meyer N, Maumy-Bertrand M. Deviance residuals-based sparse PLS and sparse kernel PLS regression for censored data. Bioinformatics 2014;31:397-404. [PMID: 25286920 DOI: 10.1093/bioinformatics/btu660] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022]

Abstract

MOTIVATION

A vast literature from the past decade is devoted to relating gene profiles and subject survival or time to cancer recurrence. Biomarker discovery from high-dimensional data, such as transcriptomic or single nucleotide polymorphism profiles, is a major challenge in the search for more precise diagnoses. The proportional hazard regression model suggested by Cox (1972) to study the relationship between the time to event and a set of covariates in the presence of censoring is the most commonly used model for the analysis of survival data. However, like multivariate regression, it assumes that there are more observations than variables, that the data are complete, and that the variables are not strongly correlated. In practice, when dealing with high-dimensional data, these constraints are crippling. Collinearity gives rise to issues of over-fitting and model misidentification. Variable selection can improve the estimation accuracy by effectively identifying the subset of relevant predictors and enhance the model interpretability with parsimonious representation. To deal with both collinearity and variable selection issues, many methods based on least absolute shrinkage and selection operator penalized Cox proportional hazards have been proposed since the reference paper of Tibshirani. Regularization could also be performed using dimension reduction, as is the case with partial least squares (PLS) regression. We propose two original algorithms named sPLSDR and its non-linear kernel counterpart DKsPLSDR, by using sparse PLS regression (sPLS) based on deviance residuals. We compared their predicting performance with state-of-the-art algorithms on both simulated and real reference benchmark datasets.

RESULTS

sPLSDR and DKsPLSDR compare favorably with other methods in their computational time, prediction and selectivity, as indicated by results based on benchmark datasets. Moreover, in the framework of PLS regression, they feature other useful tools, including biplots representation, or the ability to deal with missing data. Therefore, we view them as a useful addition to the toolbox of estimation and prediction methods for the widely used Cox's model in the high-dimensional and low-sample size settings.

AVAILABILITY AND IMPLEMENTATION

The R-package plsRcox is available on the CRAN and is maintained by Frédéric Bertrand. http://cran.r-project.org/web/packages/plsRcox/index.html.

CONTACT

pbastien@rd.loreal.com or fbertran@math.unistra.fr.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Schmid M, Potapov S. A comparison of estimators to evaluate the discriminatory power of time-to-event models. Stat Med 2012;31:2588-609. [PMID: 22829422 DOI: 10.1002/sim.5464] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 03/30/2011] [Accepted: 03/24/2012] [Indexed: 01/14/2023]

Schoop R, Schumacher M, Graf E. Measures of prediction error for survival data with longitudinal covariates. Biom J 2011;53:275-93. [PMID: 21308724 DOI: 10.1002/bimj.201000145] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 07/19/2010] [Revised: 10/25/2010] [Accepted: 11/22/2010] [Indexed: 11/12/2022]