1
Zhou B, Min B, Liu W, Li Y, Zhu F, Huang J, Fang J, Chen Q, Wu D. Construction of a five-gene-based prognostic model for relapsed/refractory acute lymphoblastic leukemia. Hematology 2024; 29:2412952. [PMID: 39453390 DOI: 10.1080/16078454.2024.2412952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Accepted: 09/30/2024] [Indexed: 10/26/2024] Open
Abstract
BACKGROUND Relapsed/refractory acute lymphoblastic leukemia (R/R ALL) continues to be a major cause of mortality in children worldwide, with around 15% of ALL patients experiencing relapse and approximately 10% eventually dying from the disease. Early identification of R/R ALL in children has posed a longstanding clinical challenge. METHOD Genetic analysis of survival outcomes in pediatric patients with ALL from the TARGET-ALL dataset revealed five risk score factors identified through the intersection of differential genes (relapse/non-relapse) from the GSE17703 and GSE6092 databases. A risk score equation was formulated using these factors and validated against prognostic data from 46 ALL cases at our institution. Patients from multiple datasets were stratified into high and low-score groups based on this equation. Protein-protein interaction networks (PPI) were then constructed using the intersecting differential genes from all three datasets to identify hub nodes and predict interacting transcription factors. Additionally, genes related to cell pyroptosis with varying expression across these datasets were screened, and a multifactorial ROC curve (incorporating risk score and differential expression of pyroptosis-related genes) was generated. Furthermore, relationships among variables in the predictive model were depicted using a nomogram, and model efficacy was assessed through decision curve analysis (DCA). RESULTS By analyzing the TARGET-ALL, GSE17703, and GSE6092 databases, we developed a prognostic risk assessment model for pediatric ALL incorporating BAG2, EPHA4, FBXO9, SNX10, and WNK1. Validation of this model was conducted using data from 46 pediatric ALL cases obtained from our institution. Following the identification of 27 differentially expressed genes, we constructed a PPI and identified the top 10 hub genes (PTPRC, BTK, LCK, PRKCQ, CD3D, CD27, CD3G, BLNK, RASGRP1, VPREB1). 
Using this network, we predicted the top 5 transcription factors (HOXB4, MYC, SOX2, E2F1, NANOG). ROC and DCA were conducted on pyroptosis-related genes exhibiting differential expression and risk scores. Subsequently, a nomogram was generated, demonstrating the effectiveness of the risk score in predicting prognosis for pediatric ALL patients. CONCLUSIONS We have developed a risk prediction model for pediatric R/R ALL utilizing the genes BAG2, EPHA4, FBXO9, SNX10, and WNK1. This model provides a scientific foundation for early identification of R/R ALL in children.
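A risk score equation of the kind described above is typically a weighted sum of gene expression values, followed by a split into high- and low-score groups. The sketch below illustrates that idea only. The coefficient values and the median cutoff are hypothetical assumptions, since the abstract names the five genes but gives neither the weights nor the cutoff rule.

```python
from statistics import median

# Hypothetical weights for illustration; the abstract does not report them.
COEFFICIENTS = {"BAG2": 0.5, "EPHA4": -0.3, "FBXO9": 0.2, "SNX10": 0.4, "WNK1": -0.1}

def risk_score(expression):
    """Weighted sum of expression values over the five genes."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)

def stratify(scores):
    """Label each score 'high' or 'low' relative to the cohort median (assumed cutoff)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]
```

In practice the weights would come from a fitted survival model rather than being set by hand.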
Affiliation(s)
- Bi Zhou: Department of Pediatric, Suzhou Hospital of AnHui Medical University, Suzhou City, People's Republic of China
- BoJie Min: Department of Pediatrics, the First Affiliated Hospital of AnHui Medical University, Hefei City, People's Republic of China
- WenYuan Liu: Department of Pediatrics, The Second Affiliated Hospital of AnHui Medical University, Hefei City, People's Republic of China
- Ying Li: Department of Pediatric, Suzhou Hospital of AnHui Medical University, Suzhou City, People's Republic of China
- Feng Zhu: Department of Pediatric, Suzhou Hospital of AnHui Medical University, Suzhou City, People's Republic of China
- Jin Huang: Department of Pediatric, Suzhou Hospital of AnHui Medical University, Suzhou City, People's Republic of China
- Jing Fang: Graduate School, Bengbu Medical College, Bengbu City, People's Republic of China
- Qin Chen: Department of Nursing, Suzhou Hospital of AnHui Medical University, Suzhou City, People's Republic of China
- De Wu: Department of Pediatrics, the First Affiliated Hospital of AnHui Medical University, Hefei City, People's Republic of China
2
Sun J, Lee KY. Generalized functional linear model with a point process predictor. Stat Med 2024; 43:1564-1576. [PMID: 38332307 DOI: 10.1002/sim.10023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 12/17/2023] [Accepted: 01/15/2024] [Indexed: 02/10/2024]
Abstract
Point process data have become increasingly popular these days. For example, many of the data captured in electronic health records (EHR) are in the format of point process data. It is of great interest to study the association between a point process predictor and a scalar response using generalized functional linear regression models. Various generalized functional linear regression models have been developed under different settings in the past decades. However, existing methods can only deal with functional or longitudinal predictors, not point process predictors. In this article, we propose a novel generalized functional linear regression model for a point process predictor. Our proposed model is based on the joint modeling framework, where we adopt a log-Gaussian Cox process model for the point process predictor and a generalized linear regression model for the outcome. We also develop a new algorithm for fast model estimation based on the Gaussian variational approximation method. We conduct extensive simulation studies to evaluate the performance of our proposed method and compare it to competing methods. The performance of our proposed method is further demonstrated on an EHR dataset of patients admitted into the intensive care units of the Beth Israel Deaconess Medical Center between 2001 and 2008.
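For orientation, a joint model of the kind this abstract describes could be written as below. All notation here is assumed for illustration and is not taken from the paper. The integral link between the latent log-intensity and the outcome is a common choice in functional linear models, not necessarily the authors' exact specification.

```latex
\begin{aligned}
N_i \mid \lambda_i &\sim \text{Poisson process with intensity } \lambda_i(t), \quad t \in [0, T], \\
\log \lambda_i(t) &= X_i(t), \qquad X_i \sim \mathcal{GP}(\mu, K), \\
g\{\mathbb{E}(Y_i \mid X_i)\} &= \alpha + \int_0^T \beta(t)\, X_i(t)\, dt .
\end{aligned}
```

Here $N_i$ is the observed point process for subject $i$, $X_i$ is a latent Gaussian process with mean function $\mu$ and covariance kernel $K$, $Y_i$ is the scalar response, and $g$ is the link function of the generalized linear model.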
Affiliation(s)
- Jiehuan Sun: Division of Epidemiology and Biostatistics, School of Public Health, University of Illinois Chicago, Chicago, Illinois, USA
- Kuang-Yao Lee: Department of Statistics, Operations, and Data Science, Temple University, Philadelphia, Pennsylvania, USA
3
Gao J, Bonzel CL, Hong C, Varghese P, Zakir K, Gronsbell J. Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. J Am Med Inform Assoc 2024; 31:640-650. [PMID: 38128118 PMCID: PMC10873838 DOI: 10.1093/jamia/ocad226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 09/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023] Open
Abstract
OBJECTIVE High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimation of the receiver operating characteristic (ROC) parameters of PAs (eg, sensitivity, specificity). MATERIALS AND METHODS ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation to yield more precise estimates of PA performance relative to classical supervised ROC analysis (supROC) using only labeled data. We evaluated ssROC with synthetic, semi-synthetic, and EHR data from Mass General Brigham (MGB). RESULTS ssROC produced ROC parameter estimates with minimal bias and significantly lower variance than supROC in the simulated and semi-synthetic data. For the 5 PAs from MGB, the estimates from ssROC are 30% to 60% less variable than supROC on average. DISCUSSION ssROC enables precise evaluation of PA performance without demanding large volumes of labeled data. ssROC is also easily implementable in open-source R software. CONCLUSION When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
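The ROC parameters named above (sensitivity and specificity) can be computed from labeled data alone, which corresponds to the classical supervised baseline the abstract calls supROC. The sketch below shows only that baseline at a single threshold. It does not implement ssROC, and the function name and the "score at or above threshold is positive" rule are assumptions for illustration.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive; labels are 1 (case) or 0 (control)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)
```

With few labeled records these ratios rest on small counts, which is the source of the variance that a semi-supervised estimator aims to reduce.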
Affiliation(s)
- Jianhui Gao: Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Clara-Lea Bonzel: Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Chuan Hong: Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Paul Varghese: Health Informatics, Verily Life Sciences, Cambridge, MA, United States
- Karim Zakir: Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Jessica Gronsbell: Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada; Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada
4
Huang TJ, Luedtke A, McKeague IW. Efficient estimation of the maximal association between multiple predictors and a survival outcome. Ann Stat 2023; 51:1965-1988. [PMID: 38405375 PMCID: PMC10888526 DOI: 10.1214/23-aos2313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally-scalable in high-dimensions. Machine learning tools are commonly used to provide predictions of survival outcomes, but the estimated effect of a selected predictor suffers from confirmation bias unless the selection is taken into account. The new approach involves the construction of semi-parametrically efficient estimators of the linear association between the predictors and the survival outcome, which are used to build a test statistic for detecting the presence of an association between any of the predictors and the outcome. Further, a stabilization technique reminiscent of bagging allows a normal calibration for the resulting test statistic, which enables the construction of confidence intervals for the maximal association between predictors and the outcome and also greatly reduces computational cost. Theoretical results show that this testing procedure is valid even when the number of predictors grows superpolynomially with sample size, and our simulations support this asymptotic guarantee at moderate sample sizes. The new approach is applied to the problem of identifying patterns in viral gene expression associated with the potency of an antiviral drug.
Affiliation(s)
- Tzu-Jung Huang: Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
- Alex Luedtke: Department of Statistics, University of Washington
5
Zhou L, Li Q, Xu J, Wang S, Song Z, Chen X, Ma Y, Lin Z, Chen B, Huang H. Cerebrospinal fluid metabolic markers predict prognosis behavior of primary central nervous system lymphoma with high-dose methotrexate-based chemotherapeutic treatment. Neurooncol Adv 2023; 5:vdac181. [PMID: 36879663 PMCID: PMC9985165 DOI: 10.1093/noajnl/vdac181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022] Open
Abstract
Background Primary central nervous system lymphoma (PCNSL) is a highly aggressive non-Hodgkin's B-cell lymphoma that is normally treated with high-dose methotrexate (HD-MTX)-based chemotherapy. However, such treatment does not always guarantee a good prognosis (GP) and carries several side effects. Thus, biomarkers or biomarker-based models that can predict PCNSL patient prognosis would be beneficial. Methods We first collected samples from 48 patients with PCNSL and applied HPLC-MS/MS-based metabolomic analysis to this retrospective cohort. We then selected the highly dysregulated metabolites to build a logistic regression model that distinguishes patients by survival time according to a scoring standard. Finally, we validated the logistic regression model on a 33-patient prospective PCNSL cohort. Results Six metabolic features were selected from the cerebrospinal fluid (CSF) to form a logistic regression model that distinguishes patients with relatively GP (Z score ≤0.06) in the discovery cohort. We applied the metabolic marker-based model to a prospectively recruited PCNSL patient cohort for further validation, and the model performed well on this validation cohort (AUC = 0.745). Conclusions We developed a logistic regression model based on metabolic markers in CSF that can effectively predict PCNSL patient prognosis before HD-MTX-based chemotherapy.
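The AUC reported for the validation cohort summarizes how well a score ranks patients in one outcome group above those in the other. As a minimal sketch of how such a value is computed, the function below uses the pairwise (Mann-Whitney) definition of AUC. The function name and the convention that label 1 marks the group expected to score higher are assumptions for illustration, and the example data in the usage note are made up.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` evaluates to 0.75, because three of the four positive-negative pairs are ordered correctly.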
Affiliation(s)
- Liying Zhou: Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, 200438, China
- Qing Li: Department of Hematology, Huashan Hospital, Fudan University, Shanghai, 200438, China
- Jingshen Xu: Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, 200438, China
- Shuaikang Wang: Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, 200438, China
- Zhiqiang Song: Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, 200438, China; School of Life Sciences, Inner Mongolia University, Hohhot Inner Mongolia, 010021, China
- Xinyi Chen: Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, 200438, China
- Yan Ma: Department of Hematology, Huashan Hospital, Fudan University, Shanghai, 200438, China
- Zhiguang Lin: Department of Hematology, Huashan Hospital, Fudan University, Shanghai, 200438, China
- Bobin Chen: Department of Hematology, Huashan Hospital, Fudan University, Shanghai, 200438, China
- He Huang: Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, 200438, China; Shanghai Qi Zhi Institute, Shanghai, 200030, China
6
Su H, Wang Y, Li H. RNA m6A Methylation Regulators Multi-Omics Analysis in Prostate Cancer. Front Genet 2021; 12:768041. [PMID: 34899855 PMCID: PMC8661905 DOI: 10.3389/fgene.2021.768041] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/02/2021] [Accepted: 11/12/2021] [Indexed: 01/29/2023] Open
Abstract
RNA N6-methyladenosine (m6A) methylation is known to be the most prevalent RNA modification in animals. Many research reports have elaborated on the relevance of m6A regulators to medical practice, including diagnosis, prognosis, and treatment. m6A modification has evident impacts on many aspects of RNA metabolism, such as RNA splicing, processing, translation, and stability. m6A also plays a significant role in numerous types of cancer. We analyzed the prostate cancer datasets from The Cancer Genome Atlas (TCGA) database for the gene expression, DNA methylation status, and copy number variations (CNVs) of every recognized m6A regulator. We also systematically analyzed the relationship between different m6A regulators and the prognosis of prostate cancer. The results showed considerable differences in the expression of various m6A regulators between prostate cancer and normal samples. There were also evident differences in the expression of various m6A regulators among prostate cancers with different Gleason scores. Subsequently, we identified CBLL1, FTO, YTHDC1, and HNRNPA2B1 as crucial m6A regulators in prostate cancer. Based on the expression of CBLL1, we also identified potential therapeutic agents for prostate cancer, and knockdown of FTO markedly inhibited prostate cell migration and invasion in vitro.
Affiliation(s)
- Hao Su: Department of Urology, Chinese Academy of Medical Sciences, Peking Union Medical College, Peking Union Medical College Hospital, Beijing, China
- Yutao Wang: Department of Urology, The First Affiliated Hospital of China Medical University, Shenyang, China
- Hongjun Li: Department of Urology, Chinese Academy of Medical Sciences, Peking Union Medical College, Peking Union Medical College Hospital, Beijing, China
7
Constantino CS, Carvalho AM, Vinga S. Coupling sparse Cox models with clustering of longitudinal transcriptomics data for trauma prognosis. BioData Min 2021; 14:25. [PMID: 33853663 PMCID: PMC8048345 DOI: 10.1186/s13040-021-00257-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2020] [Accepted: 03/29/2021] [Indexed: 11/18/2022] Open
Abstract
Background Longitudinal gene expression analysis and survival modeling have been proven to add valuable biological and clinical knowledge. This study proposes a novel framework to discover gene signatures and patterns in high-dimensional time-series transcriptomics data and to assess their association with hospital length of stay. Methods We investigated a longitudinal and high-dimensional gene expression dataset from 168 blunt-force trauma patients followed during the first 28 days after injury. To model the length of stay, an initial dimensionality reduction step was performed by applying Cox regression with elastic net regularization to gene expression data from the first days of hospitalization. A novel methodology for imputing missing values in the previously selected genes was also proposed. We then applied multivariate time series (MTS) clustering to analyse gene expression over time and to stratify patients with similar trajectories. The patient partitions obtained by MTS clustering were validated using Kaplan-Meier curves and log-rank tests. Results We identified 22 genes strongly associated with hospital discharge. Their expression values in the first days after trauma proved to be good predictors of the length of stay. The proposed mixed imputation method yielded a complete dataset of short time series with minimal loss of information for the 28 days of follow-up. MTS clustering grouped patients with similar gene trajectories and, notably, with similar hospital discharge days. Patients within each cluster have comparable gene trajectories and may have an analogous response to injury. Conclusion The proposed framework was able to tackle the joint analysis of time-to-event information with longitudinal multivariate high-dimensional data. The application to length of stay and transcriptomics data revealed a strong relationship between gene expression trajectory and patient recovery, which may improve the management of trauma patients by healthcare systems. The proposed methodology can be easily adapted to other medical data, towards more effective clinical decision support systems for health applications.
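The Kaplan-Meier curves used to validate the patient clusters estimate, for each time point, the probability that the event (here, hospital discharge) has not yet occurred. The sketch below is a plain-Python version of the standard product-limit estimator, written for illustration and not taken from the authors' code. In it, `events` is 1 when the event was observed and 0 when the patient was censored.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events observed exactly at time t
        leaving = 0    # subjects leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            surv *= 1 - observed / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve
```

Computing one such curve per cluster and comparing them with a log-rank test is the validation step the abstract describes.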
Affiliation(s)
- Cláudia S Constantino: INESC-ID, Instituto Superior Técnico, ULisboa, R. Alves Redol 9, Lisbon, 1000-029, Portugal
- Alexandra M Carvalho: Instituto de Telecomunicações, Instituto Superior Técnico, ULisboa, Av. Rovisco Pais 1, Lisbon, 1049-001, Portugal
- Susana Vinga: INESC-ID, Instituto Superior Técnico, ULisboa, R. Alves Redol 9, Lisbon, 1000-029, Portugal; IDMEC, Instituto Superior Técnico, ULisboa, Av. Rovisco Pais 1, Lisbon, 1049-001, Portugal
8
Huang TJ, McKeague IW, Qian M. Marginal screening for high-dimensional predictors of survival outcomes. Stat Sin 2019; 29:2105-2139. [PMID: 31938013 PMCID: PMC6959482 DOI: 10.5705/ss.202017.0298] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/10/2023]
Abstract
This study develops a marginal screening test to detect the presence of significant predictors for a right-censored time-to-event outcome under a high-dimensional accelerated failure time (AFT) model. Establishing a rigorous screening test in this setting is challenging, because of the right censoring and the post-selection inference. In the latter case, an implicit variable selection step needs to be included to avoid inflating the Type-I error. A prior study solved this problem by constructing an adaptive resampling test under an ordinary linear regression. To accommodate right censoring, we develop a new approach based on a maximally selected Koul-Susarla-Van Ryzin estimator from a marginal AFT working model. A regularized bootstrap method is used to calibrate the test. Our test is more powerful and less conservative than both a Bonferroni correction of the marginal tests and other competing methods. The proposed method is evaluated in simulation studies and applied to two real data sets.
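For context, the Bonferroni correction of marginal tests that this abstract uses as a comparator can be sketched in a few lines. This shows only the baseline, not the proposed screening test. The function name is hypothetical, and the inputs are assumed to be one marginal p-value per predictor.

```python
def bonferroni_screen(p_values, alpha=0.05):
    """Return (reject_global_null, indices of predictors passing the corrected level)."""
    threshold = alpha / len(p_values)
    selected = [j for j, p in enumerate(p_values) if p <= threshold]
    return bool(selected), selected
```

Because the threshold shrinks in proportion to the number of predictors, this baseline tends to be conservative in high dimensions, which is the behavior the abstract contrasts with its own test.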
Affiliation(s)
- Min Qian: Department of Biostatistics, Columbia University
9
Ojal J, Goldblatt D, Tigoi C, Scott JAG. Effect of Maternally Derived Anti-protein and Anticapsular IgG Antibodies on the Rate of Acquisition of Nasopharyngeal Carriage of Pneumococcus in Newborns. Clin Infect Dis 2019; 66:121-130. [PMID: 29020230 PMCID: PMC5850545 DOI: 10.1093/cid/cix742] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/05/2017] [Accepted: 08/11/2017] [Indexed: 12/23/2022] Open
Abstract
Background In developing countries, introduction of pneumococcal conjugate vaccine has not eliminated circulation of vaccine serotypes. Vaccinating pregnant mothers to increase antibody concentrations in their newborn infants may reduce the acquisition of pneumococcal carriage and subsequent risk of disease. We explored the efficacy of passive immunity, attributable to anti-protein and anticapsular pneumococcal antibodies, against acquisition of carriage. Methods We examined the rate of nasopharyngeal acquisition of pneumococci in the first 90 days of life associated with varying anticapsular and anti-protein antibody concentrations in infant cord/maternal venous blood in Kilifi, Kenya. We used multivariable Cox proportional hazard models to estimate continuous functions relating acquisition of nasopharyngeal carriage to the concentration of maternally derived antibody. Results Cord blood or maternal venous samples were collected from 976 mother-infant pairs. Pneumococci were acquired 561 times during 33,905 person-days of follow-up. Increasing concentrations of anti-protein antibodies were associated with either a reduction (PhtD1, PspAFam2, Spr0096, StkP) or, paradoxically, an increase (CbpA, LytC, PcpA, PiaA, PspAFam1, RrgBT4) in acquisition rate. We observed a nonsignificant reduction in the incidence of homologous carriage acquisition with high concentrations of maternally derived anticapsular antibodies to 5 serotypes (6A, 6B, 14, 19F, and 23F). Conclusion The protective efficacy of several anti-protein antibodies supports the strategy of maternal vaccination to protect young infants from carriage and invasive disease. We were not able to demonstrate that passive anticapsular antibodies were protective against carriage acquisition at naturally occurring concentrations though it remains possible they may do so at the higher concentrations elicited by vaccination.
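The counts reported in the Results (561 acquisitions during 33,905 person-days) imply a crude overall acquisition rate, which the short calculation below makes explicit. This is simple arithmetic on the two reported numbers, not a figure quoted from the paper, and the helper name is hypothetical.

```python
def incidence_rate(events, person_time, per=1000):
    """Crude rate: events per `per` units of person-time."""
    return events / person_time * per

# Counts taken from the abstract: 561 acquisitions over 33,905 person-days.
rate_per_1000_person_days = incidence_rate(561, 33905)
```

The result is roughly 16.5 acquisitions per 1,000 person-days. The Cox models in the study describe how this rate varies with antibody concentration rather than reporting a single pooled value.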
Affiliation(s)
- John Ojal: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine-Coast, Kilifi, Kenya; Department of Infectious Disease Epidemiology, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom
- David Goldblatt: Great Ormond Street Institute of Child Health, University College, London, United Kingdom
- Caroline Tigoi: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine-Coast, Kilifi, Kenya
- J Anthony G Scott: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine-Coast, Kilifi, Kenya; Department of Infectious Disease Epidemiology, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom
10
Sun J, Herazo-Maya JD, Molyneaux PL, Maher TM, Kaminski N, Zhao H. Regularized Latent Class Model for Joint Analysis of High-Dimensional Longitudinal Biomarkers and a Time-to-Event Outcome. Biometrics 2018; 75:69-77. [PMID: 30178494 DOI: 10.1111/biom.12964] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/01/2022]
Abstract
Although many modeling approaches have been developed to jointly analyze longitudinal biomarkers and a time-to-event outcome, most of these methods can only handle one or a few biomarkers. In this article, we propose a novel joint latent class model to deal with high dimensional longitudinal biomarkers. Our model has three components: a class membership model, a survival submodel, and a longitudinal submodel. In our model, we assume that covariates can potentially affect biomarkers and class membership. We adopt a penalized likelihood approach to infer which covariates have random effects and/or fixed effects on biomarkers, and which covariates are informative for the latent classes. Through extensive simulation studies, we show that our proposed method has improved performance in prediction and assigning subjects to the correct classes over other joint modeling methods and that bootstrap can be used to do inference for our model. We then apply our method to a dataset of patients with idiopathic pulmonary fibrosis, for whom gene expression profiles were measured longitudinally. We are able to identify four interesting latent classes with one class being at much higher risk of death compared to the other classes. We also find that each of the latent classes has unique trajectories in some genes, yielding novel biological insights.
Affiliation(s)
- Jiehuan Sun: Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, U.S.A
- Jose D Herazo-Maya: Internal Medicine: Pulmonary, Critical Care & Sleep Medicine, Yale School of Medicine, New Haven, Connecticut, U.S.A
- Philip L Molyneaux: Fibrosis Research Group, National Heart and Lung Institute, Imperial College, London; Royal Brompton Hospital, Interstitial Lung Disease Unit, London
- Toby M Maher: Fibrosis Research Group, National Heart and Lung Institute, Imperial College, London; Royal Brompton Hospital, Interstitial Lung Disease Unit, London
- Naftali Kaminski: Internal Medicine: Pulmonary, Critical Care & Sleep Medicine, Yale School of Medicine, New Haven, Connecticut, U.S.A
- Hongyu Zhao: Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, U.S.A
11
Ferrer L, Putter H, Proust-Lima C. Individual dynamic predictions using landmarking and joint modelling: Validation of estimators and robustness assessment. Stat Methods Med Res 2018; 28:3649-3666. [DOI: 10.1177/0962280218811837] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 11/17/2022]
Abstract
After the diagnosis of a disease, one major objective is to predict cumulative probabilities of events such as clinical relapse or death from the individual information collected up to a prediction time, usually including repeated biomarker measurements. Several competing estimators have been proposed, mainly from two approaches: joint modelling and landmarking. These approaches differ in the information used, the model assumptions and the complexity of the computational procedures. This paper aims to review the two approaches, precisely define the derived estimators of dynamic predictions and compare their performance, notably under misspecification. The ultimate goal is to provide key elements for the use of individual dynamic predictions in clinical practice. Prediction of two competing causes of prostate cancer progression from the history of prostate-specific antigen is used as a motivating example. We formally define the quantity to estimate and its estimators, propose techniques to assess the uncertainty around predictions and validate them. We then conduct an in-depth simulation study to compare the estimators in terms of prediction error, discriminatory power, efficiency and robustness to model assumptions. We show that prediction tools should be handled with care, in particular by properly specifying models and estimators.
Affiliation(s)
- Loïc Ferrer: INSERM, UMR1219, Univ. Bordeaux, ISPED, Bordeaux, France
- Hein Putter: Leiden University Medical Center, Leiden, the Netherlands
12
Park JE, Kim HS. Radiomics as a Quantitative Imaging Biomarker: Practical Considerations and the Current Standpoint in Neuro-oncologic Studies. Nucl Med Mol Imaging 2018; 52:99-108. [PMID: 29662558 DOI: 10.1007/s13139-017-0512-7] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 09/11/2017] [Revised: 11/29/2017] [Accepted: 12/28/2017] [Indexed: 12/29/2022] Open
Abstract
Radiomics utilizes high-dimensional imaging data to discover associations with diagnostic, prognostic, or predictive endpoints, or with genomic data (radiogenomics). It is an emerging field of study that can potentially depict intratumoral heterogeneity from quantitative and classified high-throughput data. The radiomics approach has an analytic pipeline in which imaging features are extracted, processed and analyzed. At this point, special data handling is essential because a high-dimensional biomarker raises issues that a single-biomarker approach does not. This article describes the potential role of radiomics in oncologic studies, the basic analytic pipeline and special data handling with high-dimensional data to facilitate the radiomics approach as a tool for personalized medicine in oncology.
Affiliation(s)
- Ji Eun Park: Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505 South Korea
- Ho Sung Kim: Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505 South Korea
13
Ternès N, Rotolo F, Michiels S. Robust estimation of the expected survival probabilities from high-dimensional Cox models with biomarker-by-treatment interactions in randomized clinical trials. BMC Med Res Methodol 2017; 17:83. [PMID: 28532387 PMCID: PMC5441049 DOI: 10.1186/s12874-017-0354-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/13/2016] [Accepted: 04/27/2017] [Indexed: 11/10/2022] Open
Abstract
Background Thanks to advances in genomics and targeted treatments, a growing number of biomarker-based prediction models are being developed to predict potential benefit from treatments in randomized clinical trials. Although the methodological framework for developing and validating prediction models in a high-dimensional setting is becoming increasingly established, no clear guidance exists yet on how to estimate expected survival probabilities in a penalized model with biomarker-by-treatment interactions. Methods Based on a parsimonious biomarker selection in a penalized high-dimensional Cox model (lasso or adaptive lasso), we propose a unified framework to: estimate internally the predictive accuracy metrics of the developed model (using double cross-validation); estimate the individual survival probabilities at a given timepoint; construct confidence intervals thereof (analytical or bootstrap); and visualize them graphically (pointwise or smoothed with splines). We compared these strategies through a simulation study covering scenarios with or without biomarker effects. We applied the strategies to a large randomized phase III clinical trial that evaluated the effect of adding trastuzumab to chemotherapy in 1574 early breast cancer patients, for which the expression of 462 genes was measured. Results In our simulations, penalized regression models using the adaptive lasso estimated the survival probability of new patients with low bias and standard error; bootstrapped confidence intervals had empirical coverage probability close to the nominal level across very different scenarios. The double cross-validation performed on the training data set closely mimicked the predictive accuracy of the selected models in external validation data. We also propose a useful visual representation of the expected survival probabilities using splines. 
In the breast cancer trial, the adaptive lasso penalty selected a prediction model with 4 clinical covariates, the main effects of 98 biomarkers and 24 biomarker-by-treatment interactions, but there was high variability of the expected survival probabilities, with very large confidence intervals. Conclusion Based on our simulations, we propose a unified framework for: developing a prediction model with biomarker-by-treatment interactions in a high-dimensional setting and validating it in the absence of external data; accurately estimating the expected survival probability of future patients with associated confidence intervals; and graphically visualizing the developed prediction model. All the methods are implemented in the R package biospear, publicly available on CRAN.
Affiliation(s)
- Nils Ternès
  - Service de Biostatistique et d'Epidémiologie, Gustave Roussy, B2M, RdC. 114 rue Edouard-Vaillant, 94805, Villejuif, France; CESP, Fac. de médecine - Univ. Paris-Sud, Fac. de médecine - UVSQ, INSERM, Université Paris-Saclay, Villejuif, 94805, France
- Federico Rotolo
  - Service de Biostatistique et d'Epidémiologie, Gustave Roussy, B2M, RdC. 114 rue Edouard-Vaillant, 94805, Villejuif, France; CESP, Fac. de médecine - Univ. Paris-Sud, Fac. de médecine - UVSQ, INSERM, Université Paris-Saclay, Villejuif, 94805, France
- Stefan Michiels
  - Service de Biostatistique et d'Epidémiologie, Gustave Roussy, B2M, RdC. 114 rue Edouard-Vaillant, 94805, Villejuif, France; CESP, Fac. de médecine - Univ. Paris-Sud, Fac. de médecine - UVSQ, INSERM, Université Paris-Saclay, Villejuif, 94805, France
|