1
Huang TJ, Luedtke A, McKeague IW. Efficient estimation of the maximal association between multiple predictors and a survival outcome. Ann Stat 2023; 51:1965-1988. [PMID: 38405375 PMCID: PMC10888526 DOI: 10.1214/23-aos2313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally scalable in high dimensions. Machine learning tools are commonly used to provide predictions of survival outcomes, but the estimated effect of a selected predictor suffers from confirmation bias unless the selection is taken into account. The new approach involves the construction of semi-parametrically efficient estimators of the linear association between the predictors and the survival outcome, which are used to build a test statistic for detecting the presence of an association between any of the predictors and the outcome. Further, a stabilization technique reminiscent of bagging allows a normal calibration for the resulting test statistic, which enables the construction of confidence intervals for the maximal association between predictors and the outcome and also greatly reduces computational cost. Theoretical results show that this testing procedure is valid even when the number of predictors grows superpolynomially with sample size, and our simulations support this asymptotic guarantee at moderate sample sizes. The new approach is applied to the problem of identifying patterns in viral gene expression associated with the potency of an antiviral drug.
Affiliation(s)
- Tzu-Jung Huang
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
- Alex Luedtke
  - Department of Statistics, University of Washington
2
Liu Y, Li G. Sure Joint Screening for High Dimensional Cox's Proportional Hazards Model Under the Case-Cohort Design. J Comput Biol 2023; 30:663-677. [PMID: 37140454 PMCID: PMC10282795 DOI: 10.1089/cmb.2022.0416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
This study develops a sure joint feature screening method for the case-cohort design with ultrahigh-dimensional covariates. Our method is based on a sparsity-restricted Cox proportional hazards model. An iterative reweighted hard thresholding algorithm is proposed to approximate the sparsity-restricted, pseudo-partial likelihood estimator for joint screening. We rigorously show that our method possesses the sure screening property, with the probability of retaining all relevant covariates tending to 1 as the sample size goes to infinity. Our simulation results demonstrate that the proposed procedure has substantially improved screening performance over some existing feature screening methods for the case-cohort design, especially when some covariates are jointly correlated, but marginally uncorrelated, with the event time outcome. A real data illustration is provided using breast cancer data with high-dimensional genomic covariates. We have implemented the proposed method using MATLAB and made it available to readers through GitHub.
Affiliation(s)
- Yi Liu
  - Department of Mathematics, School of Mathematical Sciences, Ocean University of China, Qingdao, China
- Gang Li
  - Department of Biostatistics, University of California at Los Angeles, Los Angeles, California, USA
3
Chen LP. A note of feature screening via a rank-based coefficient of correlation. Biom J 2023:e2100373. [PMID: 37160692 DOI: 10.1002/bimj.202100373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Revised: 09/13/2022] [Accepted: 02/05/2023] [Indexed: 05/11/2023]
Abstract
Feature screening is a useful and popular tool to detect informative predictors for ultrahigh-dimensional data before developing statistical analysis or constructing statistical models. While a large body of feature screening procedures has been developed, most methods are restricted to examining either continuous or discrete responses. Moreover, even though many model-free feature screening methods have been proposed, additional assumptions are imposed in those methods to ensure their theoretical results. To address those difficulties and provide simple implementation, in this paper we extend the rank-based coefficient of correlation to develop a feature screening procedure. We show that this new screening criterion is able to deal with continuous and binary responses. Theoretically, the sure screening property is established to justify the proposed method. Simulation studies demonstrate that the predictors with nonlinear and oscillatory trajectories are successfully retained regardless of the distribution of the response. Finally, the proposed method is applied to analyze two microarray datasets.
Affiliation(s)
- Li-Pang Chen
  - Department of Statistics, National Chengchi University, Taipei, Taiwan, ROC
4
Xiong W, Chen Y, Ma S. Unified model-free interaction screening via CV-entropy filter. Comput Stat Data Anal 2023; 180:107684. [PMID: 36910335 PMCID: PMC9997997 DOI: 10.1016/j.csda.2022.107684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2022]
Abstract
For many practical high-dimensional problems, interactions have been increasingly found to play important roles beyond main effects. A representative example is gene-gene interaction. Joint analysis, which analyzes all interactions and main effects in a single model, can be seriously challenged by high dimensionality. For high-dimensional data analysis in general, marginal screening has been established as effective for reducing computational cost, increasing stability, and improving estimation/selection performance. Most of the existing marginal screening methods are designed for the analysis of main effects only. The existing screening methods for interaction analysis are often limited by making stringent model assumptions, lacking robustness, and/or requiring predictors to be continuous (and hence lacking flexibility). A unified marginal screening approach tailored to interaction analysis is developed, which can be applied to regression, classification, and survival analysis. Predictors are allowed to be continuous and discrete. The proposed approach is built on Coefficient of Variation (CV) filters based on information entropy. Statistical properties are rigorously established. It is shown that the CV filters are almost insensitive to the distribution tails of predictors, correlation structure among predictors, and sparsity level of signals. An efficient two-stage algorithm is developed to make the proposed approach scalable to ultrahigh-dimensional data. Simulations and the analysis of TCGA LUAD data further establish the practical superiority of the proposed approach.
Affiliation(s)
- Wei Xiong
  - School of Statistics, University of International Business and Economics, Beijing 100872, PR China
- Yaxian Chen
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong
- Shuangge Ma
  - Department of Biostatistics, Yale School of Public Health, USA
5
Yuan Z, Chen J, Qiu H, Huang Y. Quantile-Adaptive Sufficient Variable Screening by Controlling False Discovery. Entropy (Basel, Switzerland) 2023; 25:524. [PMID: 36981413 PMCID: PMC10048075 DOI: 10.3390/e25030524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 03/06/2023] [Accepted: 03/14/2023] [Indexed: 06/18/2023]
Abstract
Sufficient variable screening rapidly reduces dimensionality with high probability in ultra-high dimensional modeling. To rapidly screen out the null predictors, a quantile-adaptive sufficient variable screening framework is developed by controlling the false discovery. Without any specification of an actual model, we first introduce a compound testing procedure based on the conditionally imputing marginal rank correlation at different quantile levels of response to select active predictors in high dimensionality. The testing statistic can capture sufficient dependence through two paths: one is to control false discovery adaptively and the other is to control the false discovery rate by giving a prespecified threshold. It is computationally efficient and easy to implement. We establish the theoretical properties under mild conditions. Numerical studies including simulation studies and real data analysis contain supporting evidence that the proposal performs reasonably well in practical settings.
Affiliation(s)
- Zihao Yuan
  - Department of Statistics, Wuhan University of Technology, Wuhan 430070, China
- Jiaqing Chen
  - Department of Statistics, Wuhan University of Technology, Wuhan 430070, China
- Han Qiu
  - Department of Statistics, Wuhan University of Technology, Wuhan 430070, China
- Yangxin Huang
  - Department of Epidemiology and Biostatistics, University of South Florida, Tampa, FL 33612, USA
6
Salerno S, Li Y. High-Dimensional Survival Analysis: Methods and Applications. Annual Review of Statistics and Its Application 2023; 10:25-49. [PMID: 36968638 PMCID: PMC10038209 DOI: 10.1146/annurev-statistics-032921-022127] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/18/2023]
Abstract
In the era of precision medicine, time-to-event outcomes such as time to death or progression are routinely collected, along with high-throughput covariates. These high-dimensional data defy classical survival regression models, which are either infeasible to fit or likely to incur low predictability due to over-fitting. To overcome this, recent emphasis has been placed on developing novel approaches for feature selection and survival prognostication. We will review various cutting-edge methods that handle survival outcome data with high-dimensional predictors, highlighting recent innovations in machine learning approaches for survival prediction. We will cover the statistical intuitions and principles behind these methods and conclude with extensions to more complex settings, where competing events are observed. We exemplify these methods with applications to the Boston Lung Cancer Survival Cohort study, one of the largest cancer epidemiology cohorts investigating the complex mechanisms of lung cancer.
Affiliation(s)
- Stephen Salerno
  - Department of Biostatistics, University of Michigan, Ann Arbor, United States, 48109
- Yi Li
  - Department of Biostatistics, University of Michigan, Ann Arbor, United States, 48109
7
Wang M, Zhang X, Wan ATK, You K, Zou G. Jackknife model averaging for high-dimensional quantile regression. Biometrics 2023; 79:178-189. [PMID: 34608993 DOI: 10.1111/biom.13574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2019] [Revised: 07/08/2021] [Accepted: 09/14/2021] [Indexed: 11/29/2022]
Abstract
In this paper, we propose a frequentist model averaging method for quantile regression with high-dimensional covariates. Although research on these subjects has proliferated as separate approaches, no study has considered them in conjunction. Our method entails reducing the covariate dimensions through ranking the covariates based on marginal quantile utilities. The second step of our method implements model averaging on the models containing the covariates that survive the screening of the first step. We use a delete-one cross-validation method to select the model weights, and prove that the resultant estimator possesses an optimal asymptotic property uniformly over any compact (0,1) subset of the quantile indices. Our proof, which relies on empirical process theory, is arguably more challenging than proofs of similar results in other contexts owing to the high-dimensional nature of the problem and our relaxation of the conventional assumption of the weights summing to one. Our investigation of finite-sample performance demonstrates that the proposed method exhibits very favorable properties compared to the least absolute shrinkage and selection operator (LASSO) and smoothly clipped absolute deviation (SCAD) penalized regression methods. The method is applied to a microarray gene expression data set.
Affiliation(s)
- Miaomiao Wang
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, China
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - University of the Chinese Academy of Sciences, Beijing, China
- Xinyu Zhang
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - Beijing Academy of Artificial Intelligence, Beijing, China
- Alan T K Wan
  - Department of Management Sciences, City University of Hong Kong, Kowloon, Hong Kong
- Kang You
  - School of Mathematical Sciences, Capital Normal University, Beijing, China
- Guohua Zou
  - School of Mathematical Sciences, Capital Normal University, Beijing, China
8
Li T, Yu J, Meng C. Scalable model-free feature screening via sliced-Wasserstein dependency. J Comput Graph Stat 2023. [DOI: 10.1080/10618600.2023.2183213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Affiliation(s)
- Tao Li
  - Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China
- Jun Yu
  - School of Mathematics and Statistics, Beijing Institute of Technology
- Cheng Meng
  - Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China
9
Liu Y, Luo S. Feature selection in ultrahigh-dimensional additive models with heterogenous frequency component functions. J Stat Plan Inference 2023. [DOI: 10.1016/j.jspi.2023.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
10
Ke C, Bandyopadhyay D, Acunzo M, Winn R. Gene Screening in High-Throughput Right-Censored Lung Cancer Data. Onco 2022; 2:305-318. [PMID: 37066112 PMCID: PMC10100230 DOI: 10.3390/onco2040017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Background: Advances in sequencing technologies have allowed collection of massive genome-wide information that substantially advances lung cancer diagnosis and prognosis. Identifying influential markers for clinical endpoints of interest has been an indispensable and critical component of the statistical analysis pipeline. However, classical variable selection methods are not feasible or reliable for high-throughput genetic data. Our objective is to propose a model-free gene screening procedure for high-throughput right-censored data, and to develop a predictive gene signature for lung squamous cell carcinoma (LUSC) with the proposed procedure.
Methods: A gene screening procedure was developed based on a recently proposed independence measure. The Cancer Genome Atlas (TCGA) data on LUSC was then studied. The screening procedure was conducted to narrow down the set of influential genes to 378 candidates. A penalized Cox model was then fitted to the reduced set, which further identified a 6-gene signature for LUSC prognosis. The 6-gene signature was validated on datasets from the Gene Expression Omnibus.
Results: Both model-fitting and validation results reveal that our method selected influential genes that lead to biologically sensible findings as well as better predictive performance, compared to existing alternatives. According to our multivariable Cox regression analysis, the 6-gene signature was indeed a significant prognostic factor (p-value < 0.001) while controlling for clinical covariates.
Conclusions: Gene screening as a fast dimension reduction technique plays an important role in analyzing high-throughput data. The main contribution of this paper is to introduce a fundamental yet pragmatic model-free gene screening approach that aids statistical analysis of right-censored cancer data, and provide a lateral comparison with other available methods in the context of LUSC.
Affiliation(s)
- Chenlu Ke
  - Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, VA 23284, USA
- Dipankar Bandyopadhyay
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23284, USA
  - Correspondence: Tel.: +1-804-827-2058
- Mario Acunzo
  - Department of Internal Medicine, Virginia Commonwealth University, Richmond, VA 23284, USA
- Robert Winn
  - Massey Cancer Center, Virginia Commonwealth University, Richmond, VA 23284, USA
11
Li L, Ke C, Yin X, Yu Z. Generalized martingale difference divergence: Detecting conditional mean independence with applications in variable screening. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
12
Feature screening and FDR control with knockoff features for ultrahigh-dimensional right-censored data. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
13
Tong Z, Cai Z, Yang S, Li R. Model-Free Conditional Feature Screening with FDR Control. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2063130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Zhaoxue Tong
  - Pennsylvania State University, University Park, PA
- Runze Li
  - Pennsylvania State University, University Park, PA
14
Ma W, Xiao J, Yang Y, Ye F. Model-free feature screening for ultrahigh dimensional data via a Pearson chi-square based index. J Stat Comput Sim 2022. [DOI: 10.1080/00949655.2022.2062358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Weidong Ma
  - Department of Mathematical Sciences, Tsinghua University, Beijing, People's Republic of China
- Jingsong Xiao
  - Department of Mathematical Sciences, Tsinghua University, Beijing, People's Republic of China
- Ying Yang
  - Department of Mathematical Sciences, Tsinghua University, Beijing, People's Republic of China
- Fei Ye
  - School of Statistics, Capital University of Economics and Business, Beijing, People's Republic of China
15
Min K, Mai Q. A general framework for tensor screening through smoothing. Electron J Stat 2022. [DOI: 10.1214/21-ejs1954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Keqian Min
  - Department of Statistics, Florida State University, Tallahassee, Florida 32306, U.S.A
- Qing Mai
  - Department of Statistics, Florida State University, Tallahassee, Florida 32306, U.S.A
16
Yang B, Wu W, Yin X. On sufficient variable screening using log odds ratio filter. Electron J Stat 2022. [DOI: 10.1214/21-ejs1951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Baoying Yang
  - Department of Statistics, College of Mathematics, Southwest Jiaotong University, Chengdu, China
- Wenbo Wu
  - Department of Management Science and Statistics, The University of Texas at San Antonio, San Antonio, TX
- Xiangrong Yin
  - Department of Statistics, University of Kentucky, 319 Multidisciplinary Science Building, Lexington, KY 40536
17
Xiong W, Pan H. Interaction screening for high-dimensional heterogeneous data via robust hybrid metrics. Stat Med 2021; 40:6651-6673. [PMID: 34542189 DOI: 10.1002/sim.9204] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/30/2021] [Revised: 07/22/2021] [Accepted: 09/02/2021] [Indexed: 11/07/2022]
Abstract
A novel model-free interaction screening approach called the hybrid metrics is introduced for high-dimensional heterogeneous data analysis. The metrics, established based on the variation of the conditional joint distribution function, are measurements of interaction that include both size and direction. They are robust and can work with many types of response variables, including continuous, discrete, and categorical variables. We can apply the hybrid metrics to effective interaction selection for classification, response index models, and Poisson regression, among others. When dealing with classification, the hybrid metrics are capable of capturing both nonlinear category-general and category-specific interaction effects, providing us with a comprehensive overview and precise discovery of category information. When faced with a continuous response, the hybrid metrics perform fairly well even if the signal strength is weak, behaving as if the true interactions were known. To facilitate implementation, a fast two-stage procedure which naturally and efficiently enforces both strong and weak heredity is advocated. We further demonstrate their superior performance over popular competitors by exhaustive simulations and an SRBCT real data example. Supplementary materials for this article are available online.
Affiliation(s)
- Wei Xiong
  - School of Statistics, University of International Business and Economics, Beijing, China
- Han Pan
  - School of Statistics, University of International Business and Economics, Beijing, China
18
Li Y, Li R, Qin Y, Lin C, Yang Y. Robust group variable screening based on maximum Lq-likelihood estimation. Stat Med 2021; 40:6818-6834. [PMID: 34658050 DOI: 10.1002/sim.9212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2021] [Revised: 08/11/2021] [Accepted: 09/13/2021] [Indexed: 11/06/2022]
Abstract
Variable screening plays an important role in ultra-high-dimensional data analysis. Most of the previous analyses have focused on individual predictor screening using marginal correlation or other rank-based techniques. When predictors can be naturally grouped, the structure information should be incorporated while applying variable screening. This study presents a group screening procedure that is based on maximum Lq-likelihood estimation, which is being increasingly used for robust estimation. The proposed method is robust against data contamination, including a heavy-tailed distribution of the response and a mixture of observations from different distributions. The sure screening property is rigorously established. Simulations demonstrate the competitive performance of the proposed method, especially in terms of its robustness against data contamination. Two real data analyses are presented to further illustrate its performance.
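The individual-predictor screening by marginal correlation that this abstract takes as its point of departure can be sketched as follows. This is only a minimal illustration of that baseline idea, not the group Lq-likelihood procedure of the paper; the function name `marginal_screen`, the simulated data, and the cutoff `d` are all hypothetical choices made for the example.

```python
import numpy as np

def marginal_screen(X, y, d):
    """Return indices of the d predictors with the largest absolute
    Pearson correlation with the response (hypothetical helper)."""
    Xc = X - X.mean(axis=0)          # center each predictor column
    yc = y - y.mean()                # center the response
    # Column-wise Pearson correlation between each predictor and y.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc * yc[:, None]).sum(axis=0) / denom
    # Rank predictors by descending |correlation| and keep the top d.
    return np.argsort(-np.abs(corr))[:d]

# Simulated data (an assumption for illustration): 1000 predictors,
# of which only columns 0 and 5 affect the response.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + rng.standard_normal(n)

kept = marginal_screen(X, y, d=20)
```

Here `kept` holds the indices of the predictors that survive screening; a downstream model would then be fitted on `X[:, kept]` only.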
Affiliation(s)
- Yang Li
  - Center for Applied Statistics, Renmin University of China, Beijing, China
  - School of Statistics, Renmin University of China, Beijing, China
  - Statistical Consulting Center, Renmin University of China, Beijing, China
- Rong Li
  - School of Statistics, Renmin University of China, Beijing, China
  - Statistical Consulting Center, Renmin University of China, Beijing, China
- Yichen Qin
  - Department of Operations, Business Analytics, and Information Systems, University of Cincinnati, Cincinnati, Ohio, USA
- Cunjie Lin
  - Center for Applied Statistics, Renmin University of China, Beijing, China
  - School of Statistics, Renmin University of China, Beijing, China
  - Statistical Consulting Center, Renmin University of China, Beijing, China
- Yuhong Yang
  - School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
19
He H, Deng G. Grouped feature screening for ultra-high dimensional data for the classification model. J Stat Comput Sim 2021. [DOI: 10.1080/00949655.2021.1981901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Hanji He
  - College of Science, Guilin University of Technology, Guilin, People’s Republic of China
- Guangming Deng
  - Applied Statistics Institute, Guilin University of Technology, Guilin, People’s Republic of China
20
Fei Z, Zheng Q, Hong HG, Li Y. Inference for High-Dimensional Censored Quantile Regression. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1957900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Zhe Fei
  - Department of Biostatistics, University of California, Los Angeles, CA
- Qi Zheng
  - Department of Bioinformatics and Biostatistics, University of Louisville, KY
- Hyokyoung G. Hong
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI
- Yi Li
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI
21
Zhang J, Liu Y. Model-free slice screening for ultrahigh-dimensional survival data. J Appl Stat 2021; 48:1755-1774. [DOI: 10.1080/02664763.2020.1772734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Jing Zhang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei, People's Republic of China
- Yanyan Liu
  - School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, People's Republic of China
22
Li N, Peng X, Kawaguchi E, Suchard MA, Li G. A scalable surrogate L0 sparse regression method for generalized linear models with applications to large scale data. J Stat Plan Inference 2021. [DOI: 10.1016/j.jspi.2020.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/22/2022]
23
Zhong W, Wang J, Chen X. Censored mean variance sure independence screening for ultrahigh dimensional survival data. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2021.107206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/31/2023]
24
Partition-based feature screening for categorical data via RKHS embeddings. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2021.107176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
25
Lu S, Chen X, Wang H. Conditional distance correlation sure independence screening for ultra-high dimensional survival data. Commun Stat-Theor M 2021. [DOI: 10.1080/03610926.2019.1657454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Affiliation(s)
- Shuiyun Lu
  - School of Statistics, Qufu Normal University, Qufu, China
- Xiaolin Chen
  - School of Statistics, Qufu Normal University, Qufu, China
- Hong Wang
  - School of Mathematics and Statistics, Central South University, Changsha, China
26
Chen X, Li C, Zhang T, Gao Z. On correlation rank screening for ultra-high dimensional competing risks data. J Appl Stat 2021; 49:1848-1864. [DOI: 10.1080/02664763.2021.1884209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Affiliation(s)
- Xiaolin Chen
  - School of Statistics, Qufu Normal University, Qufu, People's Republic of China
- Chenguang Li
  - School of Statistics, Qufu Normal University, Qufu, People's Republic of China
- Tao Zhang
  - School of Science, Guangxi University of Science and Technology, Liuzhou, People's Republic of China
- Zhenlong Gao
  - School of Statistics, Qufu Normal University, Qufu, People's Republic of China
27
Chen F, He X, Wang J. Learning sparse conditional distribution: An efficient kernel-based approach. Electron J Stat 2021. [DOI: 10.1214/21-ejs1824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Fang Chen
  - School of Statistics and Management, Shanghai University of Finance and Economics
- Xin He
  - School of Statistics and Management, Shanghai University of Finance and Economics
- Junhui Wang
  - School of Data Science, City University of Hong Kong
28
Hu Q, Zhu L, Liu Y, Sun J, Srivastava DK, Robison LL. Nonparametric screening and feature selection for ultrahigh-dimensional Case II interval-censored failure time data. Biom J 2020; 62:1909-1925. [PMID: 32677168 PMCID: PMC7988961 DOI: 10.1002/bimj.201900154] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/21/2019] [Revised: 05/16/2020] [Accepted: 05/18/2020] [Indexed: 11/07/2022]
Abstract
For the analysis of ultrahigh-dimensional data, the first step is often to perform screening and feature selection to effectively reduce the dimensionality while retaining all the active or relevant variables with high probability. For this, many methods have been developed under various frameworks, but most of them only apply to complete data. In this paper, we consider an incomplete data situation, case II interval-censored failure time data, for which there seems to be no screening procedure. Based on the idea of cumulative residuals, a model-free or nonparametric method is developed and shown to have the sure independent screening property. In particular, the approach is shown to tend to rank the active variables above the inactive ones in terms of their association with the failure time of interest. A simulation study is conducted to demonstrate the usefulness of the proposed method and, in particular, indicates that it works well with general survival models and is capable of capturing the nonlinear covariates with interactions. The approach is also applied to a childhood cancer survivor study that motivated this investigation.
Affiliation(s)
- Qiang Hu
  - School of Statistics, Renmin University of China, Beijing, P. R. China
- Liang Zhu
  - Division of Clinical and Translational Sciences, Department of Internal Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA
- Yanyan Liu
  - School of Mathematics and Statistics, Wuhan University, Wuhan, P. R. China
- Jianguo Sun
  - Department of Statistics, University of Missouri, Columbia, MO, USA
- Deo Kumar Srivastava
  - Biostatistics Department, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Leslie L. Robison
  - Epidemiology and Cancer Control, St. Jude Children’s Research Hospital, Memphis, TN, USA
|
29
|
Dong Y, Yu Z, Zhu L. Model-free variable selection for conditional mean in regression. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2020.107042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
30
|
Xu J, Li WK, Ying Z. Variable screening for survival data in the presence of heterogeneous censoring. Scand Stat Theory Appl 2020. [DOI: 10.1111/sjos.12458] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Jinfeng Xu
  - Department of Statistics & Actuarial Science, The University of Hong Kong, Hong Kong
- Wai Keung Li
  - Faculty of Liberal Arts and Social Sciences, The Education University of Hong Kong, Hong Kong
|
31
|
Huo L, Wen XM, Yu Z. A model-free conditional screening approach via sufficient dimension reduction. J Nonparametr Stat 2020. [DOI: 10.1080/10485252.2020.1834554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Lei Huo
  - Department of Mathematics and Statistics, Missouri University of Science and Technology, Rolla, MO, USA
- Xuerong Meggie Wen
  - Department of Mathematics and Statistics, Missouri University of Science and Technology, Rolla, MO, USA
- Zhou Yu
  - School of Statistics, East China Normal University, Shanghai, People's Republic of China
|
32
|
Liu Y, Xu J, Li G. Sure joint feature screening in nonparametric transformation model for right censored data. CAN J STAT 2020. [DOI: 10.1002/cjs.11575] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Affiliation(s)
- Yi Liu
  - School of Mathematical Sciences, Ocean University of China, Qingdao, China
- Jinfeng Xu
  - Department of Statistics & Actuarial Science, The University of Hong Kong, Hong Kong, China
- Gang Li
  - Department of Biostatistics, University of California at Los Angeles, Los Angeles, CA, U.S.A.
|
33
|
|
34
|
An efficient algorithm for joint feature screening in ultrahigh-dimensional Cox’s model. Comput Stat 2020. [DOI: 10.1007/s00180-020-01032-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
35
|
Zhang J, Wang Q, Kang J. Feature screening under missing indicator imputation with non-ignorable missing response. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2020.106975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
36
|
Chen X, Zhang Y, Liu Y, Chen X. Model-free feature screening for ultra-high dimensional competing risks data. Stat Probab Lett 2020. [DOI: 10.1016/j.spl.2020.108815] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/24/2022]
|
37
|
Chu Y, Lin L. Conditional SIRS for nonparametric and semiparametric models by marginal empirical likelihood. Stat Pap (Berl) 2020. [DOI: 10.1007/s00362-018-0993-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
38
|
Liu W, Ke Y, Liu J, Li R. Model-Free Feature Screening and FDR Control With Knockoff Features. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1783274] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/24/2022]
Affiliation(s)
- Wanjun Liu
  - Department of Statistics, The Pennsylvania State University, University Park, PA
- Yuan Ke
  - Department of Statistics, University of Georgia, Athens, GA
- Jingyuan Liu
  - MOE Key Laboratory of Econometrics, Department of Statistics, School of Economics, Wang Yanan Institute for Studies in Economics, and Fujian Key Lab of Statistics, Xiamen University, Xiamen, China
- Runze Li
  - Department of Statistics, The Pennsylvania State University, University Park, PA
|
39
|
Chen X, Liu W, Chen X. Model-free survival conditional feature screening. COMMUN STAT-SIMUL C 2020. [DOI: 10.1080/03610918.2020.1779293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Xiaolin Chen
  - School of Statistics, Qufu Normal University, Qufu, China
- Wei Liu
  - School of Statistics, Qufu Normal University, Qufu, China
- Xiaojing Chen
  - School of Statistics, Qufu Normal University, Qufu, China
|
40
|
Lu S, Chen X, Xu S, Liu C. Joint model-free feature screening for ultra-high dimensional semi-competing risks data. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2020.106942] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
|
41
|
Liu Z, Xiong Z. Non-marginal feature screening for additive hazard model with ultrahigh-dimensional covariates. COMMUN STAT-THEOR M 2020. [DOI: 10.1080/03610926.2020.1770288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Zili Liu
  - School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Zikang Xiong
  - School of Mathematics and Statistics, Central China Normal University, Wuhan, China
|
42
|
Liu Y, Chen X, Li G. A new joint screening method for right-censored time-to-event data with ultra-high dimensional covariates. Stat Methods Med Res 2020; 29:1499-1513. [PMID: 31359834 PMCID: PMC8285086 DOI: 10.1177/0962280219864710] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
In an ultra-high dimensional setting with a huge number of covariates, variable screening is useful for dimension reduction before applying a more refined method for model selection and statistical analysis. This paper proposes a new sure joint screening procedure for right-censored time-to-event data based on a sparsity-restricted semiparametric accelerated failure time model. Our method, referred to as Buckley-James assisted sure screening (BJASS), consists of an initial screening step using a sparsity-restricted least-squares estimate based on a synthetic time variable and a refinement screening step using a sparsity-restricted least-squares estimate with the Buckley-James imputed event times. The refinement step may be repeated several times to obtain more stable results. We show that with any fixed number of refinement steps, the BJASS procedure retains all important variables with probability tending to 1. Simulation results are presented to illustrate its performance in comparison with some marginal screening methods. Real data examples are provided using a diffuse large-B-cell lymphoma (DLBCL) data set and a breast cancer data set. We have implemented the BJASS method in Matlab and made it available to readers through GitHub: https://github.com/yiucla/BJASS.
Affiliation(s)
- Yi Liu
  - School of Mathematical Sciences, Ocean University of China, Qingdao, China
- Xiaolin Chen
  - School of Statistics, Qufu Normal University, Qufu, China
- Gang Li
  - Department of Biostatistics, University of California at Los Angeles, Los Angeles, CA, USA
|
43
|
Zhao N, Xu Q, Tang ML, Wang H. Variable Screening for Near Infrared (NIR) Spectroscopy Data Based on Ridge Partial Least Squares Regression. Comb Chem High Throughput Screen 2020; 23:740-756. [PMID: 32342803 DOI: 10.2174/1386207323666200428114823] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/22/2019] [Revised: 01/17/2020] [Accepted: 02/29/2020] [Indexed: 11/22/2022]
Abstract
AIM AND OBJECTIVE: Near Infrared (NIR) spectroscopy data are characterized by a few dozen to many thousands of samples and highly correlated variables. Quantitative analysis of such data usually requires a combination of analytical methods with variable selection or screening methods. Commonly used variable screening methods fail to recover the true model when (i) some of the variables are highly correlated, and (ii) the sample size is less than the number of relevant variables. In these cases, Partial Least Squares (PLS) regression based approaches can be useful alternatives. MATERIALS AND METHODS: In this research, a fast variable screening strategy, namely the preconditioned screening for ridge partial least squares regression (PSRPLS), is proposed for modelling NIR spectroscopy data with high-dimensional and highly correlated covariates. Under rather mild assumptions, we prove that, using the Puffer transformation, the proposed approach successfully transforms the problem of variable screening with highly correlated predictor variables into one with weakly correlated covariates, at little extra computational cost. RESULTS: We show that our proposed method leads to theoretically consistent model selection results. Four simulation studies and two real examples are then analyzed to illustrate the effectiveness of the proposed approach. CONCLUSION: By introducing the Puffer transformation, the high correlation problem can be mitigated using the PSRPLS procedure we construct. By employing RPLS regression in our approach, it becomes simpler and more computationally efficient to handle the situation where the model size is larger than the sample size, while maintaining high prediction precision.
Affiliation(s)
- Naifei Zhao
  - School of Mathematics and Statistics, Changsha University of Science & Technology, Changsha, P.R. China
- Qingsong Xu
  - School of Mathematics and Statistics, Central South University, Changsha, Hunan, P.R. China
- Man-Lai Tang
  - Department of Mathematics and Statistics, Hang Seng University of Hong Kong, Hong Kong, P.R. China
- Hong Wang
  - School of Mathematics and Statistics, Central South University, Changsha, Hunan, P.R. China
|
44
|
Liu S, Li X, Zhang J. Ultrahigh dimensional feature screening for additive model with multivariate response. J STAT COMPUT SIM 2020. [DOI: 10.1080/00949655.2019.1703371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Shishi Liu
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, People's Republic of China
- Xiangjie Li
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, People's Republic of China
- Jingxiao Zhang
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, People's Republic of China
|
45
|
Tang N, Yan X, Zhao X. Penalized generalized empirical likelihood with a diverging number of general estimating equations for censored data. Ann Stat 2020. [DOI: 10.1214/19-aos1870] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
|
46
|
Li X, Tang N, Xie J, Yan X. A nonparametric feature screening method for ultrahigh-dimensional missing response. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2019.106828] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
|
47
|
Ando T, Bai J. Quantile Co-Movement in Financial Markets: A Panel Quantile Model With Unobserved Heterogeneity. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2018.1543598] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 10/27/2022]
Affiliation(s)
- Tomohiro Ando
  - Melbourne Business School, Melbourne University, Carlton, Victoria, Australia
- Jushan Bai
  - Department of Economics, Columbia University, New York, NY
  - School of Finance, Nankai University, Tianjin, China
|
48
|
|
49
|
Honda T, Ing CK, Wu WY. Adaptively weighted group Lasso for semiparametric quantile regression models. BERNOULLI 2019. [DOI: 10.3150/18-bej1091] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]
|
50
|
Regularized quantile regression for ultrahigh-dimensional data with nonignorable missing responses. METRIKA 2019. [DOI: 10.1007/s00184-019-00744-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|