1. Fang X, Ahn KW, Cai J, Kim S. Efficient estimation for left-truncated competing risks regression for case-cohort studies. Biometrics 2024; 80:ujad008. [PMID: 38281769 PMCID: PMC10826882 DOI: 10.1093/biomtc/ujad008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 09/15/2023] [Accepted: 11/06/2023] [Indexed: 01/30/2024]
Abstract
The case-cohort design is a cost-effective design for large cohort studies with competing risk outcomes. The proportional subdistribution hazards model is widely used to estimate direct covariate effects on the cumulative incidence function for competing risk data. In biomedical studies, left truncation often occurs and brings extra challenges to the analysis. Existing inverse probability weighting methods for case-cohort studies with competing risk data neither address left truncation nor estimate regression parameters efficiently for fully observed covariates. We propose an augmented inverse probability-weighted estimating equation for left-truncated competing risk data to address these limitations of the current literature. We further propose a more efficient estimator when extra information from the other causes is available. The proposed estimators are consistent and asymptotically normally distributed. Simulation studies show that the proposed estimator is unbiased and leads to estimation efficiency gain in the regression parameter estimation. We analyze the Atherosclerosis Risk in Communities study data using the proposed methods.
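As a rough illustration of the inverse probability weighting idea mentioned in this abstract, the sketch below computes basic design weights for a simulated case-cohort sample. It is a minimal sketch under stated assumptions, not the estimator proposed in the paper: it ignores left truncation, competing risks, censoring weights, and the augmentation term, and it assumes a subcohort drawn by independent Bernoulli sampling with a known fraction. The names `case_cohort_weights` and `simulate`, and all numeric settings, are hypothetical.

```python
import random


def case_cohort_weights(is_case, in_subcohort, sampling_fraction):
    """Return one design weight per subject (0.0 means the subject is unused)."""
    weights = []
    for case, sub in zip(is_case, in_subcohort):
        if case:
            # All cases have covariates measured, so their inclusion probability is 1.
            weights.append(1.0)
        elif sub:
            # A sampled non-case represents 1 / sampling_fraction cohort members.
            weights.append(1.0 / sampling_fraction)
        else:
            # Non-cases outside the subcohort have no expensive covariates.
            weights.append(0.0)
    return weights


def simulate(n=10000, case_rate=0.05, sampling_fraction=0.1, seed=0):
    """Simulate case status and subcohort membership (hypothetical settings)."""
    rng = random.Random(seed)
    is_case = [rng.random() < case_rate for _ in range(n)]
    in_subcohort = [rng.random() < sampling_fraction for _ in range(n)]
    return is_case, in_subcohort


is_case, in_subcohort = simulate()
weights = case_cohort_weights(is_case, in_subcohort, 0.1)
# The weighted sample size should be close to the full cohort size.
print(round(sum(weights)), len(weights))
```

The check at the end reflects the basic property of such weights: summed over the sampled subjects, they approximately recover the size of the full cohort.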
Affiliation(s)
- Xi Fang
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226, United States
- Kwang Woo Ahn
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226, United States
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, NC 27599, United States
- Soyoung Kim
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226, United States
2. Wen F, Li C, Liang B, You J, Li X, Wang J, Liu H, Wang F, Dong Z, Zhang Y. Efficacy of high-dose-rate brachytherapy with different radiation source activities among cervical cancer patients and risk factors for long-term outcomes: A 6-year retrospective study. Brachytherapy 2024; 23:35-44. [PMID: 37919124 DOI: 10.1016/j.brachy.2023.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 07/20/2023] [Accepted: 09/14/2023] [Indexed: 11/04/2023]
Abstract
PURPOSE: This study aimed to assess the impact of dose rates due to natural decay of Iridium-192 sources and the risk factors of clinical outcomes for cervical cancer patients treated with high-dose-rate (HDR) brachytherapy. METHODS AND MATERIALS: Four hundred ninety-four patients were divided into relatively-high-radioactive (rHR), relatively-medium-radioactive (rMR), and relatively-low-radioactive (rLR) groups for retrospective treatment response comparison. The short-term outcomes were evaluated using the 1-month/3-month follow-up results based on RECIST 1.1. Local recurrence-free survival (LRFS) and metastatic recurrence-free survival (MRFS) were selected as long-term outcomes. A class of transformation models with adaptive lasso was applied to assess the risk factors of long-term outcomes. RESULTS: No significant difference was identified in short- or long-term outcomes of different radioactive groups. Subgroup analyses demonstrated similar findings. In multivariate factor analysis, advanced stage was significantly associated with higher risk of local recurrence and metastatic recurrence (HR = 1.66, 95% confidence interval [CI] = 1.14-2.43, p = 0.008; HR = 1.57, 95% CI = 1.23-2.00, p < 0.001). Significant associations were observed between local recurrence and pathology, and between metastatic recurrence and pre-treatment serum indices, respectively (HR = 8.62, 95% CI = 2.28-32.60, p = 0.002; HR = 1.98, 95% CI = 1.20-2.26, p = 0.008). CONCLUSIONS: Overall, there was no significant difference in long- or short-term efficacy of the HDR brachytherapy among the groups with different levels of activity of radiation sources. Stage, pathology, and pretreatment serum indices were crucial factors that affected the long-term outcomes.
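For readers who want to relate a reported hazard ratio, its confidence interval, and its p-value, the following sketch back-calculates a standard error and a two-sided p-value from the first reported result (HR = 1.66, 95% CI = 1.14-2.43). It assumes the interval is a Wald interval that is symmetric on the log scale, which the abstract does not state; the function name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Back-calculate SE, z statistic, and two-sided p from an HR and its 95% CI.

    Assumes a Wald interval on the log scale: log(HR) +/- z_crit * SE.
    """
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


se, z, p = wald_from_ci(1.66, 1.14, 2.43)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Under that assumption the recomputed p-value lands close to the reported p = 0.008; small differences are expected because the published figures are rounded.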
Affiliation(s)
- Fengyu Wen
  - Department of Health Data Science, Institute of Medical Technology, Peking University Health Science Center, Beijing, China
- Chenguang Li
  - Department of Radiation Oncology Physics, Institute of Medical Technology, Peking University, Beijing, China; Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing, China
- Baosheng Liang
  - Department of Biostatistics, School of Public Health, Peking University, Beijing, China
- Jing You
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing, China
- Xiaofan Li
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing, China
- Jingyuan Wang
  - Department of Biostatistics, School of Public Health, Peking University, Beijing, China
- Hongjia Liu
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing, China
- Fulin Wang
  - Department of Health Data Science, Institute of Medical Technology, Peking University Health Science Center, Beijing, China
- Zhengkun Dong
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing, China
- Yibao Zhang
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing, China
3. Zhong W, Diao G. Joint semiparametric models for case-cohort designs. Biometrics 2023; 79:1959-1971. [PMID: 35917392 DOI: 10.1111/biom.13728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 07/20/2022] [Indexed: 11/28/2022]
Abstract
Two-phase studies such as case-cohort and nested case-control studies are widely used cost-effective sampling strategies. In the first phase, the observed failure/censoring time and inexpensive exposures are collected. In the second phase, a subgroup of subjects is selected for measurements of expensive exposures based on the information from the first phase. One challenging issue is how to utilize all the available information to conduct efficient regression analyses of the two-phase study data. This paper proposes a joint semiparametric modeling of the survival outcome and the expensive exposures. Specifically, we assume a class of semiparametric transformation models and a semiparametric density ratio model for the survival outcome and the expensive exposures, respectively. The class of semiparametric transformation models includes the proportional hazards model and the proportional odds model as special cases. The density ratio model is flexible in modeling multivariate mixed-type data. We develop efficient likelihood-based estimation and inference procedures and establish the large sample properties of the nonparametric maximum likelihood estimators. Extensive numerical studies reveal that the proposed methods perform well under practical settings. The proposed methods also appear to be reasonably robust under various model mis-specifications. An application to the National Wilms Tumor Study is provided.
Affiliation(s)
- Weibin Zhong
  - Global Biometrics & Data Sciences, Bristol Myers Squibb, Berkeley Heights, New Jersey, USA
- Guoqing Diao
  - Department of Biostatistics and Bioinformatics, The George Washington University, Washington, District of Columbia, USA
4. Generalized accelerated failure time model with censored data from case-cohort studies. J Stat Plan Inference 2023. [DOI: 10.1016/j.jspi.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
5. Mao F, Cook RJ. Two-phase designs with current status data. Stat Med 2023; 42:1207-1232. [PMID: 36690474 DOI: 10.1002/sim.9666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Revised: 11/01/2022] [Accepted: 01/05/2023] [Indexed: 01/25/2023]
Abstract
We consider the design and analysis of two-phase studies aiming to assess the relation between a fixed (eg, genetic) marker and an event time under current status observation. We consider a common setting in which a phase I sample is comprised of a large cohort of individuals with outcome (ie, current status) data and a vector of inexpensive covariates. Stored biospecimens for individuals in the phase I sample can be assayed to record the marker of interest for individuals selected in a phase II sub-sample. The design challenge is then to select the phase II sub-sample in order to maximize the precision of the marker effect on the time of interest under a proportional hazards model. This problem has not been examined before for current status data and the role of the assessment time is highlighted. Inference based on likelihood and inverse probability weighted estimating functions are considered, with designs centered on score-based residuals, extreme current status observations, or stratified sampling schemes. Data from a registry of patients with psoriatic arthritis is used in an illustration where we study the risk of diabetes as a comorbidity.
Affiliation(s)
- Fangya Mao
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
- Richard J Cook
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
6. Pan Y, Deng L. Generalized case-cohort and inference for Cox’s model with parameter constraints. Commun Stat-Simul C 2022. [DOI: 10.1080/03610918.2020.1714661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Yingli Pan
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
- Lifeng Deng
  - College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, Shandong, China
7. Xu Y, Kim S, Zhang MJ, Couper D, Ahn KW. Competing risks regression models with covariates-adjusted censoring weight under the generalized case-cohort design. Lifetime Data Analysis 2022; 28:241-262. [PMID: 35034255 PMCID: PMC8977245 DOI: 10.1007/s10985-022-09546-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/03/2021] [Accepted: 12/31/2021] [Indexed: 06/14/2023]
Abstract
A generalized case-cohort design has been used when measuring exposures is expensive and events are not rare in the full cohort. This design collects expensive exposure information from a (stratified) randomly selected subset from the full cohort, called the subcohort, and a fraction of cases outside the subcohort. For the full cohort study with competing risks, He et al. (Scand J Stat 43:103-122, 2016) studied the non-stratified proportional subdistribution hazards model with covariate-dependent censoring to directly evaluate covariate effects on the cumulative incidence function. In this paper, we propose a stratified proportional subdistribution hazards model with covariate-adjusted censoring weights for competing risks data under the generalized case-cohort design. We consider a general class of weight functions to account for the generalized case-cohort design. Then, we derive the optimal weight function which minimizes the asymptotic variance of parameter estimates within the general class of weight functions. The proposed estimator is shown to be consistent and asymptotically normally distributed. The simulation studies show (i) the proposed estimator with covariate-adjusted weight is unbiased when the censoring distribution depends on covariates; and (ii) the proposed estimator with the optimal weight function gains parameter estimation efficiency. We apply the proposed method to stem cell transplantation and diabetes data sets.
Affiliation(s)
- Yayun Xu
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
- Soyoung Kim
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
- Mei-Jie Zhang
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
- David Couper
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kwang Woo Ahn
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
8. Shang W. Statistical inference for Cox model under case-cohort design with subgroup survival information. J Korean Stat Soc 2022. [DOI: 10.1007/s42952-022-00166-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
9. Nonparametric inference for distribution functions with stratified samples. J Stat Plan Inference 2021. [DOI: 10.1016/j.jspi.2021.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
10. Zhang J, Zhou H, Liu Y, Cai J. Conditional screening for ultrahigh-dimensional survival data in case-cohort studies. Lifetime Data Analysis 2021; 27:632-661. [PMID: 34417679 PMCID: PMC8561435 DOI: 10.1007/s10985-021-09531-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2020] [Accepted: 08/05/2021] [Indexed: 06/13/2023]
Abstract
The case-cohort design has been widely used to reduce the cost of covariate measurements in large cohort studies. In many such studies, the number of covariates is very large, and the goal of the research is to identify active covariates which have great influence on response. Since the introduction of sure independence screening, screening procedures have achieved great success in terms of effectively reducing the dimensionality and identifying active covariates. However, because commonly used screening methods are based on marginal correlation or its variants, they may fail to identify hidden active variables which are jointly important but are weakly correlated with the response. Moreover, these screening methods are mainly proposed for data under simple random sampling and cannot be directly applied to case-cohort data. In this paper, we consider the ultrahigh-dimensional survival data under the case-cohort design, and propose a conditional screening method by incorporating some important prior known information of active variables. This method can effectively detect hidden active variables. Furthermore, it possesses the sure screening property under some mild regularity conditions and does not require any complicated numerical optimization. We evaluate the finite sample performance of the proposed method via extensive simulation studies and further illustrate the new approach through a real data set from patients with breast cancer.
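The "hidden active variable" problem described in this abstract can be illustrated with a toy example. The sketch below is not the paper's screening statistic: it assumes an uncensored linear outcome under simple random sampling rather than survival data from a case-cohort design, and it uses plain Pearson correlation. The simulated variable `x2` is constructed to be jointly important yet marginally uncorrelated with `y`; conditioning on the known active variable `x1` (by residualizing on it) reveals the association. All names and settings are hypothetical.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residualize(v, x):
    """Residuals of v after a least-squares fit on x with an intercept."""
    n = len(x)
    mx, mv = sum(x) / n, sum(v) / n
    slope = sum((a - mx) * (b - mv) for a, b in zip(x, v)) / sum(
        (a - mx) ** 2 for a in x
    )
    return [b - mv - slope * (a - mx) for a, b in zip(x, v)]


rng = random.Random(1)
n, rho = 5000, 0.8
x1 = [rng.gauss(0, 1) for _ in range(n)]
# x2 has unit variance and correlation rho with x1.
x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
# The coefficient -rho on x2 makes cov(y, x2) = rho - rho = 0 by construction.
y = [a - rho * b + 0.3 * rng.gauss(0, 1) for a, b in zip(x1, x2)]

marginal = pearson(x2, y)
conditional = pearson(residualize(x2, x1), residualize(y, x1))
print(f"marginal = {marginal:.3f}, conditional on x1 = {conditional:.3f}")
```

Marginal screening would rank `x2` near the bottom here, whereas the conditional measure is large in magnitude, which is the motivation for conditioning on prior known active variables.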
Affiliation(s)
- Jing Zhang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-7420, USA
- Yanyan Liu
  - School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-7420, USA
11. Zhang J, Zhou H, Liu Y, Cai J. Feature screening for case‐cohort studies with failure time outcome. Scand Stat Theory Appl 2020; 48:349-370. [DOI: 10.1111/sjos.12503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Jing Zhang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Yanyan Liu
  - School of Mathematics and Statistics, Wuhan University, Wuhan, China
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
12. Che M, Lawless JF, Han P. Empirical and conditional likelihoods for two‐phase studies. Can J Stat 2020. [DOI: 10.1002/cjs.11566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Menglu Che
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Jerald F. Lawless
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Peisong Han
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
13. Zhou J, Shen G, Chen X, Lin Y. Efficient fused learning for distributed imbalanced data. Commun Stat-Theor M 2020. [DOI: 10.1080/03610926.2020.1759641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Jie Zhou
  - School of Mathematics, Capital Normal University, Beijing, China
- Guohao Shen
  - Department of Statistics, Chinese University of Hong Kong, Hong Kong, China
- Xuan Chen
  - School of Mathematics, Capital Normal University, Beijing, China
- Yuanyuan Lin
  - Department of Statistics, Chinese University of Hong Kong, Hong Kong, China
14. Du M, Zhou Q, Zhao S, Sun J. Regression Analysis of Case-cohort Studies in the Presence of Dependent Interval Censoring. J Appl Stat 2020; 48:846-865. [PMID: 33767519 DOI: 10.1080/02664763.2020.1752633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Abstract
The case-cohort design is widely used as a means of reducing the cost in large cohort studies, especially when the disease rate is low and covariate measurements may be expensive, and has been discussed by many authors. In this paper, we discuss regression analysis of case-cohort studies that produce interval-censored failure time with dependent censoring, a situation for which there does not seem to exist an established approach. For inference, a sieve inverse probability weighting estimation procedure is developed with the use of Bernstein polynomials to approximate the unknown baseline cumulative hazard functions. The proposed estimators are shown to be consistent and the asymptotic normality of the resulting regression parameter estimators is established. A simulation study is conducted to assess the finite sample properties of the proposed approach and indicates that it works well in practical situations. The proposed method is applied to an HIV/AIDS case-cohort study that motivated this investigation.
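The use of Bernstein polynomials to approximate a baseline cumulative hazard can be sketched as follows. This is a minimal illustration, not the paper's estimation procedure: the coefficients below are made up rather than estimated, and the time range is arbitrary. It relies on the standard fact that a Bernstein polynomial with nondecreasing coefficients is itself nondecreasing, which is what makes the basis convenient for cumulative hazards. The function names are hypothetical.

```python
from math import comb


def bernstein_basis(k, m, u):
    """k-th Bernstein basis polynomial of degree m evaluated at u in [0, 1]."""
    return comb(m, k) * u ** k * (1 - u) ** (m - k)


def cumulative_hazard(t, coefs, t_min, t_max):
    """Bernstein-polynomial approximation evaluated at time t.

    coefs are the polynomial coefficients; nondecreasing coefs give a
    nondecreasing function on [t_min, t_max].
    """
    m = len(coefs) - 1
    u = (t - t_min) / (t_max - t_min)  # rescale time to the unit interval
    return sum(c * bernstein_basis(k, m, u) for k, c in enumerate(coefs))


# Hypothetical nondecreasing coefficients (not estimated from any data).
coefs = [0.0, 0.1, 0.3, 0.8, 1.5]
grid = [i / 20 * 10 for i in range(21)]
values = [cumulative_hazard(t, coefs, 0.0, 10.0) for t in grid]
print([round(v, 3) for v in values])
```

In a sieve approach the degree of the polynomial is allowed to grow with the sample size, and the coefficients are estimated jointly with the regression parameters under a monotonicity constraint.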
Affiliation(s)
- Mingyue Du
  - Center for Applied Statistical Research and College of Mathematics, Jilin University, Changchun, China
- Qingning Zhou
  - Department of Mathematics and Statistics, The University of North Carolina at Charlotte, Charlotte, NC, USA
- Shishun Zhao
  - Center for Applied Statistical Research and College of Mathematics, Jilin University, Changchun, China
- Jianguo Sun
  - Department of Statistics, University of Missouri, Columbia, MO, USA
15. Han B, Wang X. Semiparametric estimation for the non-mixture cure model in case-cohort and nested case-control studies. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2019.106874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
16. Wang L, Williams ML, Chen Y, Chen J. Novel two-phase sampling designs for studying binary outcomes. Biometrics 2020; 76:210-223. [PMID: 31449330 PMCID: PMC7042058 DOI: 10.1111/biom.13140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/14/2017] [Accepted: 08/06/2019] [Indexed: 11/26/2022]
Abstract
In biomedical cohort studies for assessing the association between an outcome variable and a set of covariates, usually, some covariates can only be measured on a subgroup of study subjects. An important design question is which subjects to select into the subgroup to increase statistical efficiency. When the outcome is binary, one may adopt a case-control sampling design or a balanced case-control design where cases and controls are further matched on a small number of complete discrete covariates. While the latter achieves success in estimating odds ratio (OR) parameters for the matching covariates, similar two-phase design options have not been explored for the remaining covariates, especially the incompletely collected ones. This is of great importance in studies where the covariates of interest cannot be completely collected. To this end, assuming that an external model is available to relate the outcome and complete covariates, we propose a novel sampling scheme that oversamples cases and controls with worse goodness-of-fit based on the external model and further matches them on complete covariates similarly to the balanced design. We develop a pseudolikelihood method for estimating OR parameters. Through simulation studies and explorations in a real-cohort study, we find that our design generally leads to reduced asymptotic variances of the OR estimates and the reduction for the matching covariates is comparable to that of the balanced design.
Affiliation(s)
- Le Wang
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Mathematics and Statistics, Villanova University, Villanova, PA 19085, USA
- Matthew L Williams
  - Division of Cardiovascular Surgery, Department of Surgery, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yong Chen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jinbo Chen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
17. Shin YE, Pfeiffer RM, Graubard BI, Gail MH. Weight calibration to improve the efficiency of pure risk estimates from case‐control samples nested in a cohort. Biometrics 2020; 76:1087-1097. [DOI: 10.1111/biom.13209] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/24/2019] [Revised: 10/17/2019] [Accepted: 12/16/2019] [Indexed: 11/30/2022]
Affiliation(s)
- Yei Eun Shin
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Ruth M. Pfeiffer
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Barry I. Graubard
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Mitchell H. Gail
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
18. Pan Y. Generalized case-cohort analysis for constrained estimation in the Cox’s model. Commun Stat-Simul C 2020. [DOI: 10.1080/03610918.2018.1475008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Yingli Pan
  - School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan, China
19. Zhou Q, Cai J, Zhou H. Semiparametric inference for a two-stage outcome-dependent sampling design with interval-censored failure time data. Lifetime Data Analysis 2020; 26:85-108. [PMID: 30617753 PMCID: PMC6612481 DOI: 10.1007/s10985-019-09461-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/27/2017] [Accepted: 01/02/2019] [Indexed: 06/09/2023]
Abstract
We propose a two-stage outcome-dependent sampling design and inference procedure for studies that concern interval-censored failure time outcomes. This design enhances the study efficiency by allowing the selection probabilities of the second-stage sample, for which the expensive exposure variable is ascertained, to depend on the first-stage observed interval-censored failure time outcomes. In particular, the second-stage sample is enriched by selectively including subjects who are known or observed to experience the failure at an early or late time. We develop a sieve semiparametric maximum pseudo likelihood procedure that makes use of all available data from the proposed two-stage design. The resulting regression parameter estimator is shown to be consistent and asymptotically normal, and a consistent estimator for its asymptotic variance is derived. Simulation results demonstrate that the proposed design and inference procedure perform well in practical situations and are more efficient than the existing designs and methods. An application to a phase 3 HIV vaccine trial is provided.
Affiliation(s)
- Qingning Zhou
  - Department of Mathematics and Statistics, University of North Carolina at Charlotte, Fretwell 335L, 9201 University City Blvd., Charlotte, NC, 28223, USA
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, 3101D McGavran-Greenberg Hall, Chapel Hill, NC, 27599, USA
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, 3104C McGavran-Greenberg Hall, Chapel Hill, NC, 27599, USA
20. Cao Y, Shi Y, Yu J. Statistical inference for the accelerated failure time model under two-stage generalized case–cohort design. Commun Stat-Theor M 2019. [DOI: 10.1080/03610926.2018.1528363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Yongxiu Cao
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Yueyong Shi
  - School of Economics and Management, China University of Geosciences, Wuhan, China
  - Center for Resources and Environmental Economic Research, China University of Geosciences, Wuhan, China
- Jichang Yu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
21.
Abstract
The two-phase design is a cost-effective sampling strategy to evaluate the effects of covariates on an outcome when certain covariates are too expensive to be measured on all study subjects. Under such a design, the outcome and inexpensive covariates are measured on all subjects in the first phase and the first-phase information is used to select subjects for measurements of expensive covariates in the second phase. Previous research on two-phase studies has focused largely on the inference procedures rather than the design aspects. We investigate the design efficiency of the two-phase study, as measured by the semiparametric efficiency bound for estimating the regression coefficients of expensive covariates. We consider general two-phase studies, where the outcome variable can be continuous, discrete, or censored, and the second-phase sampling can depend on the first-phase data in any manner. We develop optimal or approximately optimal two-phase designs, which can be substantially more efficient than the existing designs. We demonstrate the improvements of the new designs over the existing ones through extensive simulation studies and two large medical studies.
Affiliation(s)
- Ran Tao
  - Department of Biostatistics and Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232; Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
- Donglin Zeng
  - Department of Biostatistics and Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232; Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
- Dan-Yu Lin
  - Department of Biostatistics and Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232; Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
22. Lin C, Zheng M, Yu W, Wu M. Robust inference for the proportional hazards model with two-phase cohort sampling data. Stat Probab Lett 2019. [DOI: 10.1016/j.spl.2019.05.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
23. Pan Y, Ding J, Liu Y. Statistical inference for generalized case-cohort design under the proportional hazards model with parameter constraints. Commun Stat-Simul C 2019. [DOI: 10.1080/03610918.2018.1458128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Yingli Pan
  - School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Jieli Ding
  - School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
- Yanyan Liu
  - School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
24
|
Lawless JF, Cook RJ. A new perspective on loss to follow‐up in failure time and life history studies. Stat Med 2019; 38:4583-4610. [DOI: 10.1002/sim.8318] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2018] [Revised: 05/07/2019] [Accepted: 06/20/2019] [Indexed: 11/11/2022]
Affiliation(s)
- Jerald F. Lawless
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
| | - Richard J. Cook
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
| |
|
25
|
Lin DY. Discussion of the Paper by R. L. Prentice and Y. Huang - Optimal Designs and Efficient Inference for Biomarker Studies. STATISTICAL THEORY AND RELATED FIELDS 2018; 2:21-22. [PMID: 30662976 PMCID: PMC6333203 DOI: 10.1080/24754269.2018.1493630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Affiliation(s)
- D Y Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599-7420, U.S.A
| |
|
26
|
Keogh RH, Seaman SR, Bartlett JW, Wood AM. Multiple imputation of missing data in nested case-control and case-cohort studies. Biometrics 2018; 74:1438-1449. [PMID: 29870056 PMCID: PMC6481559 DOI: 10.1111/biom.12910] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/01/2017] [Revised: 04/01/2018] [Accepted: 04/01/2018] [Indexed: 12/18/2022]
Abstract
The nested case-control and case-cohort designs are two main approaches for carrying out a substudy within a prospective cohort. This article adapts multiple imputation (MI) methods for handling missing covariates in full-cohort studies for nested case-control and case-cohort studies. We consider data missing by design and data missing by chance. MI analyses that make use of full-cohort data and MI analyses based on substudy data only are described, alongside an intermediate approach in which the imputation uses full-cohort data but the analysis uses only the substudy. We describe adaptations to two imputation methods: the approximate method (MI-approx) of White and Royston (2009) and the “substantive model compatible” (MI-SMC) method of Bartlett et al. (2015). We also apply the “MI matched set” approach of Seaman and Keogh (2015) to nested case-control studies, which does not require any full-cohort information. The methods are investigated using simulation studies and all perform well when their assumptions hold. Substantial gains in efficiency can be made by imputing data missing by design using the full-cohort approach or by imputing data missing by chance in analyses using the substudy only. The intermediate approach brings greater gains in efficiency relative to the substudy approach and is more robust to imputation model misspecification than the full-cohort approach. The methods are illustrated using the ARIC Study cohort. Supplementary Materials provide R and Stata code.
Affiliation(s)
- Ruth H Keogh
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, U.K
| | | | | | - Angela M Wood
- Department of Public Health and Primary Care, University of Cambridge, Cambridge, U.K
| |
|
27
|
Deng L, Ding J, Liu Y, Wei C. Regression analysis for the proportional hazards model with parameter constraints under case-cohort design. Comput Stat Data Anal 2018. [DOI: 10.1016/j.csda.2017.08.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
|
28
|
Kang S, Lu W, Zhang J. On estimation of the optimal treatment regime with the additive hazards model. Stat Sin 2018; 28:1539-1560. [PMID: 30135619 DOI: 10.5705/ss.202016.0543] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
We propose a doubly robust estimation method for the optimal treatment regime based on an additive hazards model with censored survival data. Specifically, we introduce a new semiparametric additive hazard model which allows flexible baseline covariate effects in the control group and incorporates marginal treatment effect and its linear interaction with covariates. In addition, we propose a time-dependent propensity score to construct an A-learning type of estimating equations. The resulting estimator is shown to be consistent and asymptotically normal when either the baseline effect model for covariates or the propensity score is correctly specified. The asymptotic variance of the estimator is consistently estimated using a simple resampling method. Simulation studies are conducted to evaluate the finite-sample performance of the estimators and an application to AIDS clinical trial data is also given to illustrate the methodology.
Affiliation(s)
- Suhyun Kang
- North Carolina State University and University of South Carolina
| | - Wenbin Lu
- North Carolina State University and University of South Carolina
| | - Jiajia Zhang
- North Carolina State University and University of South Carolina
| |
|
29
|
Lawless JF. Two-phase outcome-dependent studies for failure times and testing for effects of expensive covariates. LIFETIME DATA ANALYSIS 2018; 24:28-44. [PMID: 27900633 DOI: 10.1007/s10985-016-9386-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/20/2016] [Accepted: 11/23/2016] [Indexed: 06/06/2023]
Abstract
Two- or multi-phase study designs are often used in settings involving failure times. In most studies, whether or not certain covariates are measured on an individual depends on their failure time and status. For example, when failures are rare, case-cohort or case-control designs are used to increase the number of failures relative to a random sample of the same size. Another scenario is where certain covariates are expensive to measure, so they are obtained only for selected individuals in a cohort. This paper considers such situations and focuses on cases where we wish to test hypotheses of no association between failure time and expensive covariates. Efficient score tests based on maximum likelihood are developed and shown to have a simple form for a wide class of models and sampling designs. Some numerical comparisons of study designs are presented.
Affiliation(s)
- J F Lawless
- Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada.
| |
|
30
|
Espin-Garcia O, Craiu RV, Bull SB. Two-phase designs for joint quantitative-trait-dependent and genotype-dependent sampling in post-GWAS regional sequencing. Genet Epidemiol 2017; 42:104-116. [PMID: 29239496 PMCID: PMC5814750 DOI: 10.1002/gepi.22099] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/07/2017] [Revised: 10/23/2017] [Accepted: 10/23/2017] [Indexed: 11/09/2022]
Abstract
We evaluate two‐phase designs to follow‐up findings from genome‐wide association study (GWAS) when the cost of regional sequencing in the entire cohort is prohibitive. We develop novel expectation‐maximization‐based inference under a semiparametric maximum likelihood formulation tailored for post‐GWAS inference. A GWAS‐SNP (where SNP is single nucleotide polymorphism) serves as a surrogate covariate in inferring association between a sequence variant and a normally distributed quantitative trait (QT). We assess test validity and quantify efficiency and power of joint QT‐SNP‐dependent sampling and analysis under alternative sample allocations by simulations. Joint allocation balanced on SNP genotype and extreme‐QT strata yields significant power improvements compared to marginal QT‐ or SNP‐based allocations. We illustrate the proposed method and evaluate the sensitivity of sample allocation to sampling variation using data from a sequencing study of systolic blood pressure.
Affiliation(s)
- Osvaldo Espin-Garcia
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Radu V Craiu
- Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
| | - Shelley B Bull
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| |
|
31
|
Bull SB, Andrulis IL, Paterson AD. Statistical challenges in high-dimensional molecular and genetic epidemiology. CAN J STAT 2017. [DOI: 10.1002/cjs.11342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Shelley B. Bull
- Lunenfeld-Tanenbaum Research Institute; Sinai Health System; Toronto Ontario, Canada M5T 3L9
- Dalla Lana School of Public Health; University of Toronto; Toronto, Ontario Canada M5T 3M7
| | - Irene L. Andrulis
- Lunenfeld-Tanenbaum Research Institute; Sinai Health System; Toronto Ontario, Canada M5T 3L9
- Department of Molecular Genetics; University of Toronto; Toronto, Ontario Canada M5S 1A8
| | - Andrew D. Paterson
- Dalla Lana School of Public Health; University of Toronto; Toronto, Ontario Canada M5T 3M7
- Genetics and Genome Biology Program; The Hospital for Sick Children; Toronto, Ontario Canada M5G 0A4
| |
|
32
|
Wu M, Zheng M, Yu W, Wu R. Estimation and variable selection for semiparametric transformation models under a more efficient cohort sampling design. TEST-SPAIN 2017. [DOI: 10.1007/s11749-017-0562-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
|
33
|
Yan Y, Zhou H, Cai J. Improving efficiency of parameter estimation in case-cohort studies with multivariate failure time data. Biometrics 2017; 73:1042-1052. [PMID: 28112795 DOI: 10.1111/biom.12657] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/01/2016] [Revised: 12/01/2016] [Accepted: 12/01/2016] [Indexed: 11/30/2022]
Abstract
The case-cohort study design is an effective way to reduce cost of assembling and measuring expensive covariates in large cohort studies. Recently, several weighted estimators were proposed for the case-cohort design when multiple diseases are of interest. However, these existing weighted estimators do not make effective use of the covariate information available in the whole cohort. Furthermore, the auxiliary information for the expensive covariates, which may be available in the studies, cannot be incorporated directly. In this article, we propose a class of updated-estimators. We show that, by making effective use of the whole cohort information, the proposed updated-estimators are guaranteed to be more efficient than the existing weighted estimators asymptotically. Furthermore, they are flexible to incorporate the auxiliary information whenever available. The advantages of the proposed updated-estimators are demonstrated in simulation studies and a real data analysis.
Affiliation(s)
- Ying Yan
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada, T2N 1N4
| | - Haibo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| | - Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| |
|
34
|
Zhou Q, Cai J, Zhou H. Outcome-dependent sampling with interval-censored failure time data. Biometrics 2017; 74:58-67. [PMID: 28771664 DOI: 10.1111/biom.12744] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/01/2016] [Revised: 06/01/2017] [Accepted: 06/01/2017] [Indexed: 11/30/2022]
Abstract
Epidemiologic studies and disease prevention trials often seek to relate an exposure variable to a failure time that suffers from interval-censoring. When the failure rate is low and the time intervals are wide, a large cohort is often required so as to yield reliable precision on the exposure-failure-time relationship. However, large cohort studies with simple random sampling could be prohibitive for investigators with a limited budget, especially when the exposure variables are expensive to obtain. Alternative cost-effective sampling designs and inference procedures are therefore desirable. We propose an outcome-dependent sampling (ODS) design with interval-censored failure time data, where we enrich the observed sample by selectively including certain more informative failure subjects. We develop a novel sieve semiparametric maximum empirical likelihood approach for fitting the proportional hazards model to data from the proposed interval-censoring ODS design. This approach employs the empirical likelihood and sieve methods to deal with the infinite-dimensional nuisance parameters, which greatly reduces the dimensionality of the estimation problem and eases the computation difficulty. The consistency and asymptotic normality of the resulting regression parameter estimator are established. The results from our extensive simulation study show that the proposed design and method works well for practical situations and is more efficient than the alternative designs and competing approaches. An example from the Atherosclerosis Risk in Communities (ARIC) study is provided for illustration.
Affiliation(s)
- Qingning Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| | - Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| | - Haibo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| |
|
35
|
Ning Y, Yi G, Reid N. A Class of Weighted Estimating Equations for Semiparametric Transformation Models with Missing Covariates. Scand Stat Theory Appl 2017. [DOI: 10.1111/sjos.12289] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]
Affiliation(s)
- Yang Ning
- Department of Statistical Science; Cornell University; Ithaca USA
| | - Grace Yi
- Department of Statistics and Actuarial Science; University of Waterloo; Waterloo Canada
| | - Nancy Reid
- Department of Statistical Sciences; University of Toronto; Toronto Canada
| |
|
36
|
Zheng Y, Brown M, Lok A, Cai T. Improving efficiency in biomarker incremental value evaluation under two-phase designs. Ann Appl Stat 2017; 11:638-654. [PMID: 28943991 PMCID: PMC5604898 DOI: 10.1214/16-aoas997] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Abstract
Cost-effective yet efficient designs are critical to the success of biomarker evaluation research. Two-phase sampling designs, under which expensive markers are only measured on a subsample of cases and non-cases within a prospective cohort, are useful in novel biomarker studies for preserving study samples and minimizing cost of biomarker assaying. Statistical methods for quantifying the predictiveness of biomarkers under two-phase studies have been proposed (Cai and Zheng, 2012; Liu, Cai and Zheng, 2012). These methods are based on a class of inverse probability weighted (IPW) estimators where weights are 'true' sampling weights that simply reflect the sampling strategy of the study. While simple to implement, existing IPW estimators are limited by lack of practicality and efficiency. In this manuscript, we investigate a variety of two-phase design options and provide statistical approaches aimed at improving the efficiency of simple IPW estimators by incorporating auxiliary information available for the entire cohort. We consider accuracy summary estimators that accommodate auxiliary information in the context of evaluating the incremental values of novel biomarkers over existing prediction tools. In addition, we evaluate the relative efficiency of a variety of sampling and estimation options under two-phase studies, shedding light on issues pertaining to both the design and analysis of biomarker validation studies. We apply our methods to the evaluation of a novel biomarker for liver cancer risk conducted with a two-phase nested case control design (Lok et al., 2010).
Affiliation(s)
- Yingye Zheng
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
| | - Marshall Brown
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
| | - Anna Lok
- Division of Gastroenterology, University of Michigan Ann Arbor, MI 48109
| | - Tianxi Cai
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
| |
|
37
|
Liu D, Cai T, Lok A, Zheng Y. Nonparametric Maximum Likelihood Estimators of Time-Dependent Accuracy Measures for Survival Outcome Under Two-Stage Sampling Designs. J Am Stat Assoc 2017; 113:882-892. [PMID: 30555194 PMCID: PMC6291304 DOI: 10.1080/01621459.2017.1295866] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/01/2014] [Revised: 12/01/2016] [Indexed: 12/24/2022]
Abstract
Large prospective cohort studies of rare chronic diseases require thoughtful planning of study designs, especially for biomarker studies when measurements are based on stored tissue or blood specimens. Two-phase designs, including nested case-control (Thomas, 1977) and case-cohort (Prentice, 1986) sampling designs, provide cost-effective strategies for conducting biomarker evaluation studies. Existing literature for biomarker assessment under two-phase designs largely focuses on simple inverse probability weighting (IPW) estimators (Cai and Zheng, 2011; Liu et al., 2012). Drawing on recent theoretical development on the maximum likelihood estimators for relative risk parameters in two-phase studies (Scheike and Martinussen, 2004; Zeng et al., 2006), we propose nonparametric maximum likelihood based estimators to evaluate the accuracy and predictiveness of a risk prediction biomarker under both types of two-phase designs. In addition, hybrid estimators that combine IPW estimators and maximum likelihood estimation procedure are proposed to improve efficiency and alleviate computational burden. We derive large sample properties of proposed estimators and evaluate their finite sample performance using numerical studies. We illustrate new procedures using a two-phase biomarker study aiming to evaluate the accuracy of a novel biomarker, des-γ-carboxy prothrombin, for early detection of hepatocellular carcinoma (Lok et al., 2010).
Affiliation(s)
- Dandan Liu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232
| | - Tianxi Cai
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115
| | - Anna Lok
- Division of Gastroenterology, University of Michigan, Ann Arbor, MI 48109
| | - Yingye Zheng
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
| |
|
38
|
Abstract
The case-cohort design has been widely used as a means of cost reduction in assembling or measuring expensive covariates in large cohort studies. The existing literature on the case-cohort design is mainly focused on right-censored data. In practice, however, the failure time is often subject to interval-censoring; it is known only to fall within some random time interval. In this paper, we consider the case-cohort study design for interval-censored failure time and develop a sieve semiparametric likelihood approach for analyzing data from this design under the proportional hazards model. We construct the likelihood function using inverse probability weighting and build the sieves with Bernstein polynomials. The consistency and asymptotic normality of the resulting regression parameter estimator are established and a weighted bootstrap procedure is considered for variance estimation. Simulations show that the proposed method works well for practical situations, and an application to real data is provided.
Affiliation(s)
- Q Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| | - H Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| | - J Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| |
|
39
|
Steingrimsson JA, Strawderman RL. Estimation in the semiparametric accelerated failure time model with missing covariates: improving efficiency through augmentation. J Am Stat Assoc 2017; 112:1221-1235. [PMID: 33033419 DOI: 10.1080/01621459.2016.1205500] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 10/21/2022]
Abstract
This paper considers linear regression with missing covariates and a right censored outcome. We first consider a general two-phase outcome sampling design, where full covariate information is only ascertained for subjects in phase two and sampling occurs under an independent Bernoulli sampling scheme with known subject-specific sampling probabilities that depend on phase one information (e.g., survival time, failure status and covariates). The semiparametric information bound is derived for estimating the regression parameter in this setting. We also introduce a more practical class of augmented estimators that is shown to improve asymptotic efficiency over simple but inefficient inverse probability of sampling weighted estimators. Estimation for known sampling weights and extensions to the case of estimated sampling weights are both considered. The allowance for estimated sampling weights permits covariates to be missing at random according to a monotone but unknown mechanism. The asymptotic properties of the augmented estimators are derived and simulation results demonstrate substantial efficiency improvements over simpler inverse probability of sampling weighted estimators in the indicated settings. With suitable modification, the proposed methodology can also be used to improve augmented estimators previously used for missing covariates in a Cox regression model.
Affiliation(s)
| | - Robert L Strawderman
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642
| |
|
40
|
Kang S, Lu W, Liu M. Efficient estimation for accelerated failure time model under case-cohort and nested case-control sampling. Biometrics 2016; 73:114-123. [PMID: 27479331 DOI: 10.1111/biom.12573] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/01/2015] [Revised: 06/01/2016] [Accepted: 06/01/2016] [Indexed: 11/28/2022]
Abstract
Case-cohort (Prentice, 1986) and nested case-control (Thomas, 1977) designs have been widely used as a cost-effective alternative to the full-cohort design. In this article, we propose an efficient likelihood-based estimation method for the accelerated failure time model under case-cohort and nested case-control designs. An EM algorithm is developed to maximize the likelihood function and a kernel smoothing technique is adopted to facilitate the estimation in the M-step of the EM algorithm. We show that the proposed estimators for the regression coefficients are consistent and asymptotically normal. The asymptotic variance of the estimators can be consistently estimated using an EM-aided numerical differentiation method. Simulation studies are conducted to evaluate the finite-sample performance of the estimators and an application to a Wilms tumor data set is also given to illustrate the methodology.
Affiliation(s)
- Suhyun Kang
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, U.S.A
| | - Wenbin Lu
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, U.S.A
| | - Mengling Liu
- Department of Environmental Medicine, New York University School of Medicine, New York, New York, U.S.A
| |
|
41
|
Breslow NE, Hu J, Wellner JA. Z-estimation and stratified samples: application to survival models. LIFETIME DATA ANALYSIS 2015; 21:493-516. [PMID: 25588605 PMCID: PMC4503541 DOI: 10.1007/s10985-014-9317-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/17/2014] [Accepted: 12/29/2014] [Indexed: 06/04/2023]
Abstract
The infinite dimensional Z-estimation theorem offers a systematic approach to joint estimation of both Euclidean and non-Euclidean parameters in probability models for data. It is easily adapted for stratified sampling designs. This is important in applications to censored survival data because the inverse probability weights that modify the standard estimating equations often depend on the entire follow-up history. Since the weights are not predictable, they complicate the usual theory based on martingales. This paper considers joint estimation of regression coefficients and baseline hazard functions in the Cox proportional and Lin-Ying additive hazards models. Weighted likelihood equations are used for the former and weighted estimating equations for the latter. Regression coefficients and baseline hazards may be combined to estimate individual survival probabilities. Efficiency is improved by calibrating or estimating the weights using information available for all subjects. Although inefficient in comparison with likelihood inference for incomplete data, which is often difficult to implement, the approach provides consistent estimates of desired population parameters even under model misspecification.
Affiliation(s)
- Norman E Breslow
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
| | - Jie Hu
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
| | - Jon A Wellner
- Department of Statistics, University of Washington, Seattle, WA, USA.
| |
|
42
|
Derkach A, Lawless JF, Sun L. Score tests for association under response-dependent sampling designs for expensive covariates. Biometrika 2015. [DOI: 10.1093/biomet/asv038] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
|